Classical results in statistics suggest that increasing the complexity of a model fit to data leads to overfitting: the training data is fit better, but the model fails to generalize. However, modern results from machine learning show that if we keep increasing the complexity beyond the interpolation threshold (where the model fits the training data perfectly), the model has infinitely many perfect fits to choose from and implicitly prefers the "smoothest" one, leading to a second descent in the test error.
However, such a descent occurs only when the model fits the right features. In polynomial regression, for example, the features can be any basis of polynomials, but double descent appears only when that basis is orthogonal with respect to the data distribution, which decorrelates the features.
Below, you can explore how different sets of polynomials fit the data as the degree of the polynomial varies. The data itself is drawn from different distributions over different intervals. See how the test error behaves when the polynomial basis is orthogonal with respect to the distribution versus when it is not (you may need to regenerate the data a few times).
Distributions for Orthogonality:
• Legendre on Uniform [-1, 1]
• Hermite on Gaussian (-∞, ∞)
• Laguerre on Exponential [0, ∞)
• Chebyshev on Wigner [-1, 1]
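Each pairing above can be checked numerically. A short sketch for the first one: using Gauss-Legendre quadrature (exact for polynomial integrands of the degrees involved), the Gram matrix of the Legendre basis under the uniform density on [-1, 1] comes out diagonal, confirming orthogonality.

```python
import numpy as np

# Gauss-Legendre nodes and weights integrate polynomials exactly on [-1, 1].
nodes, weights = np.polynomial.legendre.leggauss(50)

# Vandermonde matrix: column k is the Legendre polynomial P_k at the nodes.
V = np.polynomial.legendre.legvander(nodes, 5)

# Gram matrix under the uniform density 1/2 on [-1, 1]:
# G[i, j] = integral of P_i(x) P_j(x) * (1/2) dx.
G = V.T @ (weights[:, None] * V) / 2

# Off-diagonal entries vanish (up to floating point): the basis is
# orthogonal with respect to this distribution.
off_diag = G - np.diag(np.diag(G))
print(np.max(np.abs(off_diag)))
```

The same check works for the other pairings by swapping in the matching quadrature and density (e.g. `np.polynomial.hermite_e.hermegauss` with the Gaussian); evaluating a basis against a mismatched distribution instead yields a Gram matrix with large off-diagonal entries, i.e. correlated features.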