Classical results in statistics suggest that increasing the complexity of a model fit to data leads to overfitting: the training data is fit better, but the model fails to generalize. However, modern results from machine learning show that if we keep increasing the complexity beyond the interpolation threshold (where the model fits the training data perfectly), the model has infinitely many perfect fits to choose from and implicitly prefers the "smoothest" one, leading to a second descent in the test error.
However, such a descent occurs only when the model fits the right features. In polynomial regression, for example, the features can be any basis of polynomials, but double descent appears only when that basis is orthogonal with respect to the data distribution, which decorrelates the features.
Below, you can explore how different sets of polynomials fit the data as the degree of the polynomial varies. The data itself is drawn from different distributions over different intervals. See how the test error behaves when the polynomial basis is orthogonal with respect to the distribution versus when it is not (you may need to regenerate the data a few times).
Distributions for Orthogonality:
• Legendre on Uniform [-1, 1]
• Hermite on Gaussian (-∞, ∞)
• Laguerre on Exponential [0, ∞)
• Chebyshev on Wigner [-1, 1]
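Each pairing above can be checked numerically. A short sketch for the first one: using Gauss-Legendre quadrature (exact for polynomial integrands of the degrees involved), the Gram matrix of the Legendre basis under the uniform density on [-1, 1] comes out diagonal, confirming orthogonality.

```python
import numpy as np

# Gauss-Legendre nodes and weights integrate polynomials exactly on [-1, 1].
nodes, weights = np.polynomial.legendre.leggauss(50)

# Vandermonde matrix: column k is the Legendre polynomial P_k at the nodes.
V = np.polynomial.legendre.legvander(nodes, 5)

# Gram matrix under the uniform density 1/2 on [-1, 1]:
# G[i, j] = integral of P_i(x) P_j(x) * (1/2) dx.
G = V.T @ (weights[:, None] * V) / 2

# Off-diagonal entries vanish (up to floating point): the basis is
# orthogonal with respect to this distribution.
off_diag = G - np.diag(np.diag(G))
print(np.max(np.abs(off_diag)))
```

The same check works for the other pairings by swapping in the matching quadrature and density (e.g. `np.polynomial.hermite_e.hermegauss` with the Gaussian); evaluating a basis against a mismatched distribution instead yields a Gram matrix with large off-diagonal entries, i.e. correlated features.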