The following equation expresses the expected out-of-sample error in terms of $\bar{g}(x)$, the 'average function':

$$\mathbb{E}_{D}\big[E_{out}(g^{(D)})\big] = \mathbb{E}_{x}\Big[\mathbb{E}_{D}\big[(g^{(D)}(x) - \bar{g}(x))^2\big] + (\bar{g}(x) - f(x))^2\Big]$$

The average function can be interpreted as follows: generate many data sets $D_1, \dots, D_K$ and apply the learning algorithm to each data set to produce final hypotheses $g_1, \dots, g_K$. We can then estimate the average function for any $x$ by $\bar{g}(x) \approx \frac{1}{K}\sum_{k=1}^{K} g_k(x)$. Essentially, we are viewing $g^{(D)}(x)$ as a random variable, with the randomness coming from the randomness in the data set; $\bar{g}(x) = \mathbb{E}_{D}\big[g^{(D)}(x)\big]$ is the expected value of this random variable (for a particular $x$), and $\bar{g}$ is a function, the average function, composed of these expected values.
The term $\text{bias}(x) = (\bar{g}(x) - f(x))^2$ measures how much the average function that we would learn using different data sets deviates from the target function that generated these data sets. This term is called the bias, as it measures how much our learning model is biased away from the target function. This is because $\bar{g}$ has the benefit of learning from an unlimited number of data sets, so it is limited only in its ability to approximate $f$ by the limitations of the learning model itself.
The term $\text{var}(x) = \mathbb{E}_{D}\big[(g^{(D)}(x) - \bar{g}(x))^2\big]$ is the variance of the random variable $g^{(D)}(x)$. The variance measures the variation in the final hypothesis, depending on the data set. We thus arrive at the bias-variance decomposition of out-of-sample error:

$$\mathbb{E}_{D}\big[E_{out}(g^{(D)})\big] = \mathbb{E}_{x}\big[\text{bias}(x) + \text{var}(x)\big] = \text{bias} + \text{var}$$
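At a fixed point $x$, the decomposition is an exact algebraic identity, even for empirical averages over a finite collection of hypotheses. A quick sanity check, using made-up numbers for $f(x)$ and for the samples of $g^{(D)}(x)$:

```python
import random

random.seed(0)
f_x = 0.7                                             # target value f(x), arbitrary
g_x = [random.gauss(0.2, 1.0) for _ in range(1000)]   # samples of g^(D)(x) over data sets

g_bar = sum(g_x) / len(g_x)                           # average function at x
expected_err = sum((g - f_x) ** 2 for g in g_x) / len(g_x)  # E_D[(g(x) - f(x))^2]
bias_x = (g_bar - f_x) ** 2
var_x = sum((g - g_bar) ** 2 for g in g_x) / len(g_x)

# expected_err equals bias_x + var_x up to floating-point rounding
print(abs(expected_err - (bias_x + var_x)))
```

Expanding the squares shows why: the cross term $-2\,\mathbb{E}_D[g^{(D)}(x)]\,\bar{g}(x)$ cancels against $\bar{g}(x)^2$ regardless of the distribution of $g^{(D)}(x)$.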
Consider the target function $f(x) = \sin(\pi x)$ and a data set of size $N = 2$. We sample $x_1, x_2$ uniformly in $[-1, 1]$ to generate a data set $(x_1, y_1), (x_2, y_2)$ with $y_i = f(x_i)$.

Fit the model using:

$\mathcal{H}_0$: Set of all lines of the form $h(x) = b$

For $\mathcal{H}_0$, we choose the constant hypothesis that best fits the data (the horizontal line at the midpoint, $b = \frac{y_1 + y_2}{2}$).
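This simulation can be sketched as follows (variable names are mine; the book reports roughly bias $\approx 0.50$ and var $\approx 0.25$ for this model, which the Monte Carlo estimates should approximately reproduce):

```python
import math
import random

random.seed(1)
f = lambda x: math.sin(math.pi * x)

# Learn the constant b = (y1 + y2) / 2 on each of many two-point data sets.
num_datasets = 5000
bs = []
for _ in range(num_datasets):
    x1, x2 = random.uniform(-1, 1), random.uniform(-1, 1)
    bs.append((f(x1) + f(x2)) / 2)

b_bar = sum(bs) / num_datasets                    # g_bar(x) is the constant b_bar
var = sum((b - b_bar) ** 2 for b in bs) / num_datasets

grid = [-1 + 2 * i / 200 for i in range(201)]     # evaluation points for E_x[...]
bias = sum((b_bar - f(x)) ** 2 for x in grid) / len(grid)
# Expect bias close to 0.50 and var close to 0.25
```

Since every hypothesis in $\mathcal{H}_0$ is a constant, $\bar{g}$ is also a constant, and the variance term does not depend on $x$.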
Now consider the same target function $f(x) = \sin(\pi x)$ and a data set of size $N = 2$. We sample $x_1, x_2$ uniformly in $[-1, 1]$ to generate a data set $(x_1, y_1), (x_2, y_2)$ with $y_i = f(x_i)$.

Fit the model using:

$\mathcal{H}_1$: Set of all lines of the form $h(x) = ax + b$

With $\mathcal{H}_1$, the learned hypothesis is wilder and varies extensively depending on the data set: a small change in the two sample points can produce a very different line.
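The corresponding simulation fits the line through the two sample points on each data set (again, names are mine; the book reports roughly bias $\approx 0.21$ and var $\approx 1.69$, i.e. lower bias but much higher variance than $\mathcal{H}_0$):

```python
import math
import random

random.seed(1)
f = lambda x: math.sin(math.pi * x)

# Fit the line through the two sample points on each data set.
num_datasets = 5000
coeffs = []
for _ in range(num_datasets):
    x1, x2 = random.uniform(-1, 1), random.uniform(-1, 1)
    a = (f(x2) - f(x1)) / (x2 - x1)               # slope through the two points
    b = f(x1) - a * x1                            # intercept
    coeffs.append((a, b))

a_bar = sum(a for a, _ in coeffs) / num_datasets  # g_bar(x) = a_bar * x + b_bar
b_bar = sum(b for _, b in coeffs) / num_datasets

grid = [-1 + 2 * i / 100 for i in range(101)]     # evaluation points for E_x[...]
bias = sum((a_bar * x + b_bar - f(x)) ** 2 for x in grid) / len(grid)
var = sum(
    sum((a * x + b - (a_bar * x + b_bar)) ** 2 for a, b in coeffs) / num_datasets
    for x in grid
) / len(grid)
# Expect bias around 0.21 and var around 1.69: the variance dominates
```

The average line $\bar{g}$ tracks $\sin(\pi x)$ better than any constant can, which lowers the bias; but individual fitted lines swing wildly with the two sampled points, which is exactly the large variance term.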
- Abu-Mostafa, Y. S., Magdon-Ismail, M., & Lin, H.-T. (2012). *Learning from Data*. New York, NY, USA: AMLBook.