Let the $K$ parts be $C_1, C_2, \ldots, C_K$, where $C_k$ denotes the indices of the observations in part $k$. There are $n_k$ observations in part $k$: if $n$ is a multiple of $K$, then $n_k = n/K$.
Compute
$$\mathrm{CV}_{(K)} = \sum_{k=1}^{K} \frac{n_k}{n}\,\mathrm{MSE}_k,$$
where $\mathrm{MSE}_k = \sum_{i \in C_k} (y_i - \hat{y}_i)^2 / n_k$, and $\hat{y}_i$ is the fit for observation $i$, obtained from the data with part $k$ removed.
Setting $K = n$ yields $n$-fold or leave-one-out cross-validation (LOOCV).
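As a concrete sketch of this procedure, the following Python snippet (NumPy only) computes $\mathrm{CV}_{(K)}$ for a least-squares polynomial fit; the helper name `kfold_cv_mse` and the polynomial model are illustrative choices, not part of the original text:

```python
import numpy as np

def kfold_cv_mse(x, y, K=5, degree=1, seed=0):
    """K-fold CV estimate of test MSE for a least-squares polynomial fit."""
    n = len(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    parts = np.array_split(idx, K)          # the parts C_1, ..., C_K
    cv = 0.0
    for C_k in parts:
        train = np.setdiff1d(idx, C_k)      # the data with part k removed
        coefs = np.polyfit(x[train], y[train], degree)
        y_hat = np.polyval(coefs, x[C_k])   # fits for the held-out observations
        mse_k = np.mean((y[C_k] - y_hat) ** 2)
        cv += (len(C_k) / n) * mse_k        # weight MSE_k by n_k / n
    return cv
```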
With least-squares linear or polynomial regression, an amazing shortcut makes the cost of LOOCV the same as that of a single model fit! The following formula holds:
$$\mathrm{CV}_{(n)} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{y}_i}{1 - h_i} \right)^2,$$
where $\hat{y}_i$ is the $i$th fitted value from the original least squares fit, and $h_i$ is the leverage (the $i$th diagonal element of the "hat" matrix; see the book for details). This is like the ordinary MSE, except the $i$th residual is divided by $1 - h_i$.
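As a hedged illustration of the shortcut (not the book's own code), the snippet below computes $\mathrm{CV}_{(n)}$ for a linear least-squares fit from a single fit, reading the leverages off the diagonal of the hat matrix $H = X(X^\top X)^{-1}X^\top$:

```python
import numpy as np

def loocv_linear(X, y):
    """LOOCV error for least squares from one model fit, via leverages."""
    X1 = np.column_stack([np.ones(len(y)), X])     # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # the single least-squares fit
    y_hat = X1 @ beta
    # Leverages h_i: diagonal of the hat matrix H = X (X^T X)^{-1} X^T
    h = np.einsum('ij,ji->i', X1, np.linalg.solve(X1.T @ X1, X1.T))
    return np.mean(((y - y_hat) / (1.0 - h)) ** 2)
```

Dividing each residual by $1 - h_i$ inflates it by exactly as much as refitting without observation $i$ would, which is why no refits are needed.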
LOOCV is sometimes useful, but it typically doesn't shake up the data enough: the $n$ training sets are nearly identical, so the estimates from each fold are highly correlated, and hence their average can have high variance.
A better choice is $K = 5$ or $K = 10$.
The same idea applies to classification. We divide the data into $K$ roughly equal-sized parts $C_1, C_2, \ldots, C_K$, where $C_k$ denotes the indices of the observations in part $k$. There are $n_k$ observations in part $k$: if $n$ is a multiple of $K$, then $n_k = n/K$.
Compute
$$\mathrm{CV}_K = \sum_{k=1}^{K} \frac{n_k}{n}\,\mathrm{Err}_k,$$
where $\mathrm{Err}_k = \sum_{i \in C_k} I(y_i \neq \hat{y}_i)/n_k$ is the misclassification rate in part $k$.
The estimated standard deviation of $\mathrm{CV}_K$ is
$$\widehat{\mathrm{SE}}(\mathrm{CV}_K) = \sqrt{\frac{1}{K}\sum_{k=1}^{K}\frac{(\mathrm{Err}_k - \overline{\mathrm{Err}})^2}{K-1}},$$
where $\overline{\mathrm{Err}} = \frac{1}{K}\sum_{k=1}^{K}\mathrm{Err}_k$.
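A sketch of the classification version, with the classifier left abstract (the `fit_predict` callable below is a hypothetical hook, not a library API):

```python
import numpy as np

def kfold_cv_class(X, y, fit_predict, K=5, seed=0):
    """K-fold CV misclassification rate and its estimated standard error.

    fit_predict(X_tr, y_tr, X_te) is any function that trains a classifier
    on (X_tr, y_tr) and returns predicted labels for X_te.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    parts = np.array_split(idx, K)
    errs = np.empty(K)
    for k, C_k in enumerate(parts):
        train = np.setdiff1d(idx, C_k)
        y_hat = fit_predict(X[train], y[train], X[C_k])
        errs[k] = np.mean(y_hat != y[C_k])                  # Err_k
    cv = sum(len(C_k) / n * errs[k] for k, C_k in enumerate(parts))
    se = np.sqrt(np.sum((errs - errs.mean()) ** 2) / ((K - 1) * K))
    return cv, se
```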
We now turn to the bootstrap, beginning with a simple example. Suppose that we wish to invest a fixed sum of money in two financial assets that yield returns of $X$ and $Y$, respectively, where $X$ and $Y$ are random quantities.
We will invest a fraction $\alpha$ of our money in $X$, and the remaining $1 - \alpha$ in $Y$.
We wish to choose $\alpha$ to minimize the total risk, or variance, of our investment. In other words, we want to minimize $\mathrm{Var}(\alpha X + (1 - \alpha)Y)$.
One can show that the value that minimizes the risk is given by
$$\alpha = \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}},$$
where $\sigma_X^2 = \mathrm{Var}(X)$, $\sigma_Y^2 = \mathrm{Var}(Y)$, and $\sigma_{XY} = \mathrm{Cov}(X, Y)$.
But the values of $\sigma_X^2$, $\sigma_Y^2$, and $\sigma_{XY}$ are unknown.
We can compute estimates for these quantities, $\hat{\sigma}_X^2$, $\hat{\sigma}_Y^2$, and $\hat{\sigma}_{XY}$, using a data set that contains measurements for $X$ and $Y$.
We can then estimate the value of $\alpha$ that minimizes the variance of our investment using
$$\hat{\alpha} = \frac{\hat{\sigma}_Y^2 - \hat{\sigma}_{XY}}{\hat{\sigma}_X^2 + \hat{\sigma}_Y^2 - 2\hat{\sigma}_{XY}}.$$
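In code, the plug-in estimate is a one-liner on the sample covariance matrix; a minimal sketch (the name `alpha_hat` is ours):

```python
import numpy as np

def alpha_hat(x, y):
    """Plug-in estimate of the minimum-variance allocation alpha."""
    cov = np.cov(x, y)   # 2x2 sample covariance matrix of the returns
    s_xx, s_yy, s_xy = cov[0, 0], cov[1, 1], cov[0, 1]
    return (s_yy - s_xy) / (s_xx + s_yy - 2.0 * s_xy)
```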
To estimate the standard deviation of $\hat{\alpha}$, we repeated the process of simulating 100 paired observations of $X$ and $Y$, and estimating $\alpha$, 1,000 times.
We thereby obtained 1,000 estimates for $\alpha$, which we can call $\hat{\alpha}_1, \hat{\alpha}_2, \ldots, \hat{\alpha}_{1000}$.
For these simulations the parameters were set to $\sigma_X^2 = 1$, $\sigma_Y^2 = 1.25$, and $\sigma_{XY} = 0.5$, and so we know that the true value of $\alpha$ is $0.6$.
The mean over all 1,000 estimates for $\alpha$ is
$$\bar{\alpha} = \frac{1}{1000}\sum_{r=1}^{1000} \hat{\alpha}_r = 0.5996,$$
very close to $\alpha = 0.6$, and the standard deviation of the estimates is
$$\sqrt{\frac{1}{1000 - 1}\sum_{r=1}^{1000} (\hat{\alpha}_r - \bar{\alpha})^2} = 0.083.$$
This gives us a very good idea of the accuracy of $\hat{\alpha}$: $\mathrm{SE}(\hat{\alpha}) \approx 0.083$.
So roughly speaking, for a random sample from the population, we would expect $\hat{\alpha}$ to differ from $\alpha$ by approximately $0.08$, on average.
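The simulation itself is easy to reproduce. A sketch using the parameters above, reusing `alpha_hat` from the earlier snippet and assuming zero-mean Gaussian returns (the distributional choice is our assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
cov_true = np.array([[1.0, 0.5],
                     [0.5, 1.25]])   # Var(X)=1, Cov(X,Y)=0.5, Var(Y)=1.25
alphas = np.empty(1000)
for r in range(1000):
    # 100 paired observations of (X, Y) per repetition
    xy = rng.multivariate_normal([0.0, 0.0], cov_true, size=100)
    alphas[r] = alpha_hat(xy[:, 0], xy[:, 1])

print(alphas.mean())        # should land close to the true alpha = 0.6
print(alphas.std(ddof=1))   # should land close to 0.08
```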
The procedure outlined above cannot be applied, because for real data we cannot generate new samples from the original population.
However, the bootstrap approach allows us to use a computer to mimic the process of obtaining new data sets, so that we can estimate the variability of our estimate without generating additional samples.
Rather than repeatedly obtaining independent data sets from the population, we instead obtain distinct data sets by repeatedly sampling observations from the original data set with replacement.
Each of these "bootstrap data sets" is created by sampling with replacement, and is the same size as our original data set. As a result, some observations may appear more than once in a given bootstrap data set, and some not at all.
Denoting the first bootstrap data set by $Z^{*1}$, we use $Z^{*1}$ to produce a new bootstrap estimate for $\alpha$, which we call $\hat{\alpha}^{*1}$.
This procedure is repeated $B$ times for some large value of $B$ (say 100 or 1,000), in order to produce $B$ different bootstrap data sets, $Z^{*1}, Z^{*2}, \ldots, Z^{*B}$, and $B$ corresponding $\alpha$ estimates, $\hat{\alpha}^{*1}, \hat{\alpha}^{*2}, \ldots, \hat{\alpha}^{*B}$.
We estimate the standard error of these bootstrap estimates using the formula
$$\mathrm{SE}_B(\hat{\alpha}) = \sqrt{\frac{1}{B - 1}\sum_{r=1}^{B}\left(\hat{\alpha}^{*r} - \bar{\hat{\alpha}}^{*}\right)^2},$$
where $\bar{\hat{\alpha}}^{*} = \frac{1}{B}\sum_{r=1}^{B}\hat{\alpha}^{*r}$.
This serves as an estimate of the standard error of $\hat{\alpha}$ estimated from the original data set. For this example, $\mathrm{SE}_B(\hat{\alpha}) = 0.087$.
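A minimal sketch of that loop, again reusing the `alpha_hat` helper defined earlier (the function name `boot_se_alpha` is ours):

```python
import numpy as np

def boot_se_alpha(x, y, B=1000, seed=0):
    """Bootstrap estimate of the standard error of alpha-hat."""
    rng = np.random.default_rng(seed)
    n = len(x)
    alpha_star = np.empty(B)
    for r in range(B):
        i = rng.integers(0, n, size=n)          # draw n indices with replacement
        alpha_star[r] = alpha_hat(x[i], y[i])   # estimate on bootstrap set Z*r
    return alpha_star.std(ddof=1)               # matches the SE_B formula above
```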