This ideal $f(x) = E(Y \mid X = x)$ is called the regression function.
$f(x) = E(Y \mid X = x)$ is the function that minimizes $E[(Y - g(X))^2 \mid X = x]$ over all functions $g$ at all points $X = x$.
$\varepsilon = Y - f(x)$ is the irreducible error.
For any estimate $\hat{f}(x)$ of $f(x)$, we have
$$E\big[(Y - \hat{f}(X))^2 \mid X = x\big] = \big[f(x) - \hat{f}(x)\big]^2 + \operatorname{Var}(\varepsilon).$$
Relax the definition and let
$$\hat{f}(x) = \operatorname{Ave}\big(Y \mid X \in \mathcal{N}(x)\big),$$
where $\mathcal{N}(x)$ is some neighborhood of $x$.
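As a concrete illustration, here is a minimal NumPy sketch of this neighborhood averaging in one dimension; the simulated data and the neighborhood radius are illustrative assumptions, not part of the notes.

```python
# Minimal sketch of nearest-neighbor averaging in one dimension; the
# simulated data and the neighborhood radius are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 10, size=200)
y_train = np.sin(x_train) + rng.normal(scale=0.3, size=200)

def nn_average(x, x_train, y_train, radius=0.5):
    """Estimate f(x) = Ave(Y | X in N(x)) with N(x) = {x_i : |x_i - x| <= radius}."""
    mask = np.abs(x_train - x) <= radius
    if not mask.any():                             # fall back to the single closest point
        mask = np.abs(x_train - x) == np.abs(x_train - x).min()
    return y_train[mask].mean()

print(nn_average(3.0, x_train, y_train))           # roughly sin(3.0) ~ 0.14
```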
Nearest neighbor methods can be lousy when $p$ is large.
Reason: the curse of dimensionality. Nearest neighbors tend to be far away in high dimensions.
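A quick simulation makes this concrete; the uniform design on $[-1, 1]^p$ and the sample size are assumptions of this sketch. With $n$ fixed, even the single nearest neighbor of the origin drifts far away as $p$ grows.

```python
# Illustrative simulation of the curse of dimensionality: with n fixed,
# the distance from the origin to its nearest neighbor grows with p.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
for p in (1, 2, 5, 10, 50, 100):
    X = rng.uniform(-1, 1, size=(n, p))            # n points in [-1, 1]^p
    d = np.linalg.norm(X, axis=1)                  # distance of each point to the origin
    print(f"p = {p:3d}: nearest-neighbor distance = {d.min():.3f}")
```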
The linear model is an important example of a parametric model:
$$f_L(X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p.$$
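A minimal sketch of fitting this model by least squares with NumPy; the simulated design and the true coefficients below are arbitrary choices made only for illustration.

```python
# Least-squares fit of the linear model with NumPy; the simulated design
# and true coefficients are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, 2.0, -1.0, 0.5])        # (beta_0, beta_1, beta_2, beta_3)
y = beta_true[0] + X @ beta_true[1:] + rng.normal(scale=0.5, size=n)

X1 = np.column_stack([np.ones(n), X])              # prepend the intercept column
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)  # estimated (beta_0, ..., beta_p)
print(beta_hat)                                    # close to beta_true
```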
Suppose we fit a model $\hat{f}(x)$ to some training data $\mathrm{Tr} = \{x_i, y_i\}_{i=1}^{N}$, and we wish to see how well it performs.
We could compute the average squared prediction error over $\mathrm{Tr}$:
$$\mathrm{MSE}_{\mathrm{Tr}} = \operatorname{Ave}_{i \in \mathrm{Tr}}\big[(y_i - \hat{f}(x_i))^2\big].$$
This may be biased toward more overfit models.
Instead we should, if possible, compute it using fresh test data $\mathrm{Te} = \{x_i, y_i\}_{i=1}^{M}$:
$$\mathrm{MSE}_{\mathrm{Te}} = \operatorname{Ave}_{i \in \mathrm{Te}}\big[(y_i - \hat{f}(x_i))^2\big].$$
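A small simulation shows the contrast; the true function, noise level, sample sizes, and polynomial degrees are assumptions of this sketch. Training MSE keeps falling as the fit becomes more flexible, while test MSE need not.

```python
# Illustrative comparison of training vs. test MSE; the true function,
# noise level, sample sizes, and polynomial degrees are assumptions of
# this sketch, not part of the notes.
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: np.sin(x)
x_tr = rng.uniform(0, 6, 20);  y_tr = f(x_tr) + rng.normal(scale=0.3, size=20)
x_te = rng.uniform(0, 6, 500); y_te = f(x_te) + rng.normal(scale=0.3, size=500)

for degree in (1, 3, 10):
    coef = np.polyfit(x_tr, y_tr, degree)          # polynomial fit of given flexibility
    mse_tr = np.mean((y_tr - np.polyval(coef, x_tr)) ** 2)
    mse_te = np.mean((y_te - np.polyval(coef, x_te)) ** 2)
    print(f"degree {degree:2d}: MSE_Tr = {mse_tr:.3f}, MSE_Te = {mse_te:.3f}")
```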
Suppose we have fit a model $\hat{f}(x)$ to some training data $\mathrm{Tr}$, and let $(x_0, y_0)$ be a test observation drawn from the population. If the true model is $Y = f(X) + \varepsilon$ (with $f(x) = E(Y \mid X = x)$), then
$$E\big[(y_0 - \hat{f}(x_0))^2\big] = \operatorname{Var}\big(\hat{f}(x_0)\big) + \big[\operatorname{Bias}\big(\hat{f}(x_0)\big)\big]^2 + \operatorname{Var}(\varepsilon).$$
The expectation averages over the variability of $y_0$ as well as the variability in $\mathrm{Tr}$. Note that $\operatorname{Bias}\big(\hat{f}(x_0)\big) = E\big[\hat{f}(x_0)\big] - f(x_0)$.
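A Monte Carlo sketch can check this decomposition approximately at a single point $x_0$; the true function, noise level, sample size, and the cubic estimator below are my own choices for illustration.

```python
# Monte Carlo sketch of the bias-variance decomposition at one point x0.
# The true f, noise level sigma, sample size, and the cubic estimator are
# assumptions made for this illustration.
import numpy as np

rng = np.random.default_rng(4)
f = lambda x: np.sin(x)
sigma, x0, n, reps = 0.3, 2.0, 50, 2000

fhat_x0, sq_err = [], []
for _ in range(reps):
    x = rng.uniform(0, 6, n)                       # a fresh training set Tr
    y = f(x) + rng.normal(scale=sigma, size=n)
    coef = np.polyfit(x, y, 3)                     # cubic fit plays the role of f-hat
    pred = np.polyval(coef, x0)
    y0 = f(x0) + rng.normal(scale=sigma)           # a fresh test response at x0
    fhat_x0.append(pred)
    sq_err.append((y0 - pred) ** 2)

fhat_x0 = np.array(fhat_x0)
var_fhat = fhat_x0.var()
bias_sq = (fhat_x0.mean() - f(x0)) ** 2
print("E[(y0 - fhat(x0))^2]        ~", np.mean(sq_err))
print("Var + Bias^2 + Var(epsilon) ~", var_fhat + bias_sq + sigma**2)
```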
Here the response variable $Y$ is qualitative.
Suppose the $K$ elements in $\mathcal{C}$ are numbered $1, 2, \dots, K$. Let
$$p_k(x) = \Pr(Y = k \mid X = x), \qquad k = 1, 2, \dots, K.$$
These are the conditional class probabilities at $x$. Then the Bayes optimal classifier at $x$ is
$$C(x) = j \quad \text{if } p_j(x) = \max\{p_1(x), p_2(x), \dots, p_K(x)\}.$$
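A small sketch of the Bayes classifier when the true conditional probabilities are known; the two-class Gaussian setup with equal priors is a toy example assumed here, not from the notes.

```python
# Bayes classifier sketch with known class probabilities; the two-class
# Gaussian mixture with equal priors is a toy setup assumed here.
import numpy as np
from scipy.stats import norm

def p_k(x):
    """Conditional class probabilities p_1(x), p_2(x) for class densities
    N(-1, 1) and N(+1, 1) with equal priors."""
    dens = np.array([norm.pdf(x, loc=-1.0), norm.pdf(x, loc=+1.0)])
    return dens / dens.sum(axis=0)

def bayes_classifier(x):
    # C(x) = j if p_j(x) is the largest conditional probability
    return np.argmax(p_k(x), axis=0) + 1           # labels in {1, 2}

print(bayes_classifier(np.array([-2.0, 0.1, 3.0])))   # -> [1 2 2]
```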
Nearest-neighbor averaging can be used as before.
Also breaks down as dimension grows. However, the impact on $\hat{C}(x)$ is less than on $\hat{p}_k(x)$, $k = 1, \dots, K$.
Typically we measure the performance of $\hat{C}(x)$ using the misclassification error rate:
$$\mathrm{Err}_{\mathrm{Te}} = \operatorname{Ave}_{i \in \mathrm{Te}}\, I\big[y_i \neq \hat{C}(x_i)\big].$$
The Bayes classifier (using the true $p_k(x)$) has the smallest error (in the population).
K-nearest neighbors (KNN) classifier: Given a positive integer $K$ and a test observation $x_0$, KNN first identifies the $K$ points in the training data that are closest to $x_0$, denoted by $\mathcal{N}_0$. It then estimates the conditional probability for class $j$ by
$$\Pr(Y = j \mid X = x_0) = \frac{1}{K} \sum_{i \in \mathcal{N}_0} I(y_i = j).$$
Finally, assign $x_0$ to the class with the largest estimated probability.
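A minimal NumPy sketch of this classifier, evaluated with the test misclassification error rate from above; the two-class Gaussian data and the choice $K = 5$ are illustrative assumptions.

```python
# NumPy sketch of the KNN classifier described above, evaluated with the
# test misclassification error rate; the two-class Gaussian data and K = 5
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(5)
X_train = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(+1, 1, (100, 2))])
y_train = np.repeat([0, 1], 100)

def knn_predict(x0, X_train, y_train, K=5):
    dist = np.linalg.norm(X_train - x0, axis=1)    # distances to all training points
    N0 = np.argsort(dist)[:K]                      # indices of the K nearest neighbors
    probs = np.bincount(y_train[N0], minlength=2) / K   # estimated Pr(Y = j | X = x0)
    return probs.argmax()                          # class with the largest probability

# Err_Te = average over the test set of I[y_i != C-hat(x_i)]
X_test = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(+1, 1, (100, 2))])
y_test = np.repeat([0, 1], 100)
y_hat = np.array([knn_predict(x0, X_train, y_train) for x0 in X_test])
print("Err_Te =", np.mean(y_hat != y_test))
```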
— Jul 15, 2022