238 Computational Statistics Handbook with MATLAB
1994]. We outline the steps for cross-validation below and demonstrate this
approach in Example 7.3.
PROCEDURE - CROSS-VALIDATION
1. Partition the data set into K partitions. For simplicity, we assume
that n = r ⋅ K, so there are r observations in each set.
2. Leave out one of the partitions for testing purposes.
3. Use the remaining n – r data points for training (e.g., fit the model,
build the classifier, estimate the probability density function).
4. Use the test set with the model and determine the squared error
between the observed and predicted response: (y_i − ŷ_i)^2.
5. Repeat steps 2 through 4 until all K partitions have been used as a
test set.
6. Determine the average of the n errors.
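The steps above can be sketched in code. The following is a minimal Python illustration (the book's own code is in MATLAB); the data, the fold count K, and the straight-line model are illustrative choices, not part of the procedure itself.

```python
# A sketch of the K-fold cross-validation procedure above, written in
# Python for illustration.  fit_line plays the role of "fit the model"
# in step 3; the squared error in step 4 matches (y_i - yhat_i)^2.

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a degree-1 polynomial)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def cross_validate(x, y, K):
    """Average squared prediction error over K folds.

    Assumes n = r*K, so each fold holds r observations (step 1)."""
    n = len(x)
    r = n // K
    errors = []
    for k in range(K):                  # steps 2-5: each fold is the test set once
        test = set(range(k * r, (k + 1) * r))
        train = [i for i in range(n) if i not in test]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:                  # step 4: squared error for each test point
            errors.append((y[i] - (a + b * x[i])) ** 2)
    return sum(errors) / n              # step 6: average of the n errors

# Example: data lying exactly on a line gives zero prediction error.
x = [1, 2, 3, 4, 5, 6]
y = [2 * xi + 1 for xi in x]
print(cross_validate(x, y, K=3))  # → 0.0
```

With K = n (one observation per fold), this reduces to the leave-one-out scheme used in Example 7.3.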
Note that the error mentioned in step 4 depends on the application and the
goal of the analysis [Hjorth, 1994]. For example, in pattern recognition
applications, this might be the cost of misclassifying a case. In the
following example, we apply the cross-validation technique to help decide
what type of model should be used for the steam data.
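To make the point about application-dependent error concrete, here is a small hedged Python sketch: in a classification setting, the squared error of step 4 is replaced by a loss such as the 0-1 misclassification indicator. The labels and the `zero_one_loss` helper below are hypothetical illustrations, not part of the book's code.

```python
# Hypothetical illustration: for pattern recognition, step 4's squared
# error is replaced by a misclassification loss.  The predicted labels
# stand in for the output of a classifier trained on the other folds.

def zero_one_loss(true_label, predicted_label):
    """0-1 loss: 1 if the case is misclassified, 0 otherwise."""
    return 0 if true_label == predicted_label else 1

true_labels = ['a', 'b', 'a', 'b']
predicted   = ['a', 'b', 'b', 'b']   # held-out predictions, one per fold

# Averaging this loss over all n held-out cases (step 6) estimates the
# classifier's misclassification rate.
losses = [zero_one_loss(t, p) for t, p in zip(true_labels, predicted)]
print(sum(losses) / len(losses))  # → 0.25
```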
Example 7.3
In this example, we apply cross-validation to the modeling problem of
Example 7.1. We fit linear, quadratic (degree 2), and cubic (degree 3)
models to the data and compare their accuracy using the estimates of
prediction error obtained from cross-validation.
% Set up the arrays to store the prediction errors.
n = length(x);
r1 = zeros(1,n);    % store error - linear fit
r2 = zeros(1,n);    % store error - quadratic fit
r3 = zeros(1,n);    % store error - cubic fit
% Loop through all of the data. Remove one point at a
% time as the test point.
for i = 1:n
   xtest = x(i);    % Get the test point.
   ytest = y(i);
   xtrain = x;      % Get the points to build the model.
   ytrain = y;
   xtrain(i) = [];  % Remove the test point.
   ytrain(i) = [];
   % Fit a first degree polynomial to the data.
   [p1,s] = polyfit(xtrain,ytrain,1);
© 2002 by Chapman & Hall/CRC