
27 Techniques for reducing variance




If your learning algorithm suffers from high variance, you might try the following techniques:

• Add more training data: This is the simplest and most reliable way to address variance, so long as you have access to significantly more data and enough computational power to process the data.

• Add regularization (L2 regularization, L1 regularization, dropout): This technique reduces variance but increases bias. (A short code sketch of L2 regularization and dropout follows this list.)

• Add early stopping (i.e., stop gradient descent early, based on dev set error): This technique reduces variance but increases bias. Early stopping behaves a lot like regularization methods, and some authors call it a regularization technique. (An early-stopping sketch also follows this list.)

• Feature selection to decrease number/type of input features: This technique might help with variance problems, but it might also increase bias. Reducing the number of features slightly (say, going from 1,000 features to 900) is unlikely to have a huge effect on bias. Reducing it significantly (say, going from 1,000 features to 100, a 10x reduction) is more likely to have a significant effect, so long as you are not excluding too many useful features. In modern deep learning, when data is plentiful, there has been a shift away from feature selection, and we are now more likely to give all the features we have to the algorithm and let the algorithm sort out which ones to use based on the data. But when your training set is small, feature selection can be very useful. (A small feature-selection sketch follows this list.)

• Decrease the model size (such as number of neurons/layers): Use with caution. This technique could decrease variance, while possibly increasing bias. However, I don't recommend this technique for addressing variance. Adding regularization usually gives better classification performance. The advantage of reducing the model size is that it lowers your computational cost and thus speeds up how quickly you can train models. If speeding up model training is useful, then by all means consider decreasing the model size. But if your goal is to reduce variance, and you are not concerned about the computational cost, consider adding regularization instead.
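
To make the regularization option above concrete, here is a minimal sketch in Keras showing L2 weight regularization and dropout added to a small network. The variable names (x_train, y_train), the layer sizes, the L2 strength, and the dropout rate are illustrative assumptions, not values from this book; you would tune them for your own problem.

    import tensorflow as tf

    # A small binary classifier with two variance-reduction levers:
    # L2 weight regularization on the hidden layer and dropout after it.
    # The unit count (128), L2 strength (0.01), and dropout rate (0.5)
    # are placeholder values to tune on your dev set.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(
            128, activation='relu',
            kernel_regularizer=tf.keras.regularizers.l2(0.01)),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

    # x_train and y_train are assumed to be NumPy arrays you already have.
    model.fit(x_train, y_train, epochs=20, batch_size=32)

Larger L2 strengths or dropout rates reduce variance more aggressively, at the cost of more bias, which is the trade-off described above.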

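Early stopping is just as easy to try. The sketch below (same assumptions about the data variables, with x_dev/y_dev standing in for the dev set) uses Keras's EarlyStopping callback to halt gradient descent once dev set error stops improving.

    import tensorflow as tf

    # Any compiled Keras model will do; a tiny one is built here so the
    # example is self-contained.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')

    # Stop training when dev set loss stops improving.
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor='val_loss',          # dev set loss
        patience=5,                  # tolerate 5 epochs without improvement
        restore_best_weights=True)   # roll back to the best dev-set epoch

    # x_train, y_train, x_dev, y_dev are assumed to exist already.
    model.fit(x_train, y_train,
              validation_data=(x_dev, y_dev),
              epochs=200,            # an upper bound; early stopping usually ends sooner
              callbacks=[early_stop])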

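Finally, if your training set is small enough that feature selection is worth trying, a library such as scikit-learn makes the experiment cheap. The sketch below keeps the 100 highest-scoring features out of, say, 1,000, echoing the example in the bullet above; the data variables and the mutual-information scoring function are assumptions for illustration.

    from sklearn.feature_selection import SelectKBest, mutual_info_classif

    # Keep the 100 features whose estimated mutual information with the
    # label is highest (e.g., going from 1,000 features down to 100).
    selector = SelectKBest(score_func=mutual_info_classif, k=100)

    # x_train, y_train, x_dev are assumed to be existing NumPy arrays.
    # Fit the selector on the training set only, then apply the same
    # selection to the dev set.
    x_train_small = selector.fit_transform(x_train, y_train)
    x_dev_small = selector.transform(x_dev)

Comparing dev set error with and without the reduced feature set tells you whether the variance reduction outweighs any added bias.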
Here are two additional tactics, repeated from the previous chapter on addressing bias:

• Modify input features based on insights from error analysis: Say your error analysis inspires you to create additional features that help the algorithm to eliminate a particular category of errors. These new features could help with both bias and variance. In

