
27 Techniques for reducing variance




If your learning algorithm suffers from high variance, you might try the following techniques:

• Add more training data: This is the simplest and most reliable way to address variance, so long as you have access to significantly more data and enough computational power to process the data.

• Add regularization (L2 regularization, L1 regularization, dropout): This technique reduces variance but increases bias. (A short code sketch of L2 regularization and dropout follows this list.)

• Add early stopping (i.e., stop gradient descent early, based on dev set error): This technique reduces variance but increases bias. Early stopping behaves a lot like regularization methods, and some authors call it a regularization technique. (An early-stopping sketch also follows this list.)

• Feature selection to decrease number/type of input features: This technique might help with variance problems, but it might also increase bias. Reducing the number of features slightly (say, going from 1,000 features to 900) is unlikely to have a huge effect on bias. Reducing it significantly (say, going from 1,000 features to 100, a 10x reduction) is more likely to have a significant effect, so long as you are not excluding too many useful features. In modern deep learning, when data is plentiful, there has been a shift away from feature selection, and we are now more likely to give all the features we have to the algorithm and let the algorithm sort out which ones to use based on the data. But when your training set is small, feature selection can be very useful. (A small feature-selection sketch follows this list.)

• Decrease the model size (such as number of neurons/layers): Use with caution. This technique could decrease variance, while possibly increasing bias. However, I don't recommend this technique for addressing variance. Adding regularization usually gives better classification performance. The advantage of reducing the model size is that it lowers your computational cost and thus speeds up how quickly you can train models. If speeding up model training is useful, then by all means consider decreasing the model size. But if your goal is to reduce variance, and you are not concerned about the computational cost, consider adding regularization instead.
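
To make the regularization option above concrete, here is a minimal sketch in Keras showing L2 weight regularization and dropout added to a small network. The variable names (x_train, y_train), the layer sizes, the L2 strength, and the dropout rate are illustrative assumptions, not values from this book; you would tune them for your own problem.

    import tensorflow as tf

    # A small binary classifier with two variance-reduction levers:
    # L2 weight regularization on the hidden layer and dropout after it.
    # The unit count (128), L2 strength (0.01), and dropout rate (0.5)
    # are placeholder values to tune on your dev set.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(
            128, activation='relu',
            kernel_regularizer=tf.keras.regularizers.l2(0.01)),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

    # x_train and y_train are assumed to be NumPy arrays you already have.
    model.fit(x_train, y_train, epochs=20, batch_size=32)

Larger L2 strengths or dropout rates reduce variance more aggressively, at the cost of more bias, which is the trade-off described above.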

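Early stopping is just as easy to try. The sketch below (same assumptions about the data variables, with x_dev/y_dev standing in for the dev set) uses Keras's EarlyStopping callback to halt gradient descent once dev set error stops improving.

    import tensorflow as tf

    # Any compiled Keras model will do; a tiny one is built here so the
    # example is self-contained.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')

    # Stop training when dev set loss stops improving.
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor='val_loss',          # dev set loss
        patience=5,                  # tolerate 5 epochs without improvement
        restore_best_weights=True)   # roll back to the best dev-set epoch

    # x_train, y_train, x_dev, y_dev are assumed to exist already.
    model.fit(x_train, y_train,
              validation_data=(x_dev, y_dev),
              epochs=200,            # an upper bound; early stopping usually ends sooner
              callbacks=[early_stop])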

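Finally, if your training set is small enough that feature selection is worth trying, a library such as scikit-learn makes the experiment cheap. The sketch below keeps the 100 highest-scoring features out of, say, 1,000, echoing the example in the bullet above; the data variables and the mutual-information scoring function are assumptions for illustration.

    from sklearn.feature_selection import SelectKBest, mutual_info_classif

    # Keep the 100 features whose estimated mutual information with the
    # label is highest (e.g., going from 1,000 features down to 100).
    selector = SelectKBest(score_func=mutual_info_classif, k=100)

    # x_train, y_train, x_dev are assumed to be existing NumPy arrays.
    # Fit the selector on the training set only, then apply the same
    # selection to the dev set.
    x_train_small = selector.fit_transform(x_train, y_train)
    x_dev_small = selector.transform(x_dev)

Comparing dev set error with and without the reduced feature set tells you whether the variance reduction outweighs any added bias.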
Here are two additional tactics, repeated from the previous chapter on addressing bias:

• Modify input features based on insights from error analysis: Say your error analysis inspires you to create additional features that help the algorithm to eliminate a particular category of errors. These new features could help with both bias and variance. In

