
             • Avoidable bias: 1%. This is calculated as the difference between the training error and
               the optimal error rate.⁸

             • Variance​: 15%. The difference between the dev error and the training error.

             To relate this to our earlier definitions, Bias and Avoidable Bias are related as follows:⁹

                    Bias = Optimal error rate (“unavoidable bias”) + Avoidable bias

             The “avoidable bias” reflects how much worse your algorithm performs on the training set
             than the “optimal classifier.”

             The concept of variance remains the same as before. In theory, we can always reduce
             variance to nearly zero by training on a massive training set. Thus, all variance is “avoidable”
             with a sufficiently large dataset, so there is no such thing as “unavoidable variance.”


             Consider one more example, where the optimal error rate is 14%, and we have:

             • Training error = 15%

             • Dev error = 16%


             Whereas in the previous chapter we called this a high bias classifier, now we would say that the
             error from avoidable bias is 1%, and the error from variance is about 1%. Thus, the algorithm
             is already doing well, with little room for improvement. It is only 2% worse than the optimal
             error rate.
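
             To make the arithmetic concrete, here is a minimal sketch in Python (the code is not part of
             the original text; the function name error_decomposition is an illustrative assumption) that
             applies the two subtractions defined above to this example:

             def error_decomposition(optimal_error, training_error, dev_error):
                 """Split errors into avoidable bias and variance, as defined above.

                 All inputs are error rates expressed as fractions, e.g. 0.14 for 14%.
                 """
                 # Avoidable bias: how much worse the algorithm does on the training set
                 # than the optimal ("Bayes") error rate. Negative => overfitting (see footnote 8).
                 avoidable_bias = training_error - optimal_error
                 # Variance: how much worse the algorithm does on the dev set than on the
                 # training set.
                 variance = dev_error - training_error
                 return avoidable_bias, variance

             # The example above: optimal error 14%, training error 15%, dev error 16%.
             avoidable_bias, variance = error_decomposition(0.14, 0.15, 0.16)
             print(f"avoidable bias = {avoidable_bias:.0%}, variance = {variance:.0%}")
             # avoidable bias = 1%, variance = 1%

             The same two subtractions also produce the 1% avoidable bias and 15% variance figures
             quoted at the top of this page.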

             We see from these examples that knowing the optimal error rate is helpful for guiding our
             next steps. In statistics, the optimal error rate is also called the Bayes error rate, or Bayes
             rate.

             How do we know what the optimal error rate is? For tasks that humans are reasonably good
             at, such as recognizing pictures or transcribing audio clips, you can ask a human to provide
             labels, then measure the accuracy of the human labels relative to your training set. This
             would give an estimate of the optimal error rate. If you are working on a problem that even

             8  If this number is negative, you are doing better on the training set than the optimal error rate. This
             means you are overfitting on the training set, and the algorithm has over-memorized the training set.
             You should focus on variance reduction methods rather than on further bias reduction methods.

             9  These definitions are chosen to convey insight on how to improve your learning algorithm. These
             definitions are different than how statisticians define Bias and Variance. Technically, what I define
             here as “Bias” should be called “Error we attribute to bias”; and “Avoidable bias” should be “error we
             attribute to the learning algorithm’s bias that is over the optimal error rate.”

