score,” which is a modified way of computing their average, and works better than simply taking the mean.⁴

Classifier    Precision    Recall    F1 score
A             95%          90%       92.4%
B             98%          85%       91.0%


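As a quick check on the numbers above, here is a minimal Python sketch (not from the book) that applies the harmonic-mean formula from the footnote to the two classifiers in the table:

    def f1_score(precision, recall):
        # Harmonic mean of precision and recall: 2 / (1/precision + 1/recall).
        return 2 / ((1 / precision) + (1 / recall))

    # The two classifiers from the table above.
    print(f"Classifier A: {f1_score(0.95, 0.90):.1%}")  # -> 92.4%
    print(f"Classifier B: {f1_score(0.98, 0.85):.1%}")  # -> 91.0%
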
             Having a single-number evaluation metric speeds up your ability to make a decision when
             you are selecting among a large number of classifiers. It gives a clear preference ranking
             among all of them, and therefore a clear direction for progress.

As a final example, suppose you are separately tracking the accuracy of your cat classifier in four key markets: (i) US, (ii) China, (iii) India, and (iv) Other. This gives four metrics. By taking an average or weighted average of these four numbers, you end up with a single-number metric. Taking an average or weighted average is one of the most common ways to combine multiple metrics into one; a short sketch of the arithmetic follows.
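
Here is a minimal Python sketch of that combination; the per-market accuracy figures and the weights below are hypothetical, purely to illustrate the arithmetic:

    # Hypothetical per-market accuracies and weights (weights sum to 1,
    # e.g. each market's share of users); real values would come from your data.
    accuracies = {"US": 0.97, "China": 0.93, "India": 0.95, "Other": 0.90}
    weights    = {"US": 0.40, "China": 0.30, "India": 0.20, "Other": 0.10}

    # Simple average: every market counts equally.
    simple_avg = sum(accuracies.values()) / len(accuracies)

    # Weighted average: each market counts in proportion to its weight.
    weighted_avg = sum(weights[m] * accuracies[m] for m in accuracies)

    print(f"Simple average:   {simple_avg:.1%}")    # -> 93.8%
    print(f"Weighted average: {weighted_avg:.1%}")  # -> 94.7%

Either way, you end up with one number that can be compared directly across classifiers.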

4 If you want to learn more about the F1 score, see https://en.wikipedia.org/wiki/F1_score. It is the “harmonic mean” between Precision and Recall, and is calculated as 2/((1/Precision)+(1/Recall)).

