
9 Optimizing and satisficing metrics




             Here’s another way to combine multiple evaluation metrics.

             Suppose you care about both the accuracy and the running time of a learning algorithm. You
             need to choose from these three classifiers:


              Classifier          Accuracy        Running time
              A                   90%             80ms
              B                   92%             95ms
              C                   95%             1,500ms


             It seems unnatural to derive a single metric by putting accuracy and running time into a
             single formula, such as:

                    Accuracy - 0.5*RunningTime


             Here’s what you can do instead: First, define what counts as an “acceptable” running time. Let’s
             say anything that runs in 100ms or less is acceptable. Then, maximize accuracy, subject to your
             classifier meeting the running time criterion. Here, running time is a “satisficing
             metric”—your classifier just has to be “good enough” on this metric, in the sense that it
             should take at most 100ms. Accuracy is the “optimizing metric.”
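
             If it helps to see this selection rule as code, here is a minimal sketch. The numbers come from
             the table above, and the 100ms threshold is the “acceptable” running time just defined:

             classifiers = {
                 "A": {"accuracy": 0.90, "running_time_ms": 80},
                 "B": {"accuracy": 0.92, "running_time_ms": 95},
                 "C": {"accuracy": 0.95, "running_time_ms": 1500},
             }

             MAX_RUNNING_TIME_MS = 100  # satisficing threshold: "good enough" means <= 100ms

             # Keep only the classifiers that satisfy the satisficing metric...
             acceptable = {name: m for name, m in classifiers.items()
                           if m["running_time_ms"] <= MAX_RUNNING_TIME_MS}

             # ...then pick the one that maximizes the optimizing metric (accuracy).
             best = max(acceptable, key=lambda name: acceptable[name]["accuracy"])
             print(best)  # Classifier B: 92% accuracy within the 100ms budget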


             If you are trading off N different criteria, such as binary file size of the model (which is
             important for mobile apps, since users don’t want to download large apps), running time,
             and accuracy, you might consider setting N-1 of the criteria as “satisficing” metrics. I.e., you
             simply require that they meet a certain value. Then define the final one as the “optimizing”
             metric. For example, set a threshold for what is acceptable for binary file size and running
             time, and try to optimize accuracy given those constraints.
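
             As a hedged sketch of this recipe (N-1 satisficing metrics plus one optimizing metric), the
             helper below takes a set of models, a threshold for each satisficing metric, and the name of
             the optimizing metric. The metric names, thresholds, and models are illustrative assumptions,
             not values from this chapter:

             def pick_model(models, satisficing_thresholds, optimizing_metric):
                 """Return the model maximizing `optimizing_metric` among those that
                 meet every satisficing threshold (metric value <= threshold)."""
                 acceptable = [m for m in models
                               if all(m[metric] <= limit
                                      for metric, limit in satisficing_thresholds.items())]
                 return max(acceptable, key=lambda m: m[optimizing_metric], default=None)

             models = [
                 {"name": "small",  "accuracy": 0.89, "running_time_ms": 40,  "binary_size_mb": 5},
                 {"name": "medium", "accuracy": 0.93, "running_time_ms": 90,  "binary_size_mb": 18},
                 {"name": "large",  "accuracy": 0.95, "running_time_ms": 400, "binary_size_mb": 120},
             ]

             best = pick_model(models,
                               satisficing_thresholds={"running_time_ms": 100, "binary_size_mb": 20},
                               optimizing_metric="accuracy")
             print(best["name"])  # -> "medium"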


             As a final example, suppose you are building a hardware device that uses a microphone to
             listen for the user saying a particular “wakeword,” which then causes the system to wake up.
             Examples include Amazon Echo listening for “Alexa”; Apple Siri listening for “Hey Siri”;
             Android listening for “Okay Google”; and Baidu apps listening for “Hello Baidu.” You care
             about both the false positive rate—the frequency with which the system wakes up even when
             no one said the wakeword—as well as the false negative rate—how often it fails to wake up
             when someone says the wakeword. One reasonable goal for the performance of this system is
             to minimize the false negative rate (the optimizing metric), subject to there being no more
             than one false positive every 24 hours of operation (the satisficing metric).
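
             To make this concrete as code, here is a minimal sketch that picks a detection threshold under
             exactly this rule. The detector score distributions and the size of the evaluation set are
             synthetic assumptions, not real measurements:

             import numpy as np

             rng = np.random.default_rng(0)
             # Synthetic detector scores: higher means "more likely to be the wakeword".
             wakeword_scores = rng.normal(0.9, 0.05, size=1_000)      # clips containing the wakeword
             background_scores = rng.normal(0.2, 0.10, size=100_000)  # background audio clips
             HOURS_OF_BACKGROUND_AUDIO = 240.0  # assumed size of the evaluation set

             best_threshold, best_fnr = None, float("inf")
             for threshold in np.linspace(0.0, 1.0, 101):
                 false_negative_rate = np.mean(wakeword_scores < threshold)
                 false_positives_per_24h = (
                     np.sum(background_scores >= threshold) / HOURS_OF_BACKGROUND_AUDIO * 24
                 )
                 # Satisficing check first; among thresholds that pass, keep the lowest
                 # false negative rate (the optimizing metric).
                 if false_positives_per_24h <= 1.0 and false_negative_rate < best_fnr:
                     best_threshold, best_fnr = threshold, false_negative_rate

             print(best_threshold, best_fnr)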




