
15 Evaluating multiple ideas in parallel during error analysis




             Your team has several ideas for improving the cat detector:


             • Fix the problem of your algorithm recognizing dogs as cats.

             • Fix the problem of your algorithm recognizing great cats (lions, panthers, etc.) as house
               cats (pets).

             • Improve the system’s performance on blurry images.


             • …

             You can efficiently evaluate all of these ideas in parallel. I usually create a spreadsheet and
             fill it out while looking through ~100 misclassified dev set images. I also jot down comments
             that might help me remember specific examples. To illustrate this process, let’s look at a
             spreadsheet you might produce with a small dev set of four examples:


             Image        Dog    Great cat    Blurry    Comments

             1            ✔                             Unusual pitbull color
             2                                ✔
             3                   ✔            ✔         Lion; picture taken at zoo on rainy day
             4                   ✔                      Panther behind tree

             % of total   25%    50%          50%



             Image #3 above has both the Great cat and the Blurry columns checked. Because one
             example can fall into multiple categories, the percentages at the bottom may not add up
             to 100%.
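To make the arithmetic behind the "% of total" row concrete, here is a minimal Python sketch that tallies the four-example spreadsheet above. The data layout (a list of tuples holding a set of checked categories) and the helper name `percent_of_total` are illustrative assumptions, not something prescribed by the text; a real spreadsheet would serve equally well.

```python
# Illustrative sketch only: the row layout and function name are assumptions.
CATEGORIES = ["Dog", "Great cat", "Blurry"]

# One row per misclassified dev set image:
# (image number, set of checked categories, free-form comment)
rows = [
    (1, {"Dog"}, "Unusual pitbull color"),
    (2, {"Blurry"}, ""),
    (3, {"Great cat", "Blurry"}, "Lion; picture taken at zoo on rainy day"),
    (4, {"Great cat"}, "Panther behind tree"),
]


def percent_of_total(rows, categories):
    """Return {category: percentage of rows that have that category checked}."""
    total = len(rows)
    if total == 0:
        return {c: 0.0 for c in categories}
    return {
        c: 100.0 * sum(1 for _, tags, _ in rows if c in tags) / total
        for c in categories
    }


percentages = percent_of_total(rows, CATEGORIES)
for category, pct in percentages.items():
    print(f"{category}: {pct:.0f}%")

# Image 3 is counted under both Great cat and Blurry, so the column
# percentages sum to more than 100%.
print(f"Sum of columns: {sum(percentages.values()):.0f}%")
```

Because each image carries a set of categories rather than a single label, one image can contribute to several columns, which is why the column totals need not sum to 100%.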

             Although you may first formulate the categories (Dog, Great cat, Blurry) and then categorize
             the examples by hand, in practice, once you start looking through examples, you will probably be
             inspired to propose new error categories. For example, say you go through a dozen images
             and realize a lot of mistakes occur with Instagram-filtered pictures. You can go back and add
             a new “Instagram” column to the spreadsheet. Manually looking at examples that the
             algorithm misclassified and asking how/whether you as a human could have labeled the



             Page 32                            Machine Learning Yearning-Draft                       Andrew Ng