
33 Why we compare to human-level performance

Many machine learning systems aim to automate things that humans do well. Examples
include image recognition, speech recognition, and email spam classification. Learning
algorithms have also improved so much that we are now surpassing human-level
performance on more and more of these tasks.

Further, there are several reasons building an ML system is easier if you are trying to do a
task that people can do well:

1. Ease of obtaining data from human labelers. For example, since people recognize
cat images well, it is straightforward for people to provide high-accuracy labels for your
learning algorithm.

2. Error analysis can draw on human intuition. Suppose a speech recognition
algorithm is doing worse than human-level recognition. Say it incorrectly transcribes an
audio clip as “This recipe calls for a pear of apples,” mistaking “pair” for “pear.” You can
draw on human intuition and try to understand what information a person uses to get the
correct transcription, and use this knowledge to modify the learning algorithm.

3. Use human-level performance to estimate the optimal error rate and also set
a “desired error rate.” Suppose your algorithm achieves 10% error on a task, but a person
achieves 2% error. Then we know that the optimal error rate is 2% or lower and the
avoidable bias is at least 8%. Thus, you should try bias-reducing techniques.
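
To make the arithmetic in item #3 concrete, here is a minimal sketch in Python. The function name and its parameters are my own illustrative choices, not part of any library, and it assumes both error rates are measured in percent on the same data. It treats human-level error as an upper bound on the optimal error rate, so the gap between the algorithm and the human gives only a lower bound on avoidable bias.

```python
def avoidable_bias_lower_bound(algorithm_error_pct: float,
                               human_error_pct: float) -> float:
    """Return a lower bound on avoidable bias, in percentage points.

    Hypothetical helper for illustration. Human-level error is an upper
    bound on the optimal error rate, so the algorithm's error minus the
    human's error is the smallest the avoidable bias could be.
    """
    for value in (algorithm_error_pct, human_error_pct):
        if not 0.0 <= value <= 100.0:
            raise ValueError("error rates must be between 0 and 100 percent")
    # If the algorithm already matches or beats the human, human-level
    # performance tells us nothing more, so the bound is zero.
    return max(0.0, algorithm_error_pct - human_error_pct)


# The example from the text: the algorithm has 10% error, a person has 2%.
bound = avoidable_bias_lower_bound(10.0, 2.0)
print(f"Avoidable bias is at least {bound:.1f} percentage points")
```

When the bound is zero, this comparison no longer suggests how much room for improvement remains, which is one reason progress becomes harder to guide once an algorithm passes human-level performance.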

Even though item #3 might not sound important, I find that having a reasonable and
achievable target error rate helps accelerate a team’s progress. Knowing your algorithm has
high avoidable bias is incredibly valuable and opens up a menu of options to try.

There are some tasks that even humans aren’t good at, such as picking a book to
recommend to you, picking an ad to show a user on a website, or predicting the stock
market. Computers already surpass the performance of most people on these tasks. With
these applications, we run into the following problems:

• It is harder to obtain labels. For example, it’s hard for human labelers to annotate a
database of users with the “optimal” book recommendation. If you operate a website or
app that sells books, you can obtain data by showing books to users and seeing what they
buy. If you do not operate such a site, you need to find more creative ways to get data.

Page 66 Machine Learning Yearning-Draft Andrew Ng
