Page 29 -
P. 29
13 Build your first system quickly, then iterate
You want to build a new email anti-spam system. Your team has several ideas:
• Collect a huge training set of spam email. For example, set up a “honeypot”: deliberately
send fake email addresses to known spammers, so that you can automatically harvest the
spam messages they send to those addresses.
• Develop features for understanding the text content of the email.
• Develop features for understanding the email envelope/header features to show what set
of internet servers the message went through.
• and more.
Even though I have worked extensively on anti-spam, I would still have a hard time picking
one of these directions. It is even harder if you are not an expert in the application area.
So don’t start off trying to design and build the perfect system. Instead, build and train a
5
basic system quickly—perhaps in just a few days. Even if the basic system is far from the
“best” system you can build, it is valuable to examine how the basic system functions: you
will quickly find clues that show you the most promising directions in which to invest your
time. These next few chapters will show you how to read these clues.
5 This advice is meant for readers wanting to build AI applications, rather than those whose goal is to
publish academic papers. I will later return to the topic of doing research.
Page 29 Machine Learning Yearning-Draft Andrew Ng