Let S_out be the output transcription (“I love robots”). Let S* be the correct transcription (“I love machine learning”). In order to understand whether #1 or #2 above is the problem, you can perform the Optimization Verification test: First, compute Score_A(S*) and Score_A(S_out). Then check whether Score_A(S*) > Score_A(S_out). There are two possibilities:
Case 1: Score_A(S*) > Score_A(S_out)
In this case, your learning algorithm has correctly given S* a higher score than S_out. Nevertheless, your approximate search algorithm chose S_out rather than S*. This tells you that your approximate search algorithm is failing to choose the value of S that maximizes Score_A(S). In this case, the Optimization Verification test tells you that you have a search algorithm problem and should focus on that. For example, you could try increasing the beam width of beam search.
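To see why beam width matters, here is a minimal, generic beam search sketch in Python. The `initial`, `expand`, and `score` callables are hypothetical placeholders for a decoder's hypothesis representation; they are not from this book.

```python
def beam_search(initial, expand, score, beam_width, max_steps=100):
    """Generic beam search: keep only the `beam_width` highest-scoring
    partial hypotheses at each step. `expand(h)` is assumed to return the
    successors of hypothesis h, and an empty list once h is complete."""
    beam, finished = [initial], []
    for _ in range(max_steps):
        candidates = []
        for h in beam:
            successors = expand(h)
            if not successors:
                finished.append(h)  # h is a complete hypothesis
            else:
                candidates.extend(successors)
        if not candidates:
            break
        # Pruning to the top `beam_width` candidates is what makes the
        # search approximate: with too narrow a beam, the hypothesis that
        # truly maximizes score(.) can be discarded early. Widening the
        # beam trades extra compute for a more thorough search.
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(finished + beam, key=score)
```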
Case 2: Score_A(S*) ≤ Score_A(S_out)
In this case, you know that the way you’re computing Score_A(.) is at fault: It is failing to give a strictly higher score to the correct output S* than the incorrect S_out. The Optimization Verification test tells you that you have an objective (scoring) function problem. Thus, you should focus on improving how you learn or approximate Score_A(S) for different sentences S.
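Both cases reduce to a single comparison, so the per-example test is easy to express in code. A minimal Python sketch, where `score_a`, `s_star`, and `s_out` are illustrative stand-ins for Score_A(.), S*, and S_out:

```python
def optimization_verification(score_a, s_star, s_out):
    """Attribute one error to the search algorithm or to the scoring
    function, following the two cases above."""
    if score_a(s_star) > score_a(s_out):
        # Case 1: Score_A ranks S* above S_out, yet the search returned
        # S_out, so the approximate search algorithm is at fault.
        return "search"
    else:
        # Case 2: Score_A fails to give S* a strictly higher score than
        # S_out, so the scoring function is at fault.
        return "scoring"
```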
Our discussion has focused on a single example. To apply the Optimization Verification test in practice, you should examine the errors in your dev set. For each error, you would test whether Score_A(S*) > Score_A(S_out). Each dev example for which this inequality holds will get marked as an error caused by the optimization algorithm. Each example for which this does not hold (Score_A(S*) ≤ Score_A(S_out)) gets counted as a mistake due to the way you’re computing Score_A(.).
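Over a whole dev set this becomes a simple tally. A sketch, assuming `dev_errors` is a list of (S*, S_out) pairs for the examples your system got wrong (a hypothetical data layout):

```python
from collections import Counter

def attribute_errors(score_a, dev_errors):
    """Count how many dev-set errors are caused by the optimization
    (search) algorithm versus by the scoring function Score_A(.)."""
    counts = Counter()
    for s_star, s_out in dev_errors:
        if score_a(s_star) > score_a(s_out):
            counts["search"] += 1   # search failed to find the higher-scoring S*
        else:
            counts["scoring"] += 1  # Score_A does not rank S* above S_out
    return counts
```

The resulting fractions tell you which component deserves your attention, as the example below illustrates.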
For example, suppose you find that 95% of the errors were due to the scoring function Score_A(.), and only 5% due to the optimization algorithm. Now you know that no matter how much you improve your optimization procedure, you would realistically eliminate only ~5% of your errors. Thus, you should instead focus on improving how you estimate Score_A(.).