This system lacks hand-engineered knowledge. Thus, when the training set is small, it
might do worse than the hand-engineered pipeline.
However, when the training set is large, then it is not hampered by the limitations of an
MFCC or phoneme-based representation. If the learning algorithm is a large-enough neural
network and if it is trained with enough training data, it has the potential to do very well, and
perhaps even approach the optimal error rate.
End-to-end learning systems tend to do well when there is a lot of labeled data for “both
ends”—the input end and the output end. In this example, we require a large dataset of
(audio, transcript) pairs. When this type of data is not available, approach end-to-end
learning with great caution.
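To make the contrast concrete, here is a minimal sketch of the two designs. All function names are hypothetical stand-ins: the hand-engineered pipeline composes fixed stages (MFCC features, a phoneme recognizer, a decoder), while the end-to-end system is a single learned function from audio to transcript.

```python
def extract_mfcc(audio):
    # Stand-in for a real MFCC feature extractor (hand-engineered stage).
    return [sum(audio) / len(audio)]

def recognize_phonemes(features):
    # Stand-in phoneme recognizer operating on the MFCC features.
    return ["HH", "AH"] if features[0] > 0 else ["S", "IH"]

def decode_transcript(phonemes):
    # Stand-in decoder mapping phoneme sequences to words.
    return "hello" if phonemes == ["HH", "AH"] else "six"

def pipeline_system(audio):
    """Hand-engineered pipeline: audio -> MFCCs -> phonemes -> transcript."""
    return decode_transcript(recognize_phonemes(extract_mfcc(audio)))

def end_to_end_system(audio, model):
    """End-to-end system: one learned function mapping audio to transcript.

    `model` would be a large neural network trained on (audio, transcript)
    pairs; here it is just a placeholder callable.
    """
    return model(audio)
```

The pipeline's intermediate stages encode human knowledge, which helps when data is scarce; the end-to-end system replaces all of them with one model, which only pays off given enough (audio, transcript) pairs.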
If you are working on a machine learning problem where the training set is very small, most
of your algorithm's knowledge will have to come from your human insight; that is, from
your hand-engineered components.
If you choose not to use an end-to-end system, you will have to decide what the steps in
your pipeline are, and how they should plug together. In the next few chapters, we'll give
some suggestions for designing such pipelines.
Page 96 Machine Learning Yearning-Draft Andrew Ng