
43 Artificial data synthesis



Your speech system needs more data that sounds as if it were taken from within a car. Rather than collecting a lot of data while driving around, there might be an easier way to get this data: by artificially synthesizing it.

Suppose you obtain a large quantity of car/road noise audio clips. You can download this data from several websites. Suppose you also have a large training set of people speaking in a quiet room. If you take an audio clip of a person speaking and “add” it to an audio clip of car/road noise, you will obtain an audio clip that sounds as if that person were speaking in a noisy car. Using this process, you can “synthesize” a huge amount of data that sounds as if it were collected inside a car.
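This “adding” of noise to clean speech can be sketched as a scaled sum of the two waveforms. A minimal sketch in numpy, assuming both clips are 1-D sample arrays at the same sample rate (the function name and the target signal-to-noise ratio are illustrative, not from the text):

```python
import numpy as np

def mix_speech_with_noise(speech, noise, snr_db=10.0):
    """Overlay car/road noise on clean speech at a target
    signal-to-noise ratio (snr_db, in decibels).

    speech, noise: 1-D float arrays at the same sample rate.
    """
    # Loop (tile) the noise if it is shorter than the speech clip.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[:len(speech)]

    # Scale the noise so the mixture has the requested SNR.
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

Varying `snr_db` across synthesized examples gives mixtures ranging from barely audible noise to noise that nearly drowns out the speech.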

More generally, there are several circumstances where artificial data synthesis allows you to create a huge dataset that reasonably matches the dev set. Let’s use the cat image detector as a second example. You notice that dev set images have much more motion blur because they tend to come from cellphone users who are moving their phone slightly while taking the picture. You can take non-blurry images from the training set of internet images, and add simulated motion blur to them, thus making them more similar to the dev set.
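Simulated motion blur can be approximated by convolving the image with a short linear kernel. A minimal sketch for horizontal blur on a grayscale image, assuming a 2-D float array (the kernel length of 9 pixels is an illustrative choice):

```python
import numpy as np

def add_motion_blur(image, length=9):
    """Simulate horizontal motion blur by convolving each row of a
    grayscale image with a uniform 1-D kernel of `length` pixels."""
    kernel = np.ones(length) / length
    # 'same' mode keeps each output row the same width as the input row.
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"),
        axis=1, arr=image)
```

Randomizing the blur length and direction across synthesized images helps the blurred training set cover the variety of hand movements seen in real cellphone photos.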

Keep in mind that artificial data synthesis has its challenges: it is sometimes easier to create synthetic data that appears realistic to a person than it is to create data that appears realistic to a computer. For example, suppose you have 1,000 hours of speech training data, but only 1 hour of car noise. If you repeatedly use the same 1 hour of car noise with different portions from the original 1,000 hours of training data, you will end up with a synthetic dataset where the same car noise is repeated over and over. While a person listening to this audio probably would not be able to tell—all car noise sounds the same to most of us—it is possible that a learning algorithm would “overfit” to the 1 hour of car noise. Thus, it could generalize poorly to a new audio clip where the car noise happens to sound different.

             Alternatively, suppose you have 1,000 unique hours of car noise, but all of it was taken from
             just 10 different cars. In this case, it is possible for an algorithm to “overfit” to these 10 cars
             and perform poorly if tested on audio from a different car. Unfortunately, these problems
             can be hard to spot.
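One practical way to reduce the repetition pitfall, assuming you do have several distinct noise clips, is to draw each noise segment from a randomly chosen clip at a random offset rather than reusing one fixed stretch of noise. A minimal sketch (the function name and parameters are illustrative):

```python
import numpy as np

def sample_noise_segment(noise_clips, segment_len, rng=None):
    """Draw a noise segment of `segment_len` samples from a random clip
    at a random offset, so synthesized examples do not all reuse the
    identical stretch of noise.

    noise_clips: list of 1-D arrays, each >= segment_len samples long.
    """
    rng = rng or np.random.default_rng()
    clip = noise_clips[rng.integers(len(noise_clips))]
    start = rng.integers(len(clip) - segment_len + 1)
    return clip[start:start + segment_len]
```

This spreads each hour of noise over many different mixtures, though it does not fix the deeper problem: if all your noise comes from 10 cars, no amount of resampling adds an 11th.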










             Page 82                            Machine Learning Yearning-Draft                       Andrew Ng