Page 214 - Foundations of Cognitive Psychology : Core Readings

The Auditory Scene  219

               Figure 9.4
               A spectrogram of a mixture of sounds (containing the word ‘‘shoe’’).


               what like a picture created by making a spectrogram of each of the individual
               sounds on a separate piece of transparent plastic, and then overlaying the
               individual spectrograms to create a composite. The spectrogram of the word shoe
               is actually one of the component spectrograms of the mixture.
                 Although the theorist has the privilege of building the composite up from the
               pictures of its components, the auditory system, or any machine trying to imitate
               it, would be presented only with the spectrogram of the mixture and would
               have to try to infer the set of pictures that was overlaid to produce it.
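The overlay picture can be tried out numerically. The following is a minimal sketch, not anything taken from the chapter: the two pure tones, the frame length of 32 samples, and the helper names `magnitude_spectrogram` and `tone` are all assumptions chosen for illustration. It builds a mixture from two known components and checks that, in this deliberately easy case, the spectrogram of the mixture equals the cell-by-cell overlay of the component spectrograms.

```python
import cmath
import math

def magnitude_spectrogram(signal, frame_len=32):
    """Cut the signal into non-overlapping frames and return |DFT| for each frame."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

def tone(freq_bin, n_samples, frame_len=32, start=0, stop=None):
    """A sinusoid centred on one DFT bin, switched on only between start and stop."""
    stop = n_samples if stop is None else stop
    return [math.sin(2 * math.pi * freq_bin * n / frame_len) if start <= n < stop else 0.0
            for n in range(n_samples)]

N = 128                                    # four frames of 32 samples (assumed sizes)
source_a = tone(4, N)                      # steady low tone, present in every frame
source_b = tone(10, N, start=32, stop=96)  # higher tone, present in frames 1 and 2 only
mixture = [a + b for a, b in zip(source_a, source_b)]

spec_a = magnitude_spectrogram(source_a)
spec_b = magnitude_spectrogram(source_b)
spec_mix = magnitude_spectrogram(mixture)

# "Overlaying the transparencies": keep the darker (larger) value in each cell.
overlay = [[max(a, b) for a, b in zip(row_a, row_b)]
           for row_a, row_b in zip(spec_a, spec_b)]

# Largest disagreement between the mixture's picture and the overlay.
max_diff = max(abs(m - o)
               for row_m, row_o in zip(spec_mix, overlay)
               for m, o in zip(row_m, row_o))
print(max_diff < 1e-9)
```

The agreement is exact here only because the two tones never occupy the same frequency cell. When components do share a cell, their contributions combine according to their relative phase, so the overlay is only an approximation. In either case, a recognizer given `spec_mix` alone has no direct access to `spec_a` and `spec_b`.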
                 The recognizer would have to solve the following problems: How many
               sources have created the mixture? Is a particular discontinuity in the picture
               a change in one sound or an interruption by a second one? Should two dark
               regions, one above the other in the picture (in other words, occurring at the
               same time), be grouped as a single sound with a complex timbre or separated
               to represent two simultaneous sounds with simpler timbres? If we look at a
               spectrogram representing a slice of real life, we see a complex pattern of
               streaks, any pair of which could have been caused by the same acoustic event
               or by different ones. A single streak could have been the summation of one,
               two, or even more parts of different sounds. Furthermore, the frequency
               components from one source could be interlaced with those of another; just
               because one horizontal streak happens to be immediately above another does
               not mean that they both arose from the same sonic event.
                 We can see that just as in the visual problem of recognizing a picture of
               blocks, there is a serious need for regions to be grouped appropriately. Again,
               it would be convenient to be able to hand the spectrogram over to a machine
               that did the equivalent of taking a set of crayons and coloring in, with the same
               color, all the regions on the spectrogram that came from the same source. This
               ‘‘coloring problem’’ or ‘‘auditory scene analysis problem’’ is what the rest of
               this chapter is about.
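To make the ‘‘coloring problem’’ concrete, here is a small sketch of what a correct coloring would look like when the component pictures are known. Everything in it is hypothetical: the two toy energy grids, the labels "A" and "B", and the helper `color_regions` are invented for illustration. Because it uses the separate components, it only describes the target answer. It is not a method for finding that answer from the mixture alone, which is the hard part the text describes.

```python
# Hypothetical toy spectrograms: rows are frequency bands, columns are time frames,
# and each value is the energy that source contributes to that cell.
source_a = [
    [0, 0, 0, 0],
    [5, 5, 5, 5],
    [0, 0, 0, 0],
]
source_b = [
    [0, 3, 3, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 0],
]

def color_regions(components, floor=1e-9):
    """Label each cell with the source contributing the most energy ('.' if silent)."""
    names = sorted(components)
    first = components[names[0]]
    n_rows, n_cols = len(first), len(first[0])
    colored = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            best = max(names, key=lambda name: components[name][r][c])
            row.append(best if components[best][r][c] > floor else ".")
        colored.append("".join(row))
    return colored

labels = color_regions({"A": source_a, "B": source_b})
for line in labels:
    print(line)
```

In the second row, third column, both sources contribute energy, and the sketch gives the whole cell to the stronger one. That is a simplification: as noted earlier, a single streak may be the sum of parts of several sounds.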