Page 107 - Master Handbook of Acoustics

FIGURE 5-3   (A) The human voice is produced through the interaction of two essentially independent
   components: a sound source, and a time-varying-filter action of the vocal tract. (B) The sound source
   is composed of vocal-cord vibration for voiced sounds, the fricative sounds resulting from air
   turbulence, and plosive sounds. (C) A digital system used to synthesize human speech.


      The second source of speech sound is that made by forming a constriction at some point in the
  vocal tract with the teeth, tongue, or lips, and forcing air through it under high enough pressure to
  produce significant turbulence. Turbulent air creates noise. This noise is shaped by the vocal tract to
  form the fricative sounds of speech such as the consonants f, s, v, and z. Try making these sounds, and
  you will see that high-velocity air is very much involved.

      The third source of speech sound is produced by completely stopping the breath, usually
  toward the front of the mouth, letting the pressure build up, and then suddenly releasing the breath.
  Try speaking the consonants k, p, and t, and you will sense the force of such plosive sounds. They are
  usually followed by a burst of fricative or turbulent sound. These three types of sounds—voiced,
  fricative, and plosive—are the raw sources that are shaped into the words we speak.

      Sound sources and signal processing can be implemented in digital hardware or software. A
  simple speech synthesis system is shown in Fig. 5-3C. A random-number generator produces the
  digital equivalent of s-like sounds for the unvoiced components. A counter produces periodic pulses
  simulating the pulses of sound from the vocal cords for the voiced components. Both sources are shaped
  by time-varying digital filters simulating the varying resonances of the vocal tract. Control signals
  drive each of these stages to form digitized speech, which is then converted to analog form.
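
      The signal flow described above can be sketched in a few lines of Python. This is only an
  illustrative sketch, not the system of Fig. 5-3C: the sample rate, the 100-Hz pitch, the single
  two-pole resonator, and all function names are assumptions chosen for the example.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz; assumed value for this sketch


def pulse_source(num_samples, pitch_hz):
    """Counter-based pulse train standing in for vocal-cord pulses (voiced)."""
    period = round(SAMPLE_RATE / pitch_hz)  # samples between pulses
    return [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]


def noise_source(num_samples, seed=0):
    """Random-number generator standing in for turbulent noise (unvoiced)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]


def resonator(signal, center_hz, bandwidth_hz=100.0):
    """Two-pole digital resonator; center_hz holds one frequency per sample,
    so the resonance can move in time like a vocal-tract resonance."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)  # pole radius < 1
    y1 = y2 = 0.0
    out = []
    for x, fc in zip(signal, center_hz):
        a1 = 2.0 * r * math.cos(2.0 * math.pi * fc / SAMPLE_RATE)
        y = x + a1 * y1 - r * r * y2
        out.append(y)
        y2, y1 = y1, y
    return out


n = 800  # 0.1 s of signal at the assumed sample rate
voiced = pulse_source(n, pitch_hz=100)
unvoiced = noise_source(n)

# Control signal (assumed): glide the resonance from 500 Hz to 1,500 Hz.
glide = [500.0 + 1000.0 * i / (n - 1) for i in range(n)]

voiced_out = resonator(voiced, glide)
unvoiced_out = resonator(unvoiced, glide)
```

      A practical synthesizer would use several such resonators and would switch or mix the two
  sources under control; the lists here would then be passed to a digital-to-analog converter.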




  Vocal Tract Molding of Speech

  The vocal tract can be considered as an acoustically resonant system. This tract, from the vocal cords
  to the lips, has a length of about 6.7 in (17 cm). Its cross-sectional area is determined by the
  placement of the lips, jaw, tongue, and velum (a sort of trapdoor that can open or close off the nasal
  cavity) and varies from 0 to about 3 in² (20 cm²). The nasal cavity has a length of about 4.7 in (12 cm)
  and has a volume of about 3.7 in³ (60 cm³). These dimensions help determine the resonances of the
  vocal tract and their effect on speech sounds.


  Formation of Voiced Sounds

  If the components of Fig. 5-3 are elaborated into source spectra and modulating functions, we arrive
  at something of great importance in audio: the spectral distribution of energy in the voice. We also
  gain a better understanding of the aspects of voice sounds that contribute to the intelligibility of
  speech in the presence of reverberation and noise. Figure 5-4 shows the steps in producing voiced
  sounds. First, sound is produced by the vibration of the vocal cords; these are pulses of sound with a
  fine spectrum that falls off at about 10 dB/octave as frequency is increased, as shown in Fig. 5-4A.
  The sounds of the vocal cords pass through the vocal tract, which acts as a time-varying filter. The
  peaks in the contour of Fig. 5-4B are due to the acoustical resonances of the vocal tract, called
  formants. The tract acts as a pipe that is essentially closed at the vocal cord end and open at the mouth end.
  Such an acoustical pipe 6.7 in long has resonances at odd quarter wavelengths, and these peaks occur
  at approximately 500; 1,500; and 2,500 Hz. The output sound, shaped by the resonances of the vocal
  tract, is shown in Fig. 5-4C. This analysis applies to the voiced sounds of speech.
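
      The quoted formant frequencies can be checked with the standard closed-open pipe relation
  f = (2n - 1)c/(4L). The short Python sketch below assumes a speed of sound of 343 m/s (a value not
  stated in the text) and treats the tract as a uniform 17-cm pipe; the function name is made up
  for this example.

```python
def quarter_wave_resonances(length_m, speed_of_sound=343.0, count=3):
    """Resonant frequencies (Hz) of a pipe closed at one end and open at
    the other: odd multiples of the quarter-wavelength frequency c/(4L)."""
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]


# Vocal tract length of 17 cm (6.7 in); 343 m/s is an assumed speed of sound.
formants = quarter_wave_resonances(0.17)
print([round(f) for f in formants])  # [504, 1513, 2522]
```

      These values agree with the approximate peaks of 500; 1,500; and 2,500 Hz. A real vocal tract
  is not a uniform pipe, so actual formants shift as the tract changes shape.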