
(www.google.com) or a similar search engine. Related entries include BANDWIDTH, CONTEXT, DATA CONVERSION, DIGITAL SIGNAL PROCESSING, MESSAGE PASSING, OPTICAL CHARACTER RECOGNITION, PROSODIC FEATURES, SOUND TRANSDUCER, SPEECH SYNTHESIS, and SYNTAX.

SPEECH SYNTHESIS
Speech synthesis, also called voice synthesis, is the electronic generation of sounds that mimic the human voice. These sounds can be generated from digital text or from printed documents. Speech can also be generated by computers equipped with artificial intelligence (AI), as responses to stimuli or input from humans or other machines.
What is a voice?

All audible sounds consist of combinations of waves at frequencies between 20 Hz and 20 kHz. (A frequency of 1 Hz is one cycle per second; 1 kHz = 1000 Hz.) These waves take the form of vibrations in air molecules, and the patterns of vibration can be duplicated as alternating electric currents.
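To make the relationship between frequency and vibration concrete, the following short Python sketch (an illustration only; NumPy, the 44,100-sample-per-second rate, and the make_tone name are assumptions, not part of this entry) builds a pure tone at a stated number of cycles per second.

    import numpy as np

    SAMPLE_RATE = 44_100  # samples per second; a common digital-audio rate

    def make_tone(frequency_hz: float, duration_s: float) -> np.ndarray:
        """Return a sine wave at the given frequency (cycles per second)."""
        t = np.arange(int(SAMPLE_RATE * duration_s)) / SAMPLE_RATE
        return np.sin(2 * np.pi * frequency_hz * t)

    # One second of a 440 Hz tone: 440 complete cycles, well inside the
    # audible range of 20 Hz to 20 kHz.
    tone = make_tone(440.0, 1.0)

Sent to a sound card or written to an audio file, the array of samples reproduces, as an electric and then acoustic waveform, the pattern of vibration described above.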
A frequency band of 300 to 3000 Hz is wide enough to convey all of the information, and all of the emotional content, in any person's voice. Therefore, a speech synthesizer needs to make sounds only within the range from 300 to 3000 Hz. The challenge is to produce waves at exactly the right frequencies, at the right times, and in the right phase combinations. The modulation must also be correct, so that the intended meaning is conveyed. In the human voice, the volume and frequency rise and fall in subtle and precise ways. The slightest change in modulation can make a tremendous difference in the meaning of what is said. You can tell, even over the telephone, whether a speaker is anxious, angry, or relaxed. A request sounds different from a command. A question sounds different from a declarative statement, even if the words are the same.
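As one illustration of how modulation alone can change meaning, the toy sketch below (the glide function and the particular frequencies are assumptions for illustration, not taken from this entry) generates two tones of equal length and loudness but different pitch contours: a falling contour, as at the end of a declarative statement, and a rising contour, as at the end of a question.

    import numpy as np

    SAMPLE_RATE = 8_000  # samples per second; ample for a 300 to 3000 Hz voice band

    def glide(f_start_hz: float, f_end_hz: float, duration_s: float) -> np.ndarray:
        """Tone whose frequency slides from f_start_hz to f_end_hz: a crude pitch contour."""
        n = int(SAMPLE_RATE * duration_s)
        freq = np.linspace(f_start_hz, f_end_hz, n)        # instantaneous frequency
        phase = 2 * np.pi * np.cumsum(freq) / SAMPLE_RATE  # integrate frequency to get phase
        return np.sin(phase)

    statement = glide(500.0, 350.0, 0.5)  # falling pitch: heard as a statement
    question = glide(500.0, 750.0, 0.5)   # rising pitch on the same "word": heard as a question

Played back, the two signals share the same carrier but convey different intent, which is the point made above about modulation.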
Tone of voice

In the English language there are 40 elementary sounds, known as phonemes. Some languages have more phonemes than English; others have fewer. The exact sound of a phoneme can vary, depending on what comes before and after it. These variations are called allophones. There are 128 allophones in English, and they can be strung together in myriad ways.
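A very small sketch of how phoneme units might be strung together appears below. It is purely illustrative: the lexicon, the unit waveforms, and the synthesize function are assumptions, and a practical concatenative synthesizer would also choose among allophone variants according to context.

    import numpy as np

    # Hypothetical lookup tables: a tiny lexicon mapping words to phoneme
    # sequences, and a bank of stored waveforms, one per phoneme (silent
    # placeholders here).
    LEXICON = {"robot": ["R", "OW", "B", "AA", "T"]}
    UNIT_WAVEFORMS = {p: np.zeros(800) for p in ["R", "OW", "B", "AA", "T"]}

    def synthesize(word: str) -> np.ndarray:
        """Concatenate one stored unit per phoneme to form the word's waveform."""
        return np.concatenate([UNIT_WAVEFORMS[p] for p in LEXICON[word]])

    audio = synthesize("robot")  # 5 units of 800 samples each = 4000 samples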
Inflection, or "tone of voice," is another variable in speech; it changes depending on whether the speaker is angry, sad, scared, happy, or indifferent. Inflections depend not only on the speaker's actual feelings, but also on age, gender, upbringing, and other factors. A voice can also have an accent.




                                                   