

                      Between recognition and synthesis, an intelligent system needs to process language, cross-relate
                    language to vision and other senses (a task known as multimodal sensor fusion), and make decisions
about how to act in this world. Many labs tackle this problem by treating natural language as the nexus of these capabilities, an approach known as natural language processing (NLP).
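To make the nexus idea concrete, the following Python sketch shows one minimal way such an architecture might be organized: each sensing channel emits symbolic hypotheses tagged with confidences, a fusion step cross-relates them through shared linguistic labels, and a decision layer acts on the winner. All of the names and the additive scoring rule are illustrative assumptions rather than a description of any particular laboratory’s system.

# A minimal sketch (hypothetical names throughout) of language as the nexus
# for multimodal sensor fusion: each channel emits symbolic hypotheses, and
# a shared linguistic frame cross-relates them before the system acts.

from dataclasses import dataclass

@dataclass
class Hypothesis:
    channel: str       # e.g., "vision" or "speech"
    symbol: str        # linguistic label, e.g., "red_ball"
    confidence: float  # 0.0 to 1.0

def fuse(hypotheses):
    """Combine per-channel confidences for each linguistic symbol."""
    scores = {}
    for h in hypotheses:
        # Channels that agree on a symbol reinforce it.
        scores[h.symbol] = scores.get(h.symbol, 0.0) + h.confidence
    return max(scores, key=scores.get) if scores else None

def act(symbol):
    # Decision layer: map the fused linguistic symbol to a behavior.
    print(f"attending to {symbol}")

percepts = [
    Hypothesis("vision", "red_ball", 0.7),
    Hypothesis("speech", "red_ball", 0.6),   # the user said "ball"
    Hypothesis("vision", "coffee_mug", 0.4),
]
act(fuse(percepts))  # -> attending to red_ball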
Some NLP researchers ambitiously attempt to model human grammars completely, while others, such as the Cyc project in Austin, Texas, encode ontological relationships in expert systems, an approach that has proven successful in some limited applications. Many functional natural
                    language applications, such as electronic ticketing agents or IBM’s Natural Language Assistant
                    (NLA) search engine (Chai et al., 2002), compensate for their inability to understand full, general
                    language by relying on the constraints specific to the application’s situations. Other ambitious
                    language-engine projects attempt to model the emergence of language — the paths by which one (a
                    human or a machine) can acquire language from a social environment.
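To illustrate how application constraints can substitute for general language understanding, the sketch below extracts a few task-specific slots from a booking request using simple keyword and pattern matching. The slot names, city list, and date pattern are hypothetical examples, not the grammar of any deployed agent.

# An illustrative slot-filling sketch of a constrained ticketing dialogue:
# the agent never parses full language, only the handful of slots its task
# requires. All vocabulary and patterns here are hypothetical examples.

import re

CITIES = {"boston", "austin", "paris"}
MONTHS = ("january|february|march|april|may|june|july|august|"
          "september|october|november|december")

def parse_booking(utterance):
    """Extract origin, destination, and date slots from a booking request."""
    words = utterance.lower().split()
    slots = {"origin": None, "destination": None, "date": None}
    for i, w in enumerate(words[:-1]):
        if w == "from" and words[i + 1] in CITIES:
            slots["origin"] = words[i + 1]
        elif w == "to" and words[i + 1] in CITIES:
            slots["destination"] = words[i + 1]
    m = re.search(r"\b(\d{1,2} (?:%s))\b" % MONTHS, utterance.lower())
    if m:
        slots["date"] = m.group(1)
    return slots

print(parse_booking("I need a flight from Boston to Paris on 3 May"))
# -> {'origin': 'boston', 'destination': 'paris', 'date': '3 may'}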
                      Under the hypothesis that language is inherently an emergent phenomenon, Luc Steels and other
                    researchers at the Sony Computer Science Lab in Paris are teaching Sony AIBO robots to recognize
                    objects via natural language games (Steels and Kaplan, 2002; Boyd, 2002). The results are
                    promising. While these robots are learning only the simplest of grammars and words, they are
                    doing so under highly variable conditions, and can recognize learned objects independent of
lighting or viewing angle. In fact, this method has considerably outperformed other language-acquisition systems based on neural networks or symbolic learning theory (Steels and Kaplan, 2002). It is worth emphasizing that such a natural language system integrates many cognitive components: vision, gesturing, pattern recognition, speech analysis and synthesis, conceptualization, interpretation, behavioral recognition, action, and so on.
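The flavor of such language games can be conveyed with a drastically simplified naming game in the spirit of Steels and Kaplan (2002): two agents maintain word-object association scores, and a shared vocabulary emerges from repeated interactions. The update constants and scoring scheme below are illustrative choices rather than the published algorithm, and the hard perceptual grounding problem that the AIBO work actually addresses is abstracted away.

# A simplified naming game: conventions emerge from repeated interactions
# in which a speaker names an object and a hearer guesses it. The scores
# and update rule are illustrative, not the published algorithm.

import random

class Agent:
    def __init__(self):
        self.lexicon = {}  # (word, obj) -> association score

    def name_for(self, obj):
        candidates = {w: s for (w, o), s in self.lexicon.items() if o == obj}
        if not candidates:
            word = "w%04d" % random.randrange(10000)  # invent a new word
            self.lexicon[(word, obj)] = 0.5
            return word
        return max(candidates, key=candidates.get)

    def object_for(self, word):
        candidates = {o: s for (w, o), s in self.lexicon.items() if w == word}
        return max(candidates, key=candidates.get) if candidates else None

    def update(self, word, obj, success):
        # Reinforce the pairing on communicative success, weaken on failure.
        score = self.lexicon.get((word, obj), 0.5)
        delta = 0.1 if success else -0.1
        self.lexicon[(word, obj)] = min(1.0, max(0.0, score + delta))

objects = ["ball", "box", "doll"]
speaker, hearer = Agent(), Agent()
for _ in range(500):
    topic = random.choice(objects)
    word = speaker.name_for(topic)
    success = hearer.object_for(word) == topic
    speaker.update(word, topic, success)
    hearer.update(word, topic, success)
print({w: o for (w, o), s in hearer.lexicon.items() if s > 0.8})

After a few hundred games the hearer’s high-scoring entries settle on one word per object, which is the emergent convention that the final print statement reports.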

                    6.3.1.2 Vision, Other Sensing, Sensor Fusion

The theories of computational neuroscientist Christoph von der Malsburg concerning complex, nonlinear behavior in neurons have driven the development of numerous successful vision algorithms (von der Malsburg and Schneider, 1986). One descendant of von der Malsburg’s work, developed by his student Hartmut Neven and sold as NevenVision FFT, stands out as the most successful tracker of human facial expressions from live streaming video. NevenVision modules use these theories to accomplish numerous other vision tasks as well, including biometric face recognition and object and gesture recognition. The author of this chapter is currently investigating the use of this software to endow social robots with emotional-expression recognition in context-driven conversation.
                      The automated face analysis (AFA) software system developed in the Carnegie Mellon Univer-
                    sity Face Lab determines the emotional state of a subject by automatically analyzing images against
Ekman’s facial action coding system (FACS) (Xiao et al., 2002). While this AFA FACS analysis does not run in real time, optimizing it and integrating it with fast, robust expression-recognition software would greatly advance progress toward complete and effective sociable robot systems.
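For readers unfamiliar with FACS, the sketch below maps a set of detected action units (AUs) to basic emotion labels using commonly cited prototype combinations, such as AU6 plus AU12 for happiness. A system such as AFA scores many more AUs, with intensities; this is a deliberately reduced illustration.

# An illustrative mapping from detected FACS action units (AUs) to basic
# emotion labels, in the spirit of Ekman's coding scheme. The prototype
# combinations below follow commonly cited descriptions; a real system
# scores many more AUs and their intensities.

EMOTION_PROTOTYPES = {
    "happiness": {6, 12},        # cheek raiser + lip corner puller
    "sadness":   {1, 4, 15},     # inner brow raiser + brow lowerer + lip corner depressor
    "surprise":  {1, 2, 5, 26},  # brow raisers + upper lid raiser + jaw drop
    "anger":     {4, 5, 7, 23},  # brow lowerer + lid raiser/tightener + lip tightener
}

def classify(detected_aus):
    """Return the emotion whose prototype AUs best overlap the detected set."""
    best, best_score = "neutral", 0.0
    for emotion, proto in EMOTION_PROTOTYPES.items():
        score = len(proto & detected_aus) / len(proto)
        if score > best_score:
            best, best_score = emotion, score
    return best

print(classify({6, 12, 25}))  # -> happiness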
Building on the work of Steels and Kaplan (2002) described in section 6.3.1.1, and on that of others, Sony has demonstrated the integration of many visual and perceptual systems and speech in its Qrio biped. Qrio can biometrically identify a face, recognize and respond to a person’s facial expressions,
                    and recognize objects and environmental attributes. The visual ontologies are fused with the
                    semantic language ontologies, allowing Qrio to converse in a simple but lifelike way about a
                    number of subjects. This work is a forerunner of integrated machine intelligence systems with
                    nimble humanlike embodiment.
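One simple way to picture the fusion of visual and semantic ontologies is sketched below: a visual ontology maps percepts to concepts, a semantic ontology attaches facts to those concepts, and a dialogue routine verbalizes the result. The tiny knowledge base and all of the names are hypothetical; Qrio’s actual integration is far richer than this.

# A minimal sketch of fusing a visual ontology with a semantic language
# ontology so a robot can converse about what it sees. The knowledge base
# and names are hypothetical illustrations.

VISUAL_TO_CONCEPT = {              # visual ontology -> semantic concept
    "blob:round+orange": "ball",
    "blob:upright+moving": "person",
}

SEMANTIC_KB = {                    # semantic ontology: concept -> facts
    "ball":   {"is_a": "toy",   "can": "be kicked"},
    "person": {"is_a": "agent", "can": "speak"},
}

def describe(visual_percept):
    """Ground a visual percept in the semantic ontology and verbalize it."""
    concept = VISUAL_TO_CONCEPT.get(visual_percept)
    if concept is None:
        return "I see something I do not recognize."
    facts = SEMANTIC_KB[concept]
    return f"I see a {concept}; it is a kind of {facts['is_a']} and it can {facts['can']}."

print(describe("blob:round+orange"))
# -> I see a ball; it is a kind of toy and it can be kicked.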

                    6.3.2 Social Intelligence, Social Robots, and Robot Visual Identity

                    Social robots particularly require the fusion of many perceptual, language, and physical embodi-
                    ment systems — requirements that drive the systematic integration of these components into a
                    whole that is greater than the sum of parts (Breazeal, 2002).