Page 176 - Socially Intelligent Agents Creating Relationships with Computers and Robots


                              — are arranged in a 480-mm-tall upper body. Infanoid is mounted on a table
                              for face-to-face interaction with a human caregiver sitting on a chair.
                                Infanoid has a foveated stereo vision head, as shown in Figure 19.1 (right).
                              Each of the eyes has two color CCD cameras like those of Cog [3]; the lower
                              one has a wide-angle lens that spans the visual field (about 120 degrees
                              horizontally), and the upper one has a telephoto lens that takes a close-up image
                              on the fovea (about 20 degrees horizontally). Three motors drive the eyes,
                              controlling their direction (pan and common tilt). The motors also allow the eyes
                              to perform saccades of over 45 degrees within 100 msec, as well as smooth pursuit
                              of visual targets. The images from the cameras are fed into massively parallel
                              image processors (IMAP Vision) for facial and non-facial feature tracking,
                              which enables real-time attentional interaction with the interlocutor and with a
                              third object. In addition, the head has eyebrows with 2 DOFs and lips with 2
                              DOFs for natural facial expressions and lip-synching with vocalizations. Each
                              DOF is controlled by interconnected MCUs; high-level sensori-motor information
                              is processed by a cluster of Linux PCs.
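
The field-of-view figures above can be turned into a small geometric sketch. The Python below is illustrative only and is not Infanoid's actual control code: it assumes an ideal pinhole camera (a real 120-degree lens would need distortion correction) and a hypothetical image width of 640 pixels, neither of which is given in the text. It converts a horizontal pixel position in the wide-angle image into an angle from the optical axis, checks whether that angle already falls within the roughly 20-degree foveal field, and computes the average angular speed implied by a 45-degree saccade in 100 msec.

```python
import math

WIDE_FOV_DEG = 120.0    # horizontal field of the wide-angle camera (from the text)
FOVEA_FOV_DEG = 20.0    # horizontal field of the telephoto camera (from the text)
IMAGE_WIDTH_PX = 640    # ASSUMPTION: image width is not stated in the text


def pixel_to_pan_offset_deg(x_px, width_px=IMAGE_WIDTH_PX, fov_deg=WIDE_FOV_DEG):
    """Angle (degrees) of pixel column x_px relative to the optical axis.

    ASSUMPTION: ideal pinhole projection with the principal point at the
    image center. Negative values are left of center, positive are right.
    """
    half_width = width_px / 2.0
    # Focal length in pixels such that the image edge maps to fov_deg / 2.
    focal_px = half_width / math.tan(math.radians(fov_deg / 2.0))
    return math.degrees(math.atan((x_px - half_width) / focal_px))


def in_fovea(offset_deg, fovea_fov_deg=FOVEA_FOV_DEG):
    """True if a target at offset_deg already lies inside the foveal field."""
    return abs(offset_deg) <= fovea_fov_deg / 2.0


def average_saccade_speed_deg_per_s(amplitude_deg=45.0, duration_s=0.100):
    """Average angular speed needed to cover amplitude_deg in duration_s."""
    return amplitude_deg / duration_s


if __name__ == "__main__":
    offset = pixel_to_pan_offset_deg(500)
    print(f"pan offset: {offset:.1f} deg, in fovea: {in_fovea(offset)}")
    print(f"average saccade speed: {average_saccade_speed_deg_per_s():.0f} deg/s")
```

Under these assumptions, a target outside the foveal field would call for a pan of about the computed offset to bring it onto the telephoto camera, and the saccade figure quoted above corresponds to an average speed of at least about 450 degrees per second.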
                                Infanoid has been equipped with the following functions: (1) tracking a
                              nonspecific human face in a cluttered background; (2) determining roughly the
                              direction of the human face being tracked; (3) tracking objects with salient
                              color and texture, e.g., toys; (4) pointing to or reaching out for an object or a
                              face by using the arms and torso; (5) gazing alternately between the face and
                              the object; and (6) vocalizing canonical babbling with lip-synching. Currently,
                              we are working on modules for gaze tracking, imperfect verbal imitation, and
                              so on, in order to provide Infanoid with the basic physical skills of
                              6-to-9-month-olds, as an initial stage for social and communicative development.
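
As a rough illustration of function (5), gazing alternately between the face and the object, the sketch below builds a simple time schedule of gaze targets. It is a minimal sketch under stated assumptions, not the robot's implementation: the function name `gaze_schedule`, the fixed dwell time, and the target labels are all hypothetical, and the text does not say how Infanoid actually times its gaze shifts.

```python
def gaze_schedule(duration_s, dwell_s, first="face"):
    """Return (start_time_s, target) pairs alternating 'face' and 'object'.

    ASSUMPTION: each target is fixated for a fixed dwell time dwell_s;
    the source does not specify any timing policy.
    """
    if dwell_s <= 0:
        raise ValueError("dwell_s must be positive")
    if first not in ("face", "object"):
        raise ValueError("first must be 'face' or 'object'")
    order = ("face", "object") if first == "face" else ("object", "face")
    schedule = []
    k = 0
    # Add one fixation per dwell interval until the duration is covered.
    while k * dwell_s < duration_s:
        schedule.append((k * dwell_s, order[k % 2]))
        k += 1
    return schedule


if __name__ == "__main__":
    for start, target in gaze_schedule(duration_s=2.0, dwell_s=0.5):
        print(f"t={start:.1f}s -> look at {target}")
```

In a real system each entry would be paired with the current tracked position of the face or object from functions (1) and (3), so that every gaze shift aims at where the target is at that moment.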

                              3.     Being intentional
                                Communication is the act of sending and receiving physical signals from
                              which the receiver derives the sender’s intention to manifest something in the
                              environment (or in the memory) so as to change the receiver’s attention and/or
                              behavioral disposition [8]. This enables us to predict and control others’
                              behavior to some degree, for efficient cooperation and competition with them.
                              It is easy to imagine that our species acquired this skill, probably prior to the
                              emergence of symbolic language, as a result of the long history of the struggle
                              for existence.
                                How do we derive intangible intentions from the physically observable
                              behavior of others? We do so through empathy, i.e., the act of imagining oneself
                              in the position of someone else, thereby understanding how he or she feels
                              and acts, as illustrated in Figure 19.2. This empathetic process arouses in our
                              mind, probably unconsciously, a mental state similar to that of the interlocutor.
                              But how can a robot do this? As well as being able to identify itself