— are arranged in a 480-mm-tall upper body. Infanoid is mounted on a table
for face-to-face interaction with a human caregiver sitting on a chair.
Infanoid has a foveated stereo vision head, as shown in Figure 19.1 (right).
Each of the eyes has two color CCD cameras like those of Cog [3]: the lower one has a wide-angle lens that covers the visual field (about 120 degrees horizontally), and the upper one has a telephoto lens that captures a close-up foveal image (about 20 degrees horizontally). Three motors drive the eyes, controlling their direction (pan and common tilt); they allow the eyes to perform saccades of over 45 degrees within 100 msec, as well as smooth pursuit of visual targets. The images from the cameras are fed into massively parallel image processors (IMAP Vision) for facial and non-facial feature tracking, which enables real-time attentional interaction with the interlocutor and with a third object. In addition, the head has eyebrows with 2 DOFs and lips with 2 DOFs for natural facial expressions and for lip-synching with vocalizations. Each DOF is controlled by interconnected MCUs; high-level sensorimotor information is processed by a cluster of Linux PCs.
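To make the interplay of saccades and smooth pursuit concrete, the following is a minimal one-axis gaze-control sketch in Python. It is not Infanoid's actual control code: the image width, saccade threshold, and pursuit gain are illustrative assumptions; only the 120-degree wide-angle field and the capacity for large, fast saccades come from the description above.

```python
WIDE_FOV_DEG = 120.0          # horizontal field of the wide-angle camera (from the text)
IMAGE_WIDTH_PX = 320          # assumed image width in pixels (not given in the text)
SACCADE_THRESHOLD_DEG = 5.0   # assumed error above which a ballistic saccade fires
PURSUIT_GAIN = 0.3            # assumed proportional gain for smooth pursuit

def pixel_to_angle(x_px: float) -> float:
    """Convert a horizontal pixel offset from the image center into degrees."""
    return (x_px - IMAGE_WIDTH_PX / 2.0) * (WIDE_FOV_DEG / IMAGE_WIDTH_PX)

def gaze_command(target_x_px: float, pan_deg: float):
    """One control tick: a large retinal error triggers a one-step saccade;
    a small error is reduced gradually, approximating smooth pursuit."""
    error_deg = pixel_to_angle(target_x_px)
    if abs(error_deg) > SACCADE_THRESHOLD_DEG:
        return "saccade", pan_deg + error_deg             # ballistic re-centering
    return "pursuit", pan_deg + PURSUIT_GAIN * error_deg  # gradual tracking

if __name__ == "__main__":
    pan = 0.0
    for x in (300.0, 172.0, 166.0):  # a target drifting across a 320-px image
        mode, pan = gaze_command(x, pan)
        print(f"{mode}: pan -> {pan:+.1f} deg")
```

The split mirrors biological gaze control: ballistic saccades cancel large retinal errors in a single step, while pursuit reduces small errors gradually so that a moving target stays on the fovea.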
Infanoid has been equipped with the following functions: (1) tracking a nonspecific human face against a cluttered background; (2) roughly determining the direction of the face being tracked; (3) tracking objects with salient color and texture, e.g., toys; (4) pointing to or reaching out for an object or a face using the arms and torso; (5) gazing alternately between the face and the object (see the sketch below); and (6) vocalizing canonical babbling with lip-synching. Currently, we are working on modules for gaze tracking, imperfect verbal imitation, and so on, in order to provide Infanoid with the basic physical skills of 6-to-9-month-olds, as an initial stage of social and communicative development.
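As a rough illustration of function (5), the Python sketch below alternates gaze between a tracked face and a tracked object. The Target fields and the fixed schedule are hypothetical simplifications; a real implementation would take its directions from the face and object trackers and drive the eye and neck motors.

```python
import itertools
from dataclasses import dataclass

@dataclass
class Target:
    label: str      # e.g. "face" or "toy"
    pan_deg: float  # estimated direction of the target (hypothetical values)
    tilt_deg: float

def alternate_gaze(face: Target, obj: Target, cycles: int = 2):
    """Yield gaze directions that shift back and forth between a caregiver's
    face and an object, as in function (5)."""
    for target in itertools.islice(itertools.cycle((face, obj)), cycles * 2):
        # A real controller would command the eye/neck motors here;
        # this sketch only reports the intended gaze direction.
        yield target.label, target.pan_deg, target.tilt_deg

if __name__ == "__main__":
    caregiver = Target("face", pan_deg=-15.0, tilt_deg=5.0)
    toy = Target("toy", pan_deg=20.0, tilt_deg=-10.0)
    for label, pan, tilt in alternate_gaze(caregiver, toy):
        print(f"look at {label}: pan={pan:+.1f} deg, tilt={tilt:+.1f} deg")
```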
3. Being intentional
Communication is the act of sending and receiving physical signals from which the receiver derives the sender's intention to manifest something in the environment (or in memory) so as to change the receiver's attention and/or behavioral disposition [8]. This ability enables us to predict and, to some degree, control others' behavior, making cooperation and competition with others more efficient. It is easy to imagine that our species acquired this skill, probably prior to the emergence of symbolic language, over the long history of the struggle for existence.
How do we derive intangible intentions from the physically observable behavior of others? We do so through empathy, i.e., the act of imagining oneself in someone else's position, thereby understanding how he or she feels and acts, as illustrated in Figure 19.2. This empathetic process arouses in our mind, probably unconsciously, a mental state similar to that of the interlocutor. But how can a robot do this? As well as being able to identify itself