Page 237 - Designing Sociable Robots
Post-Attentive Processing
Once the attention system has selected regions of the visual field that are potentially
behaviorally relevant, more intensive computation can be applied to these regions than could
be applied across the whole field. Searching for eyes is one such task. Locating eyes is
important to us for engaging in eye contact, and as a reference point for interpreting facial
movements and expressions. We currently search for eyes after the robot directs its gaze to
a locus of attention, so that a relatively high resolution image of the area being searched
is available from the foveal cameras (recall chapter 6). Once the target of interest has been
selected, its proximity to the robot is estimated using a stereo match between the two central
wide cameras (also discussed in chapter 6). Proximity is important for interaction, as things
closer to the robot should be of greater interest. It’s also useful for interaction at a distance,
such as when a person stands too far away for face-to-face interaction but close enough to
be beckoned closer. Clearly, the relevant behavior (calling or playing) depends on the
proximity of the human to the robot.
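
As a rough illustration of how a stereo match can yield a proximity estimate, and how that estimate might gate behavior, the sketch below assumes an idealized pair of rectified, parallel pinhole cameras. The focal length, baseline, distance thresholds, the "ignore" fallback, and both function names are hypothetical values chosen for illustration; they are not Kismet's actual parameters or code.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Estimate distance (meters) from stereo disparity.

    Assumes rectified, parallel pinhole cameras, where
    depth = focal_length * baseline / disparity.
    """
    if disparity_px <= 0:
        # No measurable disparity: treat the target as very far away.
        return float("inf")
    return focal_px * baseline_m / disparity_px


def choose_behavior(distance_m, play_range_m=1.0, call_range_m=3.0):
    """Pick a behavior from proximity (thresholds are illustrative assumptions)."""
    if distance_m <= play_range_m:
        return "play"    # close enough for face-to-face interaction
    if distance_m <= call_range_m:
        return "call"    # too far to play, close enough to beckon closer
    return "ignore"      # hypothetical fallback, not taken from the text


# Example: a 20-pixel disparity with an assumed 400-pixel focal length
# and an assumed 0.1 m baseline between the two wide cameras.
distance = depth_from_disparity(20, focal_px=400, baseline_m=0.1)
behavior = choose_behavior(distance)
```

Because depth varies inversely with disparity, a one-pixel matching error matters far more for distant targets than for near ones, which is one reason coarse proximity bands can be more robust than exact distances.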
Eye Movements
Figure 12.4 shows the organization of Kismet’s eye/neck motor control. Kismet’s eyes
periodically saccade to new targets chosen by an attention system, tracking them smoothly
if they move and the robot wishes to engage them. Vergence eye movements are more
challenging to implement in a social setting, since errors in disjunctive eye movements
can give the eyes a disturbing appearance of moving independently. Errors in conjunctive
movements have a much smaller impact on an observer, since the eyes clearly move in
lock-step. A crude approximation of the opto-kinetic reflex is rolled into the implementation
of smooth pursuit. Kismet uses an efferent copy mechanism to compensate the eyes for
movements of the head.
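
The idea of compensating the eyes with a copy of the head command can be sketched as follows. This is a minimal single-axis illustration, assuming that gaze direction is simply the sum of the head pan angle and the eye pan angle; the function name and the mechanical eye limit are hypothetical and are not drawn from Kismet's implementation.

```python
def compensate_eye_setpoint(eye_setpoint_deg, head_command_delta_deg,
                            eye_limit_deg=40.0):
    """Counter-rotate the eye by the commanded head motion.

    Uses a copy of the outgoing head command (not sensed head motion)
    so that head angle + eye angle, i.e. the gaze direction, is unchanged.
    """
    compensated = eye_setpoint_deg - head_command_delta_deg
    # Clamp to an assumed mechanical range of the eye axis.
    return max(-eye_limit_deg, min(eye_limit_deg, compensated))


# Example: the eye is 10 degrees right of the head's midline and the
# head is commanded to pan 4 degrees to the right.
head_before, eye_before = 0.0, 10.0
head_delta = 4.0
eye_after = compensate_eye_setpoint(eye_before, head_delta)
head_after = head_before + head_delta
gaze_before = head_before + eye_before
gaze_after = head_after + eye_after
```

Since the correction is derived from the motor command rather than from visual feedback, it can be applied without waiting for the image to shift; any residual error is left for the visual tracking loop to remove.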
The attention system operates on the view from the central camera. A transformation
is needed to convert pixel coordinates in images from this camera into position set-points
for the eye motors. This transformation in general requires the distance to the target to be
known, since objects in many locations will project to the same point in a single image (see
figure 12.5). Distance estimates are often noisy, which is problematic if the goal is to center
the target exactly in the eyes. In practice, it is usually enough to get the target within the field
of view of the foveal cameras in the eyes. Clearly, the narrower the field of view of these
cameras, the more accurately the distance to the object needs to be known. Other crucial
factors are the distance between the wide and foveal cameras, and the closest distance at
which the robot will need to interact with objects. These constraints are determined by the
physical distribution of Kismet’s cameras and the choice of lenses. The central location of
the wide camera places it as close as possible to the foveal cameras. It also has the advantage
that moving the head to center a target in the central camera will in fact truly orient the head

