activation level of the location currently being attended to, strengthening the bias toward
other locations of lesser activation.
The habituation function can be viewed as a feature map that initially maintains eye
fixation by increasing the saliency of the center of the field of view and then slowly decays
the saliency values of central objects until a salient off-center object causes the neck to
move. The habituation function is a Gaussian field G(x, y) centered in the field of view
with peak amplitude of 255 (to remain consistent with the other 8-bit values) and σ = 50
pixels. It is combined linearly with the other feature maps using the weight
w = W · max(−1, 1 − t/τ) (6.7)
where w is the weight, t is the time since the last habituation reset, τ is a time constant, and
W is the maximum habituation gain. Whenever the neck moves, the habituation function
is reset, forcing w to W and amplifying the saliency of central objects until a time τ when
w = 0 and there is no influence from the habituation map. As time progresses, w decays
to a minimum value of −W, which suppresses the saliency of central objects. In the current
implementation, a value of W = 10 and a time constant of τ = 5 seconds are used. When the
robot’s neck shifts, the habituation map is reset, allowing that region to be revisited after
some period of time.
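To make the decay concrete, the following Python sketch computes the habituation gain of
equation 6.7 and the Gaussian habituation map. NumPy is assumed; the image dimensions and
the linear combination with the other feature maps are illustrative placeholders, not
Kismet's actual code.

    import numpy as np

    def habituation_weight(t, W=10.0, tau=5.0):
        # Equation 6.7: w = W * max(-1, 1 - t/tau).
        # t is the time in seconds since the last reset (neck movement);
        # W = 10 and tau = 5 s match the values quoted in the text.
        return W * max(-1.0, 1.0 - t / tau)

    def habituation_map(width, height, sigma=50.0, amplitude=255.0):
        # Gaussian field G(x, y) centered in the field of view, with
        # peak 255 to stay consistent with the other 8-bit feature maps.
        ys, xs = np.mgrid[0:height, 0:width]
        cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
        r2 = (xs - cx) ** 2 + (ys - cy) ** 2
        return amplitude * np.exp(-r2 / (2.0 * sigma ** 2))

    # Illustrative linear combination with the other feature maps
    # (the other weights are hypothetical):
    # saliency = w_color * color_map + w_motion * motion_map \
    #            + habituation_weight(t) * habituation_map(w_img, h_img)

At t = 0 the gain is +W and central saliency is amplified; at t = τ it passes through
zero; beyond t = 2τ it saturates at −W and central objects are suppressed, exactly as
described above.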
6.2 Post-Attentive Processing
Once the attention system has selected regions of the visual field that are potentially
behaviorally relevant, more intensive computation can be applied to these regions than could
be applied across the whole field. Searching for eyes is one such task. Locating eyes is
important to us for engaging in eye contact. Eyes are searched for after the robot directs
its gaze to a locus of attention. By doing so, a relatively high-resolution image of the area
being searched is available from the narrow FoV cameras (see figure 6.5).
Once the target of interest has been selected, its proximity to the robot is estimated using
a stereo match between the two central wide FoV cameras. Proximity is an important factor
for interaction: things closer to the robot should be of greater interest. The proximity
estimate is also useful for interaction at a distance. For instance, a person standing too
far from Kismet for face-to-face interaction may be close enough to be beckoned closer.
Clearly, the relevant behavior (beckoning or playing) depends on the proximity of the human
to the robot.
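A sketch of how a disparity estimate might be turned into such a behavioral decision is
given below. The pinhole stereo model itself is standard, but the focal length, baseline,
and distance thresholds are hypothetical values, not Kismet's calibration.

    FOCAL_LENGTH_PX = 320.0  # focal length in pixels (assumed value)
    BASELINE_M = 0.10        # separation of the two wide FoV cameras (assumed)

    def depth_from_disparity(disparity_px):
        # Pinhole stereo model: Z = f * B / d.
        if disparity_px <= 0:
            return float("inf")  # no match; treat the target as far away
        return FOCAL_LENGTH_PX * BASELINE_M / disparity_px

    def proximity_behavior(depth_m, face_to_face_max=1.0, beckon_max=3.0):
        # Toy mapping from proximity to behavior, in the spirit of the
        # beckoning-versus-playing distinction described above.
        if depth_m <= face_to_face_max:
            return "face-to-face play"
        if depth_m <= beckon_max:
            return "beckon closer"
        return "too far to engage"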
Eye detection Detecting people’s eyes in a real-time robotic domain is computationally
expensive and prone to error due to the large variance in head posture, lighting conditions,
and feature scales. Aaron Edsinger developed an approach based on successive feature
extraction, combined with some inherent domain constraints, to achieve a robust and fast
eye-detection scheme.
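As a rough point of reference only, the sketch below shows where such a detector sits in
the post-attentive pipeline, using OpenCV's stock Haar eye cascade as a stand-in. This
substitutes a generic off-the-shelf technique for Edsinger's successive-feature-extraction
approach and is not the method described here.

    import cv2

    # Stand-in detector: OpenCV's bundled Haar cascade, substituted
    # here for Edsinger's actual detector.
    eye_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_eye.xml")

    def find_eyes(narrow_fov_frame):
        # Runs on the high-resolution narrow FoV image captured after
        # the robot directs its gaze to the locus of attention.
        gray = cv2.cvtColor(narrow_fov_frame, cv2.COLOR_BGR2GRAY)
        # Domain constraint: post-attentive framing puts the face near
        # the image center at a roughly known scale, so a minimum
        # detection size cheaply rejects many false positives.
        return eye_cascade.detectMultiScale(
            gray, scaleFactor=1.1, minNeighbors=5, minSize=(20, 20))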

