Page 117 - Rapid Learning in Robotics

7.3 Low Level Vision Domain: a Finger Tip Location Finder

Figure 7.5: Left (a): typical input image. Upper right (b): after thresholding
and binarization. Lower right (c): positions of the array of Gaussian masks
(the displayed width is the actual width reduced by a factor of four in order
to better depict the arrangement of positions).

maps a monocular image from this ensemble to the 2D position of the
index finger tip in the image.

To obtain reproducible conditions, the images were generated with the aid
of an adjustable wooden hand replica in front of a black background (for
the segmentation required to achieve such a condition with more
realistic backgrounds, see e.g. Kummert et al. 1993a; Kummert et al.
1993b). A typical image ( pixel resolution) is shown in Fig. 7.5a.
From the monochrome pixel image, we generated a 9-dimensional feature
vector in two steps: first by thresholding and binarizing the pixel values
(threshold = 20, 8-bit intensity values), and then by computing, as image
features, the scalar products of the resulting binarized image (shown in
Fig. 7.5b) with a grid of 9 Gaussians at the vertices of a lattice centered
on the hand (Fig. 7.5c). The choice of this preprocessing method is partly
heuristically motivated (the binarization makes the feature vector less
sensitive to variations in illumination), and partly based on good results
achieved with a similar method in the context of the recognition of hand postures
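
The preprocessing described above (thresholding, binarization, scalar
products with 9 Gaussian masks) can be illustrated with a minimal NumPy
sketch. Only the threshold of 20 and the number of Gaussians come from the
text. Everything else is an assumption made for illustration: the function
names, the 3 x 3 lattice layout, the use of the foreground centroid as the
lattice center, the strict ">" comparison, and the values of `spacing` and
`sigma`, none of which are specified here.

```python
import numpy as np


def binarize(image, threshold=20):
    """Map 8-bit intensities to {0.0, 1.0}; the strict '>' is an assumption."""
    return (np.asarray(image) > threshold).astype(float)


def lattice_centers(center, spacing):
    """Vertices of an assumed 3 x 3 lattice around `center` (row, col)."""
    cy, cx = center
    offsets = (-spacing, 0.0, spacing)
    return [(cy + dy, cx + dx) for dy in offsets for dx in offsets]


def gaussian_masks(shape, centers, sigma):
    """One isotropic Gaussian mask per center, stacked to shape (n, H, W)."""
    rows, cols = np.indices(shape)
    masks = []
    for cy, cx in centers:
        dist2 = (rows - cy) ** 2 + (cols - cx) ** 2
        masks.append(np.exp(-dist2 / (2.0 * sigma ** 2)))
    return np.stack(masks)


def feature_vector(image, threshold=20, spacing=16.0, sigma=8.0):
    """9-dimensional feature vector of a monochrome image.

    `spacing` and `sigma` (in pixels) are hypothetical values chosen for
    this sketch, not taken from the text.
    """
    binary = binarize(image, threshold)
    fg_rows, fg_cols = np.nonzero(binary)
    if fg_rows.size == 0:
        # No foreground pixels: fall back to the image center.
        center = ((binary.shape[0] - 1) / 2.0, (binary.shape[1] - 1) / 2.0)
    else:
        # Assumption: "centered on the hand" means the foreground centroid.
        center = (fg_rows.mean(), fg_cols.mean())
    masks = gaussian_masks(binary.shape, lattice_centers(center, spacing), sigma)
    # Scalar product of each mask with the binarized image.
    return np.tensordot(masks, binary, axes=([1, 2], [0, 1]))


# Synthetic stand-in for a hand image: a bright square on a black background.
# The 64 x 64 size is arbitrary and not the resolution used in the text.
demo = np.zeros((64, 64), dtype=np.uint8)
demo[24:40, 24:40] = 200
features = feature_vector(demo)
```

Because the features are computed from the binarized image, rescaling the
foreground brightness (as long as it stays above the threshold) leaves the
feature vector unchanged, which is the insensitivity to illumination that
motivates the binarization step.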
