Page 117 - Rapid Learning in Robotics

7.3 Low Level Vision Domain: a Finger Tip Location Finder

                 Figure 7.5: Left (a): typical input image. Upper right (b): after thresholding and
                 binarization. Lower right (c): positions of the 3 × 3 array of Gaussian masks (the
                 displayed width is the actual width reduced by a factor of four, to better
                 depict the arrangement of the positions).

                 maps a monocular image from this ensemble to the 2D position of the
                 index finger tip in the image.
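
                 The feature extraction described in the next paragraph (thresholding,
                 binarization, and scalar products with nine Gaussian masks) can be
                 sketched roughly as follows. This is only an illustrative sketch, not
                 the original implementation: the function names, the mask width
                 `sigma`, the lattice `spacing`, the way the lattice `center` is
                 obtained, and the strict `>` comparison against the threshold are all
                 assumptions made for the example; only the threshold value of 20 and
                 the count of nine masks come from the text.

```python
import math

def binarize(image, threshold=20):
    """Map 8-bit intensities to 0/1 (assumption: 1 where pixel > threshold)."""
    return [[1 if px > threshold else 0 for px in row] for row in image]

def gaussian_mask(height, width, center, sigma):
    """Unnormalized 2D Gaussian evaluated on the pixel grid (assumed mask shape)."""
    cy, cx = center
    return [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)] for y in range(height)]

def lattice_centers(center, spacing):
    """Nine vertices of a 3 x 3 lattice around `center`, given as (row, col)."""
    cy, cx = center
    return [(cy + dy * spacing, cx + dx * spacing)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def feature_vector(image, center, spacing, sigma, threshold=20):
    """Scalar product of the binarized image with each of the nine masks."""
    binary = binarize(image, threshold)
    h, w = len(binary), len(binary[0])
    features = []
    for c in lattice_centers(center, spacing):
        mask = gaussian_mask(h, w, c, sigma)
        features.append(sum(binary[y][x] * mask[y][x]
                            for y in range(h) for x in range(w)))
    return features

# Toy usage on a synthetic 8 x 8 image (hypothetical size, not the one in the text)
toy = [[0] * 8 for _ in range(8)]
toy[4][4] = 255
features = feature_vector(toy, center=(4, 4), spacing=2, sigma=1.0)
```

                 Passing the lattice center as a parameter keeps the sketch neutral on
                 how the hand is located; in practice it might, for instance, be taken
                 as the centroid of the binarized image.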

                     In order to have reproducible conditions, the images were generated
                 with the aid of an adjustable wooden hand replica in front of a black
                 background (for the segmentation required to achieve such conditions with
                 more realistic backgrounds, see e.g. Kummert et al. 1993a; Kummert et al.
                 1993b). A typical image (        pixel resolution) is shown in Fig. 7.5a.
                 From the monochrome pixel image, we generated a 9-dimensional feature
                 vector first by thresholding and binarizing the pixel values (threshold =
                 20, 8-bit intensity values), and then by computing as image features the
                 scalar product of the resulting binarized images (shown in Fig. 7.5b) with
                 a grid of 9 Gaussians at the vertices of a 3 × 3 lattice centered on the hand
                 (Fig. 7.5c). The choice of this preprocessing method is partly heuristically
                 motivated (the binarization makes the feature vector less sensitive to
                 variations in illumination), and partly based on good results achieved
                 with a similar method in the context of the recognition of hand postures