Page 118 - Rapid Learning in Robotics

104                                       Application Examples in the Vision Domain


                          (Kummert et al. 1993b).
                             Applying the PSOM approach to this task requires a set of labeled training
                          data (i.e., images with known 2 D-index finger tip coordinates) that
                          result from sampling the parameter space of the continuous image ensem-
                          ble on a 2 D-lattice. In the present case, we chose the subset of images
                          obtained when viewing each of four discrete hand postures (fully closed,
                          fully opened and two intermediate postures) from one of seven view directions
                          (corresponding to rotations in 30°-steps about the arm axis) spanning
                          the full 180°-range. This yields the very manageable number of 28 images
                          in total, for which the location of the index finger tip was identified and
                          marked by a human observer.
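
The sampling scheme described above can be sketched as a small lattice enumeration. This is only an illustrative sketch: the posture labels and the integer view indices are hypothetical names introduced here, and the hand-labeled finger tip coordinates themselves are not reproduced.

```python
from itertools import product

# Hypothetical labels for the four discrete hand postures named in the text.
POSTURES = ["closed", "intermediate_1", "intermediate_2", "open"]
# Seven equally spaced view directions, referred to here only by index 0..6.
N_VIEWS = 7


def build_lattice(postures=POSTURES, n_views=N_VIEWS):
    """Return all (posture_index, view_index) nodes of the 2 D sampling lattice."""
    return list(product(range(len(postures)), range(n_views)))


lattice = build_lattice()
print(len(lattice))  # one labeled image per lattice node
```

Each lattice node corresponds to one image whose finger tip position was marked by hand, so the lattice size equals the number of labeled images.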

                             Ideally, the x- and y-coordinates of the finger tip should be
                          smooth functions of the resulting 9 image features. For real
                          images, various sources of noise (surface inhomogeneities, small specular
                          reflections, noise in the imaging system, limited accuracy in the labeling
                          process) lead to considerable deviations from this expectation and make
                          the corresponding interpolation task for the network much harder than it
                          would be if the expectation of smoothness were fulfilled. Although the
                          thresholding and the subsequent binarization help to reduce the influence
                          of these effects, compared to computing the feature vector directly from
                          the raw images, the resulting mapping still turns out to be very noisy. To
                          give an impression of the degree of noise, Fig. 7.7 shows the dependence
                          of horizontal (x-) finger tip location (plotted vertically) on two elements of
                          the 9 D-feature vector (plotted in the horizontal xy-plane). The resulting
                          mesh surface is a projection of the full 2 D-map-manifold that is embedded
                          in the space $X$, which here is of dimensionality 11 (a nine-dimensional input
                          feature space $X^{in}$ and a two-dimensional output space $X^{out} = \{x, y\}$ for
                          position). As can be seen, the underlying “surface” does not appear very
                          smooth and is disrupted by considerable “wrinkles”.
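
To make the dimension count concrete, the following sketch shows how one labeled image could be turned into a single point of the 11-dimensional embedding space. The function name `embed`, the index constants, and the sample numbers are assumptions made for illustration; they are not taken from the original implementation.

```python
def embed(features, tip_xy):
    """Concatenate 9 image features and the (x, y) tip position into one 11-D vector."""
    if len(features) != 9:
        raise ValueError("expected 9 image features")
    if len(tip_xy) != 2:
        raise ValueError("expected an (x, y) finger tip position")
    return tuple(features) + tuple(tip_xy)


# Component indices of the input and output subspaces inside the embedding space.
INPUT_DIMS = tuple(range(9))   # the nine image features
OUTPUT_DIMS = (9, 10)          # finger tip x and y

# Made-up numbers, purely to show the layout of one embedded sample.
sample = embed([0.1 * k for k in range(9)], (42.0, 17.0))
print(len(sample))
```

Plotting one output component against two of the nine input components, as in Fig. 7.7, then amounts to selecting three of these eleven coordinates for every lattice node.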

                             To construct the PSOM, we used a subset of 16 images of the image
                          ensemble by keeping the images seen from the two view directions at the
                          ends (±90°) of the full orientation range, plus the eight pictures belonging
                          to view directions of ±30°. For subsequent testing, we used the 12 images
                          from the remaining three view directions of 0° and ±60°. That is, both
                          training and testing ensembles consisted of image views that were multiples of
                          60° apart, and the directions of the test images are midway between the
                          directions of the training images.
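
The interleaved split can be sketched as follows. The sketch works with view indices 0..6 rather than angles and assumes the seven view directions are ordered along the rotation range; the function names are hypothetical.

```python
def split_views(n_views=7):
    """Alternate view directions: even indices (including both ends) train, odd indices test."""
    train = [v for v in range(n_views) if v % 2 == 0]
    test = [v for v in range(n_views) if v % 2 == 1]
    return train, test


def split_images(n_postures=4, n_views=7):
    """Return (train, test) lists of (posture_index, view_index) pairs."""
    train_views, test_views = split_views(n_views)
    train = [(p, v) for p in range(n_postures) for v in train_views]
    test = [(p, v) for p in range(n_postures) for v in test_views]
    return train, test


train_views, test_views = split_views()
train_set, test_set = split_images()
print(len(train_set), len(test_set))
```

Because every test view lies between two neighboring training views, the test measures interpolation along the view direction only; all four hand postures appear in both sets.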