Page 118 - Rapid Learning in Robotics
104 Application Examples in the Vision Domain
(Kummert et al. 1993b).
Applying the PSOM approach to this task requires a set of labeled training
data (i.e., images with known 2D index finger tip coordinates) obtained by
sampling the parameter space of the continuous image ensemble
on a 2D lattice. In the present case, we chose the subset of images
obtained when viewing each of four discrete hand postures (fully closed,
fully opened, and two intermediate postures) from one of seven view directions
(corresponding to rotations in 30° steps about the arm axis) spanning
the full 180° range. This yields the very manageable number of 28 images
in total, for which the location of the index finger tip was identified and
marked by a human observer.
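
The sampling lattice described above can be sketched in a few lines of Python. This is only an illustration: the posture labels are hypothetical names, and the view angles are taken to run from -90° to +90° in 30° steps, which is an assumption about how the seven directions are parameterized.

```python
from itertools import product

# Hypothetical labels for the four discrete hand postures.
POSTURES = ("closed", "intermediate_1", "intermediate_2", "open")

# Assumption: seven view directions, -90 to +90 degrees in 30-degree steps.
VIEW_ANGLES_DEG = tuple(range(-90, 91, 30))


def build_lattice(postures=POSTURES, angles=VIEW_ANGLES_DEG):
    """Return one (posture, view angle) lattice node per labeled image."""
    return list(product(postures, angles))


lattice = build_lattice()
print(len(lattice))  # 4 postures x 7 views = 28
```

Each node of this 4 x 7 lattice corresponds to one image whose finger tip position was labeled by hand.
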
Ideally, the x- and y-coordinates of the finger tip
should be smooth functions of the resulting 9 image features. For real
images, various sources of noise (surface inhomogeneities, small specular
reflections, noise in the imaging system, limited accuracy in the labeling
process) lead to considerable deviations from this expectation and make
the corresponding interpolation task for the network much harder than it
would be if the expectation of smoothness were fulfilled. Although the
thresholding and the subsequent binarization help to reduce the influence
of these effects, compared to computing the feature vector directly from
the raw images, the resulting mapping still turns out to be very noisy. To
give an impression of the degree of noise, Fig. 7.7 shows the dependence
of the horizontal (x) finger tip location (plotted vertically) on two elements of
the 9D feature vector (plotted in the horizontal xy plane). The resulting
mesh surface is a projection of the full 2D map manifold that is embedded
in the space $X$, which here has dimensionality 11 (a nine-dimensional input
feature space $X^{in}$ and a two-dimensional output space $X^{out} = (x, y)$ for
position). As can be seen, the underlying “surface” does not appear very
smooth and is disrupted by considerable “wrinkles”.
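
To make the dimension count concrete, the following sketch shows how a 9D feature vector and a 2D finger tip position combine into one 11D point, and how a three-component projection like the plotted one could be picked out. The helper names `embed` and `project` are hypothetical, and the numbers are placeholders, not measured data.

```python
N_FEATURES = 9  # dimensionality of the input subspace
N_OUTPUTS = 2   # dimensionality of the output subspace (x, y)


def embed(features, position):
    """Concatenate a 9D feature vector and a 2D position into one 11D point."""
    if len(features) != N_FEATURES or len(position) != N_OUTPUTS:
        raise ValueError("expected 9 features and a 2D position")
    return tuple(features) + tuple(position)


def project(point, i, j):
    """Select two feature components plus the x-coordinate of the finger tip."""
    return (point[i], point[j], point[N_FEATURES])


# Placeholder values only, not data from the experiment.
point = embed([0.0] * N_FEATURES, (12.0, 34.0))
print(len(point))            # 11
print(project(point, 0, 1))  # (0.0, 0.0, 12.0)
```
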
To construct the PSOM, we used a subset of 16 images of the image ensemble,
keeping the images seen from the two view directions at the
ends (±90°) of the full orientation range, plus the eight pictures belonging
to the view directions of ±30°. For subsequent testing, we used the 12 images
from the remaining three view directions of 0° and ±60°. That is, both training
and testing ensembles consisted of image views that were multiples of
60° apart, and the directions of the test images are midway between the
directions of the training images.
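
The interleaved split can be sketched as follows. This is a minimal illustration, assuming the seven view directions are ordered from -90° to +90° in 30° steps; the function name `split_views` is hypothetical.

```python
# Assumption: seven view directions, -90 to +90 degrees in 30-degree steps.
VIEW_ANGLES_DEG = list(range(-90, 91, 30))
N_POSTURES = 4  # four hand postures per view direction


def split_views(angles):
    """Alternate views: every second one (including both ends) is for training."""
    train = angles[0::2]
    test = angles[1::2]
    return train, test


train_views, test_views = split_views(VIEW_ANGLES_DEG)
n_train = len(train_views) * N_POSTURES
n_test = len(test_views) * N_POSTURES

print(train_views, n_train)  # [-90, -30, 30, 90] 16
print(test_views, n_test)    # [-60, 0, 60] 12

# Each test direction lies midway between two neighboring training directions.
midpoints = [(a + b) // 2 for a, b in zip(train_views, train_views[1:])]
print(midpoints == test_views)  # True
```

Because every test view falls between two training views, the test measures interpolation rather than extrapolation beyond the sampled range.
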