Page 151 - Dynamic Vision for Perception and Control of Motion


5.2 Efficient Extraction of Oriented Edge Features

ference is largest in amplitude, the gradient over two consecutive mask elements is maximal. However, due to local perturbations, this need not correspond to an actual extreme gradient on the scale of interest. Experience with images from natural environments has shown that two additional parameters may considerably improve the results:

[Figure 5.10. Efficient mask evaluation with the “ColSum” vector; the $n_d$ values given are typical for sizes of “receptive fields” formed. (a) Masks characterized by ($n_d$, $n_0$, $n_d$) for total mask depths $m_d$ = 2, 3, 5, 8 and 17; (b) large mask with $n_d$ = 7, $n_0$ = 3, $n_d$ = 7; (c) sliding evaluation of the $m_d$ = 17 mask.]

1. By allowing a yet-to-be-specified number $n_0$ of entries in the mask center to be dropped, the results achieved may be more robust. This is immediately appreciated when taking into account that either the actual edge direction may deviate from the mask orientation used or the edge is not straight but curved; by setting central elements of the mask to zero, the extreme intensity gradient becomes more pronounced. The rest of Figure 5.10 shows typical mask parameters: $n_0$ = 1 for masks three and five pixels in depth ($m_d$ = 3 or 5), $n_0$ = 2 for $m_d$ = 8, and $n_0$ = 3 for $m_d$ = 17 (rows b, c).
2. Local perturbations are suppressed by giving the mask a significant depth $n_d$. This is the number of pixels along the search path, in each row or column, of each of the positive and negative fields. The total mask depth is then $m_d = 2 n_d + n_0$. Figure 5.10 shows the corresponding mask schemes. Row (b) gives a rather large mask for finding the transition between relatively large homogeneous areas with ragged boundaries. It is $m_d$ = 17 pixels deep, and each field has seven elements, so that the correlation value is formed from large averages; for a mask width $n_w$ of 17 pixels, each field average is taken over 7·17 = 119 pixels. With the number of zero values in between chosen as $n_0$ = 3, the total receptive field (= mask) size is 17·17 = 289 pixels. The sum of the $n_d$ mask elements of one field (the “ColSum” vector values), divided by $n_w \cdot n_d$, is the average intensity in the oblique image region adjacent to the edge. At the maximum correlation value found, this is the average gray value on one side of the edge. This information may be used for recognizing a specific edge feature in consecutive images or for grouping edges in a scene context.
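The mask layout and the field averages described in points 1 and 2 can be sketched as follows. This is a minimal illustration, not the book's implementation. It assumes a horizontal search path through a stripe of $n_w$ image rows, so each ColSum entry is the sum of one image column over the mask width. The function names `mask_geometry`, `col_sums` and `best_edge` are hypothetical. Taking the correlation as the difference of the two field averages is also an illustrative choice.

```python
from typing import List, Optional, Tuple

def mask_geometry(n_d: int, n_0: int, n_w: int) -> Tuple[int, int, int]:
    """Return (m_d, pixels per field, total receptive-field pixels)."""
    m_d = 2 * n_d + n_0  # total mask depth along the search path
    return m_d, n_d * n_w, m_d * n_w

def col_sums(stripe: List[List[float]]) -> List[float]:
    """ColSum vector: each image column summed over the n_w rows of the stripe.
    Assumption (hypothetical layout): the search path runs horizontally."""
    return [sum(col) for col in zip(*stripe)]

def best_edge(stripe: List[List[float]], n_d: int,
              n_0: int) -> Optional[Tuple[int, float, float, float]]:
    """Slide an (n_d, n_0, n_d) mask along the ColSum vector and return
    (start index k, correlation, left-field average, right-field average)
    for the position with the largest |correlation|."""
    n_w = len(stripe)
    m_d, field_pixels, _ = mask_geometry(n_d, n_0, n_w)
    cs = col_sums(stripe)
    best = None
    for k in range(len(cs) - m_d + 1):
        neg = sum(cs[k:k + n_d])               # negative field: n_d ColSum entries
        pos = sum(cs[k + n_d + n_0:k + m_d])   # positive field; n_0 central entries skipped
        avg_left = neg / field_pixels          # average gray value on one side
        avg_right = pos / field_pixels         # average gray value on the other side
        corr = avg_right - avg_left            # difference of field averages
        if best is None or abs(corr) > abs(best[1]):
            best = (k, corr, avg_left, avg_right)
    return best

print(mask_geometry(7, 3, 17))  # (17, 119, 289), the row (b) mask

# 3 rows: dark left half (10), bright right half (50).
stripe = [[10] * 5 + [50] * 5 for _ in range(3)]
print(best_edge(stripe, n_d=2, n_0=1))  # (2, 40.0, 10.0, 50.0)
```

In the step-edge example, positions k = 2 and k = 3 give the same correlation, because the single zeroed column can sit on either side of the step. The strict `>` comparison keeps the first one. The two averages returned at the maximum are the gray values on either side of the edge, which the text suggests using for recognition and grouping.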
For larger mask depths, it is more efficient, when shifting the mask along the search direction, to subtract the trailing mask element (ColSum value) from each field's summed intensities and to add the next one at the front in the search direction (see row (c) in Figure 5.10). This needs far fewer operations than re-summing all ColSum elements of each field at every position.
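The running update just described can be sketched as follows. Again this assumes a precomputed ColSum vector along the search path, and the function names are hypothetical. Each shift costs one subtraction and one addition per field, four operations in total. Re-summing both fields from scratch takes 2($n_d$ - 1) additions, which is 12 for the $n_d$ = 7 mask of row (b). The cost is also independent of the mask width $n_w$, since that is already folded into the ColSum values.

```python
from typing import List, Tuple

def naive_field_sums(colsum: List[int], n_d: int, n_0: int) -> List[Tuple[int, int]]:
    """Reference: re-sum both fields from scratch at every mask position."""
    m_d = 2 * n_d + n_0
    return [(sum(colsum[k:k + n_d]), sum(colsum[k + n_d + n_0:k + m_d]))
            for k in range(len(colsum) - m_d + 1)]

def sliding_field_sums(colsum: List[int], n_d: int, n_0: int) -> List[Tuple[int, int]]:
    """(negative-field sum, positive-field sum) for every mask position,
    updated by dropping the rear element and adding the front element."""
    m_d = 2 * n_d + n_0
    n_positions = len(colsum) - m_d + 1
    if n_positions <= 0:
        return []
    neg = sum(colsum[:n_d])           # full sums computed only once
    pos = sum(colsum[n_d + n_0:m_d])
    sums = [(neg, pos)]
    for k in range(1, n_positions):
        # Element leaving each field at the rear is subtracted,
        # element entering at the front is added.
        neg += colsum[k + n_d - 1] - colsum[k - 1]
        pos += colsum[k + m_d - 1] - colsum[k + n_d + n_0 - 1]
        sums.append((neg, pos))
    return sums

colsum = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4, 6]
assert sliding_field_sums(colsum, n_d=7, n_0=3) == naive_field_sums(colsum, n_d=7, n_0=3)
```

Integer ColSum values keep the running sums exact. With floating-point intensities, a long search path may accumulate small rounding drift, which could be reset by occasionally re-summing.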
The optimal values of these additional mask parameters $n_d$ and $n_0$, as well as of the mask width $n_w$, depend on the scene at hand and are considered knowledge gained

146 147 148 149 150 151 152 153 154 155 156