size expected (domain knowledge) or known from previous images, thereby reducing the computational load.

It can be seen from Figure 5.4 that the same width at scaled distance 5 (depth in viewing direction, 31 pixels wide) is reduced to about 2 pixels at a scaled distance of 20. Clearly, features in the real world will change in appearance accordingly! In the stripe-based method to be discussed in Section 5.3, these facts may be taken into account by proper parameter specification depending on the actual situation encountered. At longer distances, often the center of homogeneous regions in a stripe can be more reliably detected than the exact position of edges belonging to the same region. For example, crossroad detection as shown in Figure 5.5 is much more robust based on area-based features than on edge features, since the latter may abound in the region along the road. Checking aggregations of homogeneous areas in road direction, as needed in the next step for hypothesis generation, is much more efficient than checking edge aggregations with their combinatorial explosion.

[Figure 5.5. Crossroad detection in look-ahead region further away. Panel labels: crossroad, area-based measurement model; distant road; area-based detection; local road, edge-based measurement model.]
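As an illustration only (not the algorithm of the text), the following Python sketch finds the centers of homogeneous runs along a one-dimensional search stripe; the function name, thresholds, and the synthetic stripe are assumptions chosen for demonstration.

```python
import numpy as np

def homogeneous_region_centers(profile, grad_thresh=4.0, min_len=5):
    """Return center indices of homogeneous runs along a 1-D search stripe.

    A pixel counts as 'homogeneous' when the local intensity gradient stays
    below grad_thresh; runs shorter than min_len are discarded.
    Threshold values are illustrative assumptions, not values from the text.
    """
    profile = np.asarray(profile, dtype=float)
    grad = np.abs(np.gradient(profile))
    homog = grad < grad_thresh

    centers = []
    start = None
    for i, flag in enumerate(np.append(homog, False)):  # sentinel closes last run
        if flag and start is None:
            start = i                                    # run begins
        elif not flag and start is not None:
            if i - start >= min_len:
                centers.append((start + i - 1) / 2.0)    # center of the run
            start = None
    return centers

# Example: a bright homogeneous patch (e.g., crossroad surface) between
# noisy road-shoulder regions
stripe = np.concatenate([np.random.normal(60, 8, 40),
                         np.full(25, 120.0),
                         np.random.normal(60, 8, 40)])
print(homogeneous_region_centers(stripe))
```

The center of such a run stays stable even when individual edge positions fluctuate, which is the robustness argument made above for area-based crossroad detection.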


            5.2 Efficient Extraction of Oriented Edge Features


In general, multiple scales are advantageous in image processing to optimally exploit information in noise-corrupted images [Florack et al. 1992]. To avoid spurious details, the finest scale should be much larger than pixel size; a mask size of three or four pixels is considered a practical lower bound for the extraction of linearly extended features. The upper scale limit is given by image size, since during the search process no image boundary should be hit; a maximum mask size of one-half or one-third of the image size seems to be a meaningful upper limit. Spacing mask sizes by a factor of 2 then results in seven or eight mask sizes for 1K×1K pixel images.
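As a small sketch (helper name and default values assumed, not taken from the text), the factor-of-two ladder of mask sizes between the practical lower bound and a fraction of the image size can be enumerated directly:

```python
def mask_size_ladder(image_size, min_mask=4, max_fraction=0.5):
    """Mask sizes spaced by a factor of 2, from min_mask up to
    max_fraction * image_size (illustrative helper)."""
    sizes = []
    m = min_mask
    while m <= max_fraction * image_size:
        sizes.append(m)
        m *= 2
    return sizes

# For a 1K x 1K image this yields 4, 8, ..., 512: eight mask sizes,
# consistent with the "seven or eight" mentioned above.
print(mask_size_ladder(1024))       # [4, 8, 16, 32, 64, 128, 256, 512]
print(len(mask_size_ladder(1024)))  # 8
```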
An alternative approach to working with large mask sizes is to compute pyramid images [Burt et al. 1981] by averaging four neighboring pixels on the ith level to one pixel at the (i+1)th pyramid level and then applying a smaller mask size to the higher level image. Note however, that these two operations are not exactly the same, since in the latter case resolution in the image plane has been lost.
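A minimal NumPy sketch of this 2×2-averaging reduction (function names assumed; not code from the text) makes the loss of resolution explicit: each level-(i+1) pixel stands for a whole 2×2 block of level-i pixels.

```python
import numpy as np

def pyramid_reduce(img):
    """One exact pyramid step: average each 2x2 block of level i
    into one pixel of level i+1 (odd border rows/columns are dropped)."""
    h, w = img.shape
    blocks = img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

def build_pyramid(img, levels):
    """Return [level 0, level 1, ...] by repeated 2x2 averaging."""
    pyr = [np.asarray(img, dtype=float)]
    for _ in range(levels):
        pyr.append(pyramid_reduce(pyr[-1]))
    return pyr

# A 1024x1024 image reduced over four levels: 1024 -> 512 -> 256 -> 128 -> 64
pyr = build_pyramid(np.random.rand(1024, 1024), levels=4)
print([p.shape for p in pyr])
```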
Therefore, to preserve high resolution at the small scale, mask sizes up to 17 pixels in width have been generated and implemented. Masks of larger size have not been necessary in the problem areas treated up to now. If they are needed, either four or five pyramid levels have to be generated or, for the sake of saving computing time, simple sub-sampling may be done, substituting one intensity value of level i for the mean of four neighboring ones on level i − 1 in the exact pyramid. Thus, about half