that the odd rows are captured first and the even rows afterward. When such a camera is
used in a dynamic environment, for example on a moving robot, adjacent rows show the
dynamic scene at two different points in time, differing by up to one-thirtieth of a second.
The result is an artificial blurring caused by motion rather than optical defocus. By
comparing only even-numbered rows, we avoid this interlacing side effect.
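As a minimal illustration, assuming the frame is stored as a NumPy array with image rows along the first axis, keeping a single field could look like this (the function name is ours, not from the text):

    import numpy as np

    def deinterlace_even(frame: np.ndarray) -> np.ndarray:
        # Keep only the even-numbered rows of an interlaced frame so that
        # every remaining row belongs to the same field and therefore to the
        # same instant in time.
        return frame[::2, :]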
Recall that the three images are each taken with a camera using a different focus posi-
tion. Based on the focusing position, we call each image close, medium, or far. A 5 × 3
coarse depth map of the scene is constructed quickly by simply comparing the sharpness
values of each of the three corresponding regions. Thus, the depth map assigns only two
bits of depth information to each region using the values close, medium, and far. The crit-
ical step is to adjust the focus positions of all three cameras so that flat ground in front of
the obstacle results in medium readings in one row of the depth map. Then, unexpected
readings of either close or far will indicate convex and concave obstacles, respectively,
enabling basic obstacle avoidance in the vicinity of objects on the ground as well as drop-
offs into the ground.
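A minimal sketch of this coarse depth-from-focus step is given below. It assumes three registered grayscale images and uses a simple gradient-energy sharpness measure; the measure, the grid-indexing details, and the function names are our own choices, not prescribed by the text.

    import numpy as np

    def sharpness(region):
        # Simple sharpness measure: mean squared image gradient
        # (any high-pass energy measure would serve the same purpose).
        gy, gx = np.gradient(region.astype(float))
        return float(np.mean(gx * gx + gy * gy))

    def coarse_depth_map(img_close, img_medium, img_far, rows=3, cols=5):
        # Label each cell of a rows x cols grid as 'close', 'medium', or 'far'
        # according to which of the three images is sharpest in that region.
        labels = ('close', 'medium', 'far')
        h, w = img_close.shape
        depth = np.empty((rows, cols), dtype=object)
        for r in range(rows):
            for c in range(cols):
                ys = slice(r * h // rows, (r + 1) * h // rows)
                xs = slice(c * w // cols, (c + 1) * w // cols)
                scores = [sharpness(img[ys, xs])
                          for img in (img_close, img_medium, img_far)]
                depth[r, c] = labels[int(np.argmax(scores))]
        return depth

    # With the cameras adjusted so that flat ground reads 'medium' in one row,
    # 'close' cells flag convex obstacles and 'far' cells flag drop-offs.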
Although sufficient for obstacle avoidance, the above depth from focus algorithm pre-
sents unsatisfyingly coarse range information. The alternative is depth from defocus, the
most desirable of the focus-based vision techniques.
Depth from defocus methods take as input two or more images of the same scene, taken
with different, known camera geometry. Given the images and the camera geometry set-
tings, the goal is to recover the depth information of the 3D scene represented by the
images. We begin by deriving the relationship between the actual scene properties (irradi-
ance and depth), camera geometry settings, and the image g that is formed at the image
plane.
The focused image f(x, y) of a scene is defined as follows. Consider a pinhole aperture
(L = 0) in lieu of the lens. For every point p at position (x, y) on the image plane, draw
a line through the pinhole aperture to the corresponding, visible point P in the actual scene.
We define f(x, y) as the irradiance (or light intensity) at p due to the light from P.
Intuitively, f(x, y) represents the intensity image of the scene perfectly in focus.
The point spread function h(x_g, y_g, x_f, y_f, R) is defined as the amount of irradiance
from point P in the scene (corresponding to (x_f, y_f) in the focused image f) that contributes
to point (x_g, y_g) in the observed, defocused image g. Note that the point spread function
depends not only upon the source, (x_f, y_f), and the target, (x_g, y_g), but also on R, the
blur circle radius. R, in turn, depends upon the distance from point P to the lens, as can be seen
by studying equations (4.19) and (4.20).
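Equations (4.19) and (4.20) are not reproduced on this page; as a rough sketch under the standard thin-lens model (the symbol names z, f, e, and L below are chosen here for illustration, not taken from the text), the dependence of the blur circle radius on scene depth can be computed as follows:

    def blur_circle_radius(z, f, e, L):
        # z: distance from the scene point to the lens (z > f assumed)
        # f: focal length, e: distance from lens to image plane,
        # L: aperture diameter.
        # A point at depth z focuses at e_z = f*z / (z - f) (thin-lens law);
        # by similar triangles the defocused cone meets the image plane in a
        # circle of radius (L/2) * |e - e_z| / e_z.
        e_z = f * z / (z - f)
        return (L / 2.0) * abs(e - e_z) / e_z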
Given the assumption that the blur circle is homogeneous in intensity, we can define h
as follows: