Figure 5.12 Keyframe-based rotoscoping (Agarwala, Hertzmann, Seitz et al. 2004) © 2004 ACM: (a) original
frames; (b) rotoscoped contours; (c) re-colored blouse; (d) rotoscoped hand-drawn animation.
ters 1995; Parke and Waters 1996; Bregler, Covell, and Slaney 1997) (Figure 5.2b). They can
also be used to track heads and people, as shown in Figure 5.8, as well as moving vehicles
(Paragios and Deriche 2000). Additional applications include medical image segmentation,
where contours can be tracked from slice to slice in computerized tomography (3D medical
imagery) (Cootes and Taylor 2001) or over time, as in ultrasound scans.
An interesting application that is closer to computer animation and visual effects is
rotoscoping, which uses the tracked contours to deform a set of hand-drawn animations (or
to modify or replace the original video frames).⁷ Agarwala, Hertzmann, Seitz et al. (2004)
present a system based on tracking hand-drawn B-spline contours drawn at selected keyframes,
using a combination of geometric and appearance-based criteria (Figure 5.12). They also
provide an excellent review of previous rotoscoping and image-based, contour-tracking systems.
Additional applications of rotoscoping (object contour detection and segmentation), such
as cutting and pasting objects from one photograph into another, are presented in Section 10.4.
5.2 Split and merge
As mentioned in the introduction to this chapter, the simplest possible technique for
segmenting a grayscale image is to select a threshold and then compute connected components
(Section 3.3.2). Unfortunately, a single threshold is rarely sufficient for the whole image
because of lighting and intra-object statistical variations.
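As a minimal sketch of this baseline (not code from the text), the following pure-Python example thresholds a tiny synthetic image and labels its 4-connected components with a breadth-first flood fill. The function names `threshold` and `connected_components`, the pixel values, and the two threshold values are all illustrative assumptions. The image contains two objects under uneven lighting, so no single threshold separates both objects from their backgrounds.

```python
from collections import deque


def threshold(image, t):
    """Return a binary mask that is True where the pixel value exceeds t."""
    return [[value > t for value in row] for row in image]


def connected_components(mask):
    """Label 4-connected foreground regions; background pixels keep label 0."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or labels[y][x]:
                continue  # background, or already assigned to a region
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill from the seed pixel
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Hypothetical image: the left half is brightly lit (background 100, object 200),
# the right half is in shadow (background 20, object 90).
image = [
    [100, 100, 100, 100, 20, 20, 20, 20],
    [100, 200, 200, 100, 20, 90, 90, 20],
    [100, 200, 200, 100, 20, 90, 90, 20],
    [100, 100, 100, 100, 20, 20, 20, 20],
]

# A high threshold finds the lit object but misses the shadowed one.
labels_high, n_high = connected_components(threshold(image, 128))

# A low threshold finds the shadowed object but merges the lit object
# with its own background into a single region.
labels_low, n_low = connected_components(threshold(image, 60))

print(n_high, n_low)
```

In practice, a library routine such as `scipy.ndimage.label` would normally replace the hand-written flood fill; the explicit version is used here only to keep the sketch dependency-free.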
In this section, we describe a number of algorithms that proceed either by recursively
splitting the whole image into pieces based on region statistics or, conversely, merging pixels
and regions together in a hierarchical fashion. It is also possible to combine both splitting and
merging by starting with a medium-grain segmentation (in a quadtree representation) and
7 The term comes from a device (a rotoscope) that projected frames of a live-action film underneath an acetate so
that artists could draw animations directly over the actors’ shapes.