11.5 Analyzing multimedia content
specific interest. This process can be extremely time-consuming, tedious, and in
many cases, impractical.
The basic guidelines for analyzing text content also apply to multimedia content.
Before you start analyzing the data, you need to study the literature and think about
the scope, context, and objective of your study. You need to identify the key instances
that you want to describe or annotate. After the analysis, you need to evaluate the
reliability of the annotation. If a manual annotation approach is adopted, it may be a
good idea to analyze only a subset of the data set because of the high labor cost.
For example, Peltonen et al. (2008) picked eight days of data from a study that lasted
for one month. They first automatically partitioned the video footage into small “sessions,”
then manually coded the information in which they were interested (the duration
of interaction, the number of active users, and the number of passive bystanders).
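One way to evaluate the reliability of a manual annotation is to have two coders label the same items and measure how far their agreement exceeds chance. The sketch below computes Cohen's kappa, a commonly used agreement statistic; the text does not prescribe a particular measure, so the choice of kappa, the function name `cohens_kappa`, and the sample codes are all assumptions made for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement, estimated from each coder's label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for ten video sessions: was a passive bystander present?
coder_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no"]
coder_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"]
kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 2))  # prints 0.6
```

In this made-up example the coders agree on 8 of 10 sessions, and half of that agreement would be expected by chance, which gives a kappa of 0.6.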
Another application domain related to multimedia content analysis is the online
search of media content. The web holds a huge number of images, videos, and audio
recordings, and users frequently go online to search for them.
Currently, most multimedia search relies on text-based retrieval, which means
that the multimedia materials have to be annotated or labeled with appropriate text.
So far, annotation can be accomplished through three approaches: manual annotation,
partially automated annotation, and completely automated annotation.
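To illustrate why text-based retrieval depends on annotation, the following minimal sketch searches media items purely through their text labels. The file names, tags, and the functions `build_index` and `search` are hypothetical; real search engines are far more elaborate.

```python
from collections import defaultdict


def build_index(annotations):
    """Map each lowercase tag to the set of media items annotated with it."""
    index = defaultdict(set)
    for media_id, tags in annotations.items():
        for tag in tags:
            index[tag.lower()].add(media_id)
    return index


def search(index, query):
    """Return the items whose annotations contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical annotations: the search never inspects the media themselves.
annotations = {
    "img_001.jpg": ["cat", "sofa"],
    "img_002.jpg": ["dog", "park"],
    "clip_003.mp4": ["cat", "park"],
}
index = build_index(annotations)
matches = search(index, "cat park")
```

Because the search only sees the tags, an item that is unlabeled or mislabeled cannot be found correctly, no matter what it actually shows.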
Considering the huge amount of information that needs to be annotated, the
manual approach is extremely labor intensive. In addition, it can be affected
by the coder's subjective interpretation. The completely automated approach is less
labor intensive. However, due to the substantial semantic gap between the low-level
features that we can currently automatically extract and the high-level concepts that
are of real interest to the user, existing automatic annotation applications are highly
error prone (e.g., many images that have nothing to do with cats may be annotated
with “cat”). A more recent development in this field
is the partially automated approach. Human coders manually annotate a subset of
the multimedia data. Then the manually coded data is used to train the application to
establish the connection between the low-level features and the high-level concept.
Once a concept detector is established, the detector can be used to automatically annotate
the rest of the data (Rui and Qi, 2007). The same approach can be applied to
images and video and audio clips.
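The partially automated workflow can be sketched in a few lines: a manually coded subset trains a detector, which then labels the remaining items. The example below uses a simple nearest-centroid classifier as a stand-in; it is not the method described by Rui and Qi (2007), and the two-value feature vectors, the labels, and the function names are hypothetical.

```python
import math


def train_centroids(features, labels):
    """Average the feature vectors of each manually assigned label."""
    sums, counts = {}, {}
    for vector, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def annotate(centroids, vector):
    """Assign the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vector))


# Step 1: human coders annotate a subset (hypothetical low-level features).
coded_features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
coded_labels = ["cat", "cat", "not_cat", "not_cat"]

# Step 2: the coded subset trains a simple concept detector.
centroids = train_centroids(coded_features, coded_labels)

# Step 3: the detector annotates the rest of the data automatically.
uncoded_features = [[0.85, 0.15], [0.15, 0.7]]
auto_labels = [annotate(centroids, v) for v in uncoded_features]
```

In practice the features would be extracted from the media and the detector would be a more capable learning model, but the division of labor between human coders and the trained detector is the same.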
The techniques for multimedia content analysis build on multiple domains,
including image processing, computer vision, pattern recognition, and graphics.
One approach commonly adopted across all of those fields is machine
learning. The specific algorithms or techniques of multimedia content analysis are
still seeing dramatic advances. For more detailed information on those topics, see
publications in the related fields (Hanjalic et al., 2006; Sebe et al., 2007; Divakaran,
2009; Ohm, 2016). The specific applications that are particularly interesting to the
HCI field include action recognition and motion tracking (Zhu et al., 2006; Vondrak
et al., 2012), body tracking (Li et al., 2006), face recognition, facial expression analysis
(Wu et al., 2006; Wolf et al., 2016), gesture recognition (Argyros and Lourakis,
2006), object classification and tracking (Dedeoğlu et al., 2006; Guo et al., 2015),