11.5  Analyzing multimedia content
                    specific interest. This process can be extremely time-consuming, tedious, and in
                  many cases, impractical.
                     The basic guidelines for analyzing text content also apply to multimedia content.
                  Before you start analyzing the data, you need to study the literature and think about
                  the scope, context, and objective of your study. You need to identify the key instances
                  that you want to describe or annotate. After the analysis, you need to evaluate the
reliability of the annotation. If a manual annotation approach is adopted, it may be a good idea to select a subset of the entire data set for analysis because of the high labor cost. For example, Peltonen et al. (2008) picked eight days of data from a study that lasted one month. They first automatically partitioned the video footage into small "sessions," then manually coded the information in which they were interested (the duration of interaction, the number of active users, and the number of passive bystanders).
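Peltonen et al. do not detail their partitioning technique, but the general idea of automatically splitting footage into sessions can be illustrated with a minimal sketch. The Python fragment below (using the OpenCV and NumPy libraries) treats long stretches of little frame-to-frame change as gaps between sessions; the thresholds ACTIVITY_THRESHOLD and MIN_GAP_SECONDS are purely illustrative assumptions, not values from the study.

   # Illustrative sketch only: split video footage into "sessions" by
   # detecting quiet periods between stretches of on-screen activity.
   # Thresholds and names are hypothetical, not Peltonen et al.'s method.
   import cv2
   import numpy as np

   ACTIVITY_THRESHOLD = 8.0  # mean per-pixel change that counts as activity
   MIN_GAP_SECONDS = 30.0    # quiet period long enough to end a session

   def segment_sessions(video_path):
       cap = cv2.VideoCapture(video_path)
       fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS unknown
       sessions, start, last_active, prev = [], None, None, None
       frame_idx = 0
       while True:
           ok, frame = cap.read()
           if not ok:
               break
           gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
           if prev is not None:
               active = np.mean(cv2.absdiff(gray, prev)) > ACTIVITY_THRESHOLD
               t = frame_idx / fps
               if active:
                   if start is None:
                       start = t        # activity begins a new session
                   last_active = t
               elif start is not None and t - last_active > MIN_GAP_SECONDS:
                   sessions.append((start, last_active))  # close the session
                   start = None
           prev = gray
           frame_idx += 1
       if start is not None:
           sessions.append((start, last_active))
       cap.release()
       return sessions  # list of (start_sec, end_sec) session boundaries

Once footage is divided this way, coders can annotate each session rather than scanning the full recording, which is what makes coding a month of video tractable.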
Another application domain related to multimedia content analysis is the online search of media content. There is a huge amount of image, video, and audio material on the web, and users frequently go online to search for it. Currently, most multimedia search is completed by text-based retrieval, which means that the multimedia materials have to be annotated or labeled with appropriate text. To date, annotation can be accomplished through three approaches: manual annotation, partially automated annotation, and completely automated annotation.
Considering the huge amount of information that needs to be annotated, the manual approach is extremely labor intensive. It can also be affected by the coder's subjective interpretation. The completely automated approach is less labor intensive. However, due to the substantial semantic gap between the low-level features that can currently be extracted automatically and the high-level concepts that are of real interest to the user, existing automatic annotation applications are highly error prone (e.g., many images that have nothing to do with cats may be annotated with "cat"). A more recent development in this field is the partially automated approach: human coders manually annotate a subset of the multimedia data, and the manually coded data is then used to train the application to establish the connection between the low-level features and the high-level concept. Once a concept detector is established, it can be used to automatically annotate the rest of the data (Rui and Qi, 2007). The same approach can be applied to images as well as video and audio clips.
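As a rough illustration of this partially automated pipeline, the sketch below trains a simple "cat" concept detector on a handful of manually labeled images and then annotates the rest. The color-histogram feature, the file names, and the training-set size are all illustrative assumptions; real concept detectors use far richer features and much larger labeled subsets.

   # Minimal sketch of the partially automated approach: train a concept
   # detector on a manually labeled subset, then annotate the remainder.
   # Features, file names, and labels are illustrative assumptions.
   import numpy as np
   from PIL import Image
   from sklearn.svm import SVC

   def low_level_features(path, bins=8):
       """A crude low-level feature: a joint RGB color histogram."""
       rgb = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
       hist, _ = np.histogramdd(rgb, bins=(bins, bins, bins),
                                range=[(0, 256)] * 3)
       return hist.ravel() / hist.sum()

   # 1. Human coders label a small subset (paths/labels are hypothetical).
   labeled_paths = ["img001.jpg", "img002.jpg", "img003.jpg", "img004.jpg"]
   labels = [1, 0, 1, 0]  # 1 = contains the concept ("cat"), 0 = does not

   detector = SVC(kernel="rbf")
   detector.fit([low_level_features(p) for p in labeled_paths], labels)

   # 2. The trained concept detector annotates the rest of the collection.
   for p in ["img005.jpg", "img006.jpg"]:
       score = detector.decision_function([low_level_features(p)])[0]
       if score > 0:
           print(f"{p}: annotate with 'cat' (decision score {score:.2f})")

The key design point is that human effort is spent only on the training subset; the classifier bridges the semantic gap, however imperfectly, for everything else.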
The techniques for multimedia content analysis build on multiple domains, including image processing, computer vision, pattern recognition, and graphics. An approach commonly adopted across all of those fields is machine learning. The specific algorithms and techniques of multimedia content analysis are still seeing dramatic advances. For more detailed information on those topics, see publications in the related fields (Hanjalic et al., 2006; Sebe et al., 2007; Divakaran, 2009; Ohm, 2016). The specific applications that are particularly interesting to the HCI field include action recognition and motion tracking (Zhu et al., 2006; Vondrak et al., 2012), body tracking (Li et al., 2006), face recognition, facial expression analysis (Wu et al., 2006; Wolf et al., 2016), gesture recognition (Argyros and Lourakis, 2006), object classification and tracking (Dedeoğlu et al., 2006; Guo et al., 2015),