356 CHAPTER 12 Automated data collection methods
to match the formats used by web servers. This mimicry would make your data amenable
to analysis by web-log analysis tools. Another possibility is to create parsing
and analysis software: if you can instrument your user interface to collect interaction
information, you will probably find this to be a reasonably manageable task.
More sophisticated questions might require fancier footwork. One common goal
is to study the sequence of events. Do users click Print before Save more frequently
than they click Save before Print? Similar challenges are found when trying to infer
the structure of interaction from web logs, leading to a variety of strategies that have
been used to pick out user “sessions” (Heer and Chi, 2002).
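As a minimal sketch of the Print-versus-Save question, the following Python code splits a log into sessions and counts which of the two events comes first in each session that contains both. The log records, the event names, and the 30-minute inactivity timeout are all assumptions made for illustration; a fixed timeout is only one simple heuristic for identifying sessions and is not necessarily the strategy used in the work cited above.

```python
from collections import Counter

# Hypothetical log: (user, timestamp in seconds, event name).
LOG = [
    ("u1", 0, "Open"), ("u1", 40, "Print"), ("u1", 55, "Save"),
    ("u1", 4000, "Save"), ("u1", 4020, "Print"),
    ("u2", 10, "Save"), ("u2", 30, "Edit"), ("u2", 90, "Print"),
    ("u3", 5, "Print"), ("u3", 20, "Close"),
]

TIMEOUT = 1800  # assumed inactivity gap (30 minutes) that ends a session


def split_sessions(log, timeout=TIMEOUT):
    """Group events per user, starting a new session after a long gap."""
    sessions = []
    last_seen = {}  # user -> (time of last event, that user's open session)
    for user, ts, event in sorted(log, key=lambda r: (r[0], r[1])):
        prev = last_seen.get(user)
        if prev is None or ts - prev[0] > timeout:
            current = []
            sessions.append(current)
        else:
            current = prev[1]
        current.append(event)
        last_seen[user] = (ts, current)
    return sessions


def first_of(session, a="Print", b="Save"):
    """Return whichever of a or b occurs first, or None if either is absent."""
    if a not in session or b not in session:
        return None
    return a if session.index(a) < session.index(b) else b


sessions = split_sessions(LOG)
order_counts = Counter(
    first for first in (first_of(s) for s in sessions) if first is not None
)
print(order_counts)
```

Sessions that contain only one of the two events are ignored here; whether that is appropriate depends on the question you are asking.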
Another approach might be to visualize log files. Highly interactive visualizations
might show each event in a log file as a point on the screen, while providing
tools for filtering and displaying data based on different criteria. As with other approaches
for analyzing log files, visualization has been most widely used for web
logs. WebQuilt (Hong et al., 2001) displays pages and links between them as nodes
and links in a graph. Links are drawn as arrows, with thicker arrows indicating more
heavily used links and shading indicating the amount of time spent on a page before
selection of a link (Figure 12.11A). Users can zoom into a node to directly examine
the page in question (Figure 12.11B).
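The data behind a graph display of this kind can be derived from ordered page visits. The sketch below is an illustration only, not WebQuilt's implementation: it assumes hypothetical sessions recorded as (page, seconds on page) pairs, and computes for each link a traversal count (which could drive arrow thickness) and the mean time spent on the source page before the link was followed (which could drive shading).

```python
from collections import defaultdict

# Hypothetical sessions: ordered (page, seconds spent on page) visits.
SESSIONS = [
    [("home", 12), ("search", 30), ("product", 45)],
    [("home", 8), ("product", 60)],
    [("home", 20), ("search", 10), ("contact", 5)],
]


def build_graph(sessions):
    """Return {(src, dst): {"count": n, "mean_dwell": seconds}} per link."""
    traversals = defaultdict(int)
    dwell = defaultdict(list)
    for visits in sessions:
        # Pair each visit with the one that follows it in the same session.
        for (src, secs), (dst, _) in zip(visits, visits[1:]):
            traversals[(src, dst)] += 1
            dwell[(src, dst)].append(secs)
    return {
        edge: {"count": n, "mean_dwell": sum(dwell[edge]) / n}
        for edge, n in traversals.items()
    }


graph = build_graph(SESSIONS)
for (src, dst), info in sorted(graph.items()):
    print(f"{src} -> {dst}: {info['count']} traversals, "
          f"{info['mean_dwell']:.1f}s on {src} before the click")
```

The resulting dictionary could be passed to any graph-drawing tool; the choice of tool is left open here.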
Other visualizations include the use of two-dimensional “starfield” displays
for viewing individual requests by date, time, and other attributes (Hochheiser and
Shneiderman, 2001) and finer-grained visualizations of mouse events on individual
pages (Arroyo et al., 2006; Atterer et al., 2006).
As with any other analysis, understanding your goals and planning your data acquisition
and analysis appropriately are key to effective use of these detailed logs. Ben-Naim
et al. describe the use of log analysis for an adaptive learning program, including
an explicit list of the questions involved and a description of the approaches used to
answer those questions (Ben-Naim et al., 2008). Appropriate storage of log data can
also facilitate analysis, with some researchers using business-oriented online analytical
processing (OLAP) tools to drill down into relevant details (Mavrikis et al., 2015).
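The idea of drilling down can be illustrated without a full OLAP tool. In the sketch below, each log record is stored with a few named dimensions, and the same counting function is applied at coarser or finer levels of detail. The records, the dimension names, and the roll_up helper are hypothetical and are not drawn from the systems cited above.

```python
from collections import Counter

# Hypothetical log records with three dimensions: day, module, action.
DIMENSIONS = {"day": 0, "module": 1, "action": 2}
RECORDS = [
    ("day1", "quiz", "submit"),
    ("day1", "quiz", "hint"),
    ("day1", "video", "play"),
    ("day2", "quiz", "submit"),
    ("day2", "quiz", "submit"),
    ("day2", "video", "pause"),
]


def roll_up(records, *dims):
    """Count records grouped by the named dimensions."""
    idx = [DIMENSIONS[d] for d in dims]
    return Counter(tuple(r[i] for i in idx) for r in records)


# Coarse view: events per day.
by_day = roll_up(RECORDS, "day")

# Drill down: events per day and module.
by_day_module = roll_up(RECORDS, "day", "module")

# Drill down further into one cell: quiz actions on day2.
day2_quiz = roll_up(
    [r for r in RECORDS if r[0] == "day2" and r[1] == "quiz"], "action"
)

print(by_day)
print(by_day_module)
print(day2_quiz)
```

Storing each record with explicit dimensions is what makes these regroupings cheap; a log kept only as free-form text lines would need to be parsed again for every new question.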
Data mining and machine learning techniques can be well suited for extracting
patterns from log files. Relatively simple techniques such as association rules
(Agrawal et al., 1993) might be used to determine patterns of frequently co-occurring
accesses; for example, “sitemap” and “search” page accesses might frequently
be associated with clicks on a “contact us” page. Data mining approaches
have been used to inform site design and usage characterization from a variety of
perspectives, including personalization of content (Srivastava et al., 2000; Eirinaki
and Vazirgiannis, 2003), with results familiar to any web user who has seen web
pages including advertisements matching search terms that they have recently used.
Clustering techniques might also be used to develop models that group users
by their usage patterns or that predict desirable outcomes such as
purchases. Although fascinating, and also relevant to the processing of physiological
data (Chapter 13) and ubiquitous computing data (Chapter 14), data mining is largely
beyond the scope of this book—for more information, see one of the many online
courses or textbooks on machine learning and data mining.
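To make the association-rule idea concrete, the following sketch computes support and confidence for rules between pairs of pages. It enumerates pairs by brute force rather than using the algorithm from the cited paper, and it handles only single-page antecedents, so it is a simplification of the “sitemap” and “search” example above. The session data and both thresholds are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: the set of pages accessed in each one.
SESSIONS = [
    {"sitemap", "search", "contact us"},
    {"sitemap", "search", "contact us"},
    {"search", "products"},
    {"sitemap", "search"},
    {"home", "products"},
]


def pair_rules(sessions, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for page pairs.

    support    = fraction of sessions containing both pages
    confidence = sessions containing both / sessions containing lhs
    """
    n = len(sessions)
    item_counts = Counter(page for s in sessions for page in s)
    pair_counts = Counter(
        pair for s in sessions for pair in combinations(sorted(s), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules, one per direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


rules = pair_rules(SESSIONS)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} => {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Note that the two directions of a rule can differ sharply: in this toy data every session with “contact us” also includes “search”, but only half of the “search” sessions include “contact us”, so only the first direction passes the confidence threshold.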