12.6 Data management and analysis 355
database to store application data, additional user activity information can often be
added without much effort. This is often the case for database-driven web applica-
tions. If, however, your tool does not interact with a database, developing tools to
parse log files might be easier than adding a database to the application.
12.6.2 ANALYZING LOG FILES
Having collected some log files, you will want to do something with them. Although
log files for web servers, proxies, keystroke trackers, and custom-instrumented soft-
ware might all have different formats and contents, the general approach toward in-
strumentation is roughly the same: in each case, you have one line in the file for each
event of interest. Each line is likely to have some text indicating the time and date of
the event (otherwise known as the timestamp), a description of what happened (such
as the URL that was requested), and other related details.
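To make this concrete, the sketch below pulls the timestamp and requested URL out of a single entry, assuming Apache's widely used "combined" log format; the specific record shown is invented for illustration.

```python
import re

# A hypothetical request record in Apache "combined" log format (assumed
# here for illustration; your server's format may differ).
line = ('192.168.1.10 - - [12/Mar/2017:10:15:32 -0500] '
        '"GET /print.html HTTP/1.1" 200 5120 '
        '"http://example.com/index.html" "Mozilla/5.0"')

# Extract the timestamp, requested URL, status code, referrer, and browser.
pattern = re.compile(
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d+) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"')

match = pattern.search(line)
print(match.group('timestamp'))  # 12/Mar/2017:10:15:32 -0500
print(match.group('url'))        # /print.html
```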
How you proceed in your analysis is largely determined by your goals. If you are
simply interested in trying to count certain events—for example, how many people
pressed the Print button—you might be able to read through the file, classifying each
event into one or more counters of various types. A single event in a log file might be
classified according to the page that was requested, the day of the week, the time of
day, and the type of web browser that made the request.
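The classification step might look like the following sketch, which tallies a few already-parsed events (hypothetical data) into separate counters for page, day of week, and browser:

```python
from collections import Counter
from datetime import datetime

# Hypothetical events, reduced to (timestamp, url, browser) tuples that a
# parser might already have extracted from each log line.
events = [
    ('12/Mar/2017:10:15:32', '/print.html', 'Firefox'),
    ('12/Mar/2017:10:17:01', '/index.html', 'Chrome'),
    ('13/Mar/2017:09:02:44', '/print.html', 'Chrome'),
]

page_counts = Counter()
weekday_counts = Counter()
browser_counts = Counter()

for stamp, url, browser in events:
    when = datetime.strptime(stamp, '%d/%b/%Y:%H:%M:%S')
    page_counts[url] += 1                     # page that was requested
    weekday_counts[when.strftime('%A')] += 1  # day of the week
    browser_counts[browser] += 1              # browser that made the request

print(page_counts['/print.html'])  # 2
```

A single event updates several counters at once, so one pass over the file yields every breakdown of interest.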
Reading through the file to extract the various pieces of information known about
each event is an example of a common computing practice known as parsing. Often
written in scripting languages, such as Perl and Python, log-file-parsing programs
read one line at a time, breaking the entry for each event into constituent pieces and
then updating data structures that keep counts and statistics of different types of
event, as needed. Once the parser has read all of the relevant events and tallied up the
numbers, results can be displayed graphically or in tabular form.
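Putting those pieces together, a minimal parser reads one line at a time, splits each entry into its constituent pieces, updates the counts, and finally prints a small table. The simple space-separated "timestamp url status" layout below is assumed for brevity; a real log would need a format-specific split or regular expression.

```python
import io
from collections import Counter

# Simulated log file with a hypothetical "timestamp url status" layout;
# io.StringIO stands in for an open file handle.
log_file = io.StringIO(
    '2017-03-12T10:15:32 /print.html 200\n'
    '2017-03-12T10:17:01 /index.html 200\n'
    '2017-03-13T09:02:44 /print.html 404\n'
)

hits = Counter()
for line in log_file:
    timestamp, url, status = line.split()  # break the entry into pieces
    hits[url] += 1                         # tally requests per page

# Tabular summary, most-requested page first.
for url, count in hits.most_common():
    print(f'{url:<15} {count:>5}')
```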
Countless programs for parsing and analyzing web log data have been developed
since the web first came onto the scene in the 1990s. These tools range from freely
available, open-source (but still highly functional) offerings to high-end commercial
products, providing a variety of ways to slice and dice data. Many of these tools
work on data from proxy servers as well.
For publicly available websites, many operators rely on the detailed querying and
visualization tools provided by Google Analytics. Using a small bit of code inserted
into every page on the site, Google Analytics collects data and sends it to Google,
where it is stored for analysis via Google’s tools. Google Analytics is a popular and
powerful tool for understanding website usage patterns, but as it is not intended for
supporting usability studies, you might want to try a test run before using it for a full
study. Furthermore, as Analytics only works on public sites, it is not appropriate for
studies using locally hosted material.
Data from nonweb applications might prove a bit more challenging to analyze.
Keystroke loggers and activity loggers may come with their own log-parsing and
analysis packages, but you are likely to be on your own if you write software instru-
mentation to collect data. One approach in this case might be to design your log files