                    database to store application data, additional user activity information can often be
                  added without much effort. This is often the case for database-driven web applica-
                  tions. If, however, your tool does not interact with a database, developing tools to
                  parse log files might be easier than adding a database to the application.
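                      If you do log events to a database, the instrumentation itself can be quite small. As
                   a rough sketch (assuming a hypothetical SQLite table; the database file, table, and
                   column names here are purely illustrative), recording user events alongside the
                   application's own data might look something like this in Python:

                      import sqlite3
                      from datetime import datetime, timezone

                      conn = sqlite3.connect('app_data.db')
                      conn.execute(
                          'CREATE TABLE IF NOT EXISTS user_events ('
                          'timestamp TEXT, user_id TEXT, action TEXT, detail TEXT)')

                      def log_event(user_id, action, detail=''):
                          """Record one user action alongside the application's own data."""
                          conn.execute(
                              'INSERT INTO user_events VALUES (?, ?, ?, ?)',
                              (datetime.now(timezone.utc).isoformat(),
                               user_id, action, detail))
                          conn.commit()

                      # For example, note each press of the Print button as it happens.
                      log_event('p01', 'print_button_pressed', 'report.html')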


                  12.6.2   ANALYZING LOG FILES
                  Having collected some log files, you will want to do something with them. Although
                  log files for web servers, proxies, keystroke trackers, and custom-instrumented soft-
                   ware might all have different formats and contents, their general structure is roughly
                   the same: in each case, you have one line in the file for each
                  event of interest. Each line is likely to have some text indicating the time and date of
                  the event (otherwise known as the timestamp), a description of what happened (such
                  as the URL that was requested), and other related details.
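                      For example, a single line from an Apache-style web server access log (one com-
                   mon format; the values shown here are purely illustrative) might look like this:

                      192.168.1.10 - - [12/Mar/2017:10:15:32 -0500] "GET /report.html HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (Windows NT 10.0)"

                   Here the bracketed text is the timestamp, the quoted request names the URL that was
                   requested, and the final quoted string identifies the requesting browser.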
                     How you proceed in your analysis is largely determined by your goals. If you are
                  simply interested in trying to count certain events—for example, how many people
                  pressed the Print button—you might be able to read through the file, classifying each
                  event into one or more counters of various types. A single event in a log file might be
                  classified according to the page that was requested, the day of the week, the time of
                  day, and the type of web browser that made the request.
                     Reading through the file to extract the various pieces of information known about
                  each event is an example of a common computing practice known as parsing. Often
                  written in scripting languages, such as Perl and Python, log-file-parsing programs
                  read one line at a time, breaking the entry for each event into constituent pieces and
                  then updating data structures that keep counts and statistics of different types of
                  event, as needed. Once the parser has read all of the relevant events and tallied up the
                  numbers, results can be displayed graphically or in tabular form.
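                      As a rough sketch, a parser for the Apache-style access log shown earlier might be
                   written in Python along the following lines (the file name, regular expression, and
                   browser classification are illustrative and would need to be adapted to the format of
                   your own logs):

                      import re
                      from collections import Counter
                      from datetime import datetime

                      # One regular expression per log line: host, timestamp, request,
                      # status, size, referer, and user-agent fields.
                      LINE_RE = re.compile(
                          r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
                          r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
                          r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"')

                      pages = Counter()      # requests per URL
                      weekdays = Counter()   # requests per day of the week
                      hours = Counter()      # requests per hour of the day
                      browsers = Counter()   # requests per (crudely classified) browser

                      with open('access.log') as log:
                          for line in log:
                              match = LINE_RE.match(line)
                              if match is None:
                                  continue          # skip lines that do not fit the format
                              # Timestamps look like "12/Mar/2017:10:15:32 -0500".
                              when = datetime.strptime(match.group('time'),
                                                       '%d/%b/%Y:%H:%M:%S %z')
                              pages[match.group('url')] += 1
                              weekdays[when.strftime('%A')] += 1
                              hours[when.hour] += 1
                              agent = match.group('agent')
                              browsers['Firefox' if 'Firefox' in agent
                                       else 'Chrome' if 'Chrome' in agent
                                       else 'other'] += 1

                      # Tallies complete: report the ten most frequently requested pages.
                      for url, count in pages.most_common(10):
                          print(f'{count:8d}  {url}')

                   Each Counter object here plays the role of the data structures described above, and
                   the final loop produces a simple tabular report; in practice you would extend the
                   classification and the reporting to match the questions your study is asking.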
                     Countless programs for parsing and analyzing web log data have been developed
                  since the web first came onto the scene in the 1990s. These tools range from freely
                  available, open-source (but still highly functional) offerings to high-end commercial
                  products, providing a variety of ways to slice-and-dice data. Many of these tools
                  work on data from proxy servers as well.
                     For publicly available websites, many operators rely on the detailed querying and
                   visualization tools provided by Google Analytics. Using a small snippet of code
                   inserted into every page on the site, Google Analytics collects data and sends it to Google,
                  where it is stored for analysis via Google’s tools. Google Analytics is a popular and
                  powerful tool for understanding website usage patterns, but as it is not intended for
                  supporting usability studies, you might want to try a test run before using it for a full
                  study. Furthermore, as Analytics only works on public sites, it is not appropriate for
                  studies using locally hosted material.
                     Data from nonweb applications might prove a bit more challenging to analyze.
                  Keystroke loggers and activity loggers may come with their own log-parsing and
                  analysis packages, but you are likely to be on your own if you write software instru-
                  mentation to collect data. One approach in this case might be to design your log files