
12.2  Existing tools  331




                     Web logs have also proven to be a potent tool in HCI research. Given a website
                  and a log file, researchers can often analyze entries to determine where users went
                  and when. When combined with an understanding of the architecture of a site, this
                  information can be used to assess the usability of a site. Timing data in web logs also
                  presents opportunities for empirical studies. Although log data is not perfect and often
                  presents analytic challenges, appropriate analysis can still yield useful insights.

                  12.2.1.1   Web log contents
                  Although web servers can be configured to store a variety of data fields along with
                  each request, most log files store data that can identify a request and its source. Some
                  log files also contain fields that are generally less useful. The useful data includes:

                  •  Host: The Internet protocol address of the remote computer that made the
                     request. As many people access the Internet via networks that use firewalls or
                     proxy hosts that forward requests from internal machines, a host address might
                     not correspond directly to a specific user's computer.
                  •  Timestamp: When the request occurred, usually including a date and a time
                     code. Times may be given relative to Greenwich Mean Time.
                  •  Request: The HTTP request sent by the client to the server. The request has
                     several fields that may be of interest:
                     •  HTTP Method: The type of request being made—usually “GET” or “POST”
                       (Fielding and Reschke, 2014).
                     •  Resource: The file, script, or other resource requested from the server.
                     •  Protocol: The version of the HTTP protocol used.
                  •  Status Code: A numeric response from the server, indicating success (200–299),
                     redirection (300–399), client error (400–499), or server error (500–599)
                     (Fielding and Reschke, 2014).
                     Several other potentially useful fields may be available:
                  •  Size: The size—in number of bytes—of the item returned to the client.
                  •  Referrer: The web page that “referred” the client to the requested resource. If a
                     user on http://yourhost/index.html clicks on the “search.html” link, the request
                     indicates that “http://yourhost/index.html” was the referrer. Some requests, such as
                     those that come via an address typed into a browser, do not arrive via a link and
                     have a dash (“-”) in the referrer field.
                  •  User Agent: The make and model of the web browser that made the request. As
                     this is self-reported, it may or may not be accurate.
                     Figures 12.2 and 12.3 give some example log entries.
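
                  The fields listed above can be pulled out of a log entry with a short script. The
                  following is a minimal sketch, not a definitive parser: it assumes entries follow
                  the common log format, optionally extended with referrer and user agent fields,
                  and the names LOG_PATTERN, parse_entry, status_class, and SAMPLE_LINE are
                  hypothetical. The sample entry is made up for illustration and is not taken from
                  a real log or from the figures.

```python
import re
from datetime import datetime

# Assumed layout: common log format, optionally followed by quoted
# referrer and user agent fields. Real servers may be configured differently.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ '                    # host, then two fields ignored here
    r'\[(?P<timestamp>[^\]]+)\] '                # e.g. [10/Oct/2014:13:55:36 +0000]
    r'"(?P<method>\S+) (?P<resource>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'         # status code and size in bytes
    r'(?: "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)


def status_class(code):
    """Map a numeric status code to the ranges described in the text."""
    if 200 <= code <= 299:
        return "success"
    if 300 <= code <= 399:
        return "redirection"
    if 400 <= code <= 499:
        return "client error"
    if 500 <= code <= 599:
        return "server error"
    return "other"


def parse_entry(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Month abbreviations are parsed with %b, which assumes an English locale.
    entry["timestamp"] = datetime.strptime(
        entry["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    entry["status"] = int(entry["status"])
    entry["size"] = None if entry["size"] == "-" else int(entry["size"])
    entry["status_class"] = status_class(entry["status"])
    return entry


# Made-up entry for illustration only (192.0.2.1 is a documentation address).
SAMPLE_LINE = (
    '192.0.2.1 - - [10/Oct/2014:13:55:36 +0000] '
    '"GET /search.html HTTP/1.1" 200 2326 '
    '"http://yourhost/index.html" "Mozilla/5.0"'
)

entry = parse_entry(SAMPLE_LINE)
print(entry["host"], entry["resource"], entry["status_class"])
```

                  Lines that do not match the assumed layout return None rather than raising an
                  error, so a script can count and inspect them separately instead of silently
                  dropping them.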
                     Most web servers use the common log format (World Wide Web Consortium,
                  1995) or similar formats as the basis for formatting log files. Customization facilities
                  provided by most web servers allow for the inclusion of specific fields. This can be
                  very useful for adapting your logs to fit the needs of each project. If you are running
                  a study involving users who are particularly sensitive to privacy concerns, you
                  might configure your server to remove the client IP address from the log files. Similar