12.2 Existing tools 331
Web logs have also proven to be a potent tool in HCI research. Given a website
and a log file, researchers can often analyze entries to determine where users went
and when. When combined with an understanding of the architecture of a site, this
information can be used to assess the usability of a site. Timing data in web logs also
presents opportunities for empirical studies. Although log data is imperfect and often
presents analytic challenges, careful analysis can yield useful insights.
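To make "where users went and when" concrete, here is a minimal sketch that assumes the log has already been parsed into (host, timestamp, resource) tuples. The entries, the IP addresses, and the `paths_by_host` helper are hypothetical illustrations. The sketch groups requests by host, orders them in time, and records the number of seconds since that host's previous request. Grouping by host is only a rough proxy for grouping by user, and the gap between requests only approximates time spent on a page.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical, already-parsed log entries: (host, timestamp, resource).
# Real entries would come from parsing the server's log file.
ENTRIES = [
    ("192.0.2.10", datetime(2024, 3, 1, 10, 0, 0), "/index.html"),
    ("192.0.2.10", datetime(2024, 3, 1, 10, 0, 42), "/search.html"),
    ("198.51.100.7", datetime(2024, 3, 1, 10, 1, 5), "/index.html"),
    ("192.0.2.10", datetime(2024, 3, 1, 10, 2, 12), "/results.html"),
]


def paths_by_host(entries):
    """Return {host: [(resource, seconds_since_previous_request), ...]} in time order."""
    by_host = defaultdict(list)
    # Sort by host, then by time, so each host's requests are in chronological order.
    for host, when, resource in sorted(entries, key=lambda e: (e[0], e[1])):
        by_host[host].append((when, resource))

    paths = {}
    for host, visits in by_host.items():
        steps = []
        previous = None
        for when, resource in visits:
            # The first request from a host has no preceding request, so no gap.
            gap = None if previous is None else (when - previous).total_seconds()
            steps.append((resource, gap))
            previous = when
        paths[host] = steps
    return paths


if __name__ == "__main__":
    for host, steps in paths_by_host(ENTRIES).items():
        print(host, steps)
```

For the sample data, host 192.0.2.10 moves from /index.html to /search.html after 42 seconds and then to /results.html 90 seconds later. A real analysis would also need a rule for splitting one host's requests into separate visits, such as a maximum idle gap.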
12.2.1.1 Web log contents
Although web servers can be configured to store a variety of data fields along with
each request, most log files store data that can identify a request and its source. Some
log files also contain fields that are generally less useful. The useful data includes:
• Host: The Internet Protocol (IP) address of the remote computer that made the
request. As many people access the Internet via networks that use firewalls or
proxy hosts that forward requests from internal machines, a host address might
not correspond directly to a specific user's computer.
• Timestamp: When the request occurred, usually including a date and a time
code. Times are often recorded as local time with an offset relative to Greenwich
Mean Time (for example, "-0700").
• Request: The HTTP request sent by the client to the server. The request has
several fields that may be of interest:
  • HTTP Method: The type of request being made, usually “GET” or “POST”
    (Fielding and Reschke, 2014).
  • Resource: The file, script, or other resource requested from the server.
  • Protocol: The version of the HTTP protocol used.
• Status Code: A numeric response from the server, indicating success (200–299),
redirection (300–399), client error (400–499), or server error (500–599)
(Fielding and Reschke, 2014).
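The request and status code fields described above can be decoded with a few lines of code. In this minimal sketch, the helper names `split_request` and `status_class` are hypothetical, and the sketch assumes the request field has the usual three-part form "METHOD resource protocol". Codes outside the four ranges listed above, such as the 100–199 informational codes, are labeled "other".

```python
def split_request(request_line):
    """Split a request such as 'GET /index.html HTTP/1.1' into method, resource, protocol."""
    parts = request_line.split()
    if len(parts) != 3:
        # Malformed or truncated requests do show up in real logs; skip them.
        return None
    method, resource, protocol = parts
    return {"method": method, "resource": resource, "protocol": protocol}


def status_class(code):
    """Map a numeric HTTP status code to the response classes listed above."""
    if 200 <= code <= 299:
        return "success"
    if 300 <= code <= 399:
        return "redirection"
    if 400 <= code <= 499:
        return "client error"
    if 500 <= code <= 599:
        return "server error"
    return "other"


if __name__ == "__main__":
    print(split_request("GET /search.html HTTP/1.1"))
    print(status_class(404))
```

Counting entries by status class is one quick way to spot problems such as broken links (a cluster of 404 responses for the same resource).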
Several other potentially useful fields may be available:
• Size: The size, in bytes, of the item returned to the client.
• Referrer: The web page that “referred” the client to the requested resource (the
corresponding HTTP header is spelled “Referer”). If a user on
http://yourhost/index.html clicks on a link to “search.html”, the log entry for that
request indicates that “http://yourhost/index.html” was the referrer. Some requests,
such as those for an address typed into a browser, do not arrive via a link and
have a dash (“-”) in the referrer field.
• User Agent: A string identifying the client software that made the request,
typically the browser name and version and often the operating system. Because
this string is self-reported by the client, it may not be accurate.
Figures 12.2 and 12.3 give some example log entries.
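As a minimal sketch of pulling these fields out of a log entry, the code below assumes the common log format layout, optionally followed by quoted referrer and user-agent fields (the variant the Apache HTTP Server calls "combined"). The pattern name `LOG_PATTERN`, the helper `parse_line`, and the sample lines are hypothetical illustrations, not entries from the figures. A server configured with a different field layout would need a different pattern.

```python
import re
from datetime import datetime, timezone

# Assumed layout: host ident authuser [timestamp] "request" status size,
# optionally followed by "referrer" "user agent" (the "combined" variant).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Timestamps look like 01/Mar/2024:10:00:42 -0500; %z reads the offset,
    # and the result is normalized to UTC. (%b assumes English month names.)
    entry["timestamp"] = datetime.strptime(
        entry["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    ).astimezone(timezone.utc)
    entry["status"] = int(entry["status"])
    # A dash means no size was recorded for this response.
    entry["size"] = None if entry["size"] == "-" else int(entry["size"])
    # A dash in the referrer field means the request did not arrive via a link.
    if entry["referrer"] == "-":
        entry["referrer"] = None
    return entry


if __name__ == "__main__":
    sample = ('192.0.2.10 - - [01/Mar/2024:10:00:42 -0500] '
              '"GET /search.html HTTP/1.1" 200 5120 '
              '"http://yourhost/index.html" "Mozilla/5.0 (X11; Linux x86_64)"')
    print(parse_line(sample))
```

A regular expression is used instead of splitting on spaces because the quoted request and user-agent fields themselves contain spaces.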
Most web servers use the common log format (World Wide Web Consortium,
1995) or similar formats as the basis for formatting log files. The customization
facilities provided by most web servers let you choose which fields are included,
which can be very useful for adapting your logs to the needs of each project. If you
are running a study involving users who are particularly sensitive to privacy
concerns, you might configure your server to remove the client IP address from the
log files. Similar