12.3 Activity-logging software 341
a group of users, retrieves the requested materials, and returns them to the users.
As all requests from a group of users are handled by the proxy server, it can collect
complete session data for all users. This provides a broader picture of user activities
than standard server logs, which only contain records of requests to a single site.
Web proxies can intercept (and modify) user requests before sending them on to
the server. Proxies can also modify the responses from the remote servers before the
resulting web pages are displayed by the client software. Specifically, pages can be
modified to include content necessary for the collection of additional interaction data
(Atterer et al., 2006).
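The response-modification step can be illustrated with a minimal sketch. It assumes the proxy has the page's HTML as a string and wants to add a reference to a logging script; the function name `inject_tracker` and the script path are hypothetical, chosen only for illustration, and are not taken from Atterer et al.

```python
import re

# Hypothetical location of the logging script served by the proxy.
TRACKER_SRC = "/proxy-logger/track.js"


def inject_tracker(html, script_src=TRACKER_SRC):
    """Return html with a <script> tag inserted before the first </body> tag."""
    tag = f'<script src="{script_src}"></script>'
    # Look for the closing body tag, ignoring case.
    match = re.search(r"</body\s*>", html, flags=re.IGNORECASE)
    if match is None:
        # No closing body tag found: append the script at the end instead.
        return html + tag
    return html[:match.start()] + tag + html[match.start():]
```

A production proxy would also need to handle compressed responses, character encodings, and non-HTML content types, none of which this sketch attempts.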
The first step in using a web proxy—for any purpose, including HCI research—is
selecting an appropriate computing environment. As the computational demands of
handling web requests for a large group of users can be substantial, you probably
want to dedicate resources (computers, disk space, and network bandwidth) specifically
for this purpose. If your proxy server is not able to process web requests quickly
and efficiently, users will notice delays in their web browsing. This may cause some
users to change their browsing habits, while others may simply refuse to use the
proxy server. Ideally, the proxy server should not impose any performance penalties
on end users.
Many open-source, shareware, and commercial proxy servers are available for all
major computing platforms. The Squid proxy server (http://www.squid-cache.org)
is widely used on Linux and Unix systems. The popular Apache web server
(http://httpd.apache.org) can also be configured to act as a proxy server. The choice of
platform and software is likely to be dictated by your specific computing needs.
Once installed, proxy software must be appropriately configured and secured.
You need to consider who may use your proxy server (you can limit access to users
from certain Internet domains or IP addresses), which sites you will allow access
to, and what sorts of information you might want to store in the logs. As configuration
options differ widely from one proxy package to the next, you should carefully
study your software documentation and related resources.
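To make the idea of limiting access concrete, the following sketch shows the kind of check a proxy performs when access is restricted to certain address ranges. Real proxy packages express such rules in their own configuration syntax; the address ranges and the function name `client_allowed` here are assumptions made only for illustration.

```python
import ipaddress

# Hypothetical address ranges permitted to use the proxy
# (taken from ranges reserved for documentation).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def client_allowed(client_ip, networks=ALLOWED_NETWORKS):
    """Return True if client_ip falls inside any permitted network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)
```

For example, `client_allowed("192.0.2.17")` is true under these assumed ranges, while an address outside both ranges is rejected.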
Web browsers must be configured to use general-purpose web proxies. The configuration
process tells the browser to contact the appropriate proxy host for all web
requests. The most straightforward approach is to specify the proxy server settings
directly in a web browser configuration dialog (Figure 12.7), but this requires manual
configuration of every browser. An alternative is to place the proxy at the level of the
Internet gateway or router: many organizations and companies use proxies or similar
intermediate processors to filter web content, for purposes such as blocking adult
content. This approach might be possible in some organizations, but would likely
require working with your IT support teams.
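Browsers are configured through their own settings dialogs, but the same principle (direct every request to a named proxy host) can be shown for a scripted client. The sketch below uses Python's standard `urllib.request` module; the proxy host name and port are placeholders, not values from the text.

```python
import urllib.request

# Hypothetical proxy host and port; substitute the address of your own proxy.
PROXY_URL = "http://proxy.example.org:3128"


def build_proxied_opener(proxy_url=PROXY_URL):
    """Return an opener that routes HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)
```

Calling `build_proxied_opener().open(url)` would then send the request through the configured proxy rather than directly to the remote server.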
Once the proxy server and web browser have been configured, users can continue
to browse the web as before. Web requests are handled transparently by the proxy
server and noted in the log files (Figure 12.8).
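As a rough illustration of working with such logs, the sketch below groups visited hosts by client address. It assumes the log uses Squid's default native `access.log` layout (timestamp, elapsed time, client address, result code, size, method, URL, and so on); the sample lines are invented for the example and other proxies or custom log formats will need a different parser.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Invented sample lines in Squid's default native access.log layout:
# time elapsed client action/status bytes method URL ident hierarchy/peer type
SAMPLE_LOG = """\
1700000000.123 45 192.0.2.10 TCP_MISS/200 5120 GET http://example.org/index.html - HIER_DIRECT/93.184.216.34 text/html
1700000003.456 12 192.0.2.11 TCP_HIT/200 2048 GET http://example.com/a.css - HIER_NONE/- text/css
1700000009.789 80 192.0.2.10 TCP_MISS/200 9000 GET http://example.net/news - HIER_DIRECT/203.0.113.7 text/html
"""


def sites_by_client(log_text):
    """Map each client address to the list of hosts it requested, in log order."""
    visits = defaultdict(list)
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or malformed lines
        client, url = fields[2], fields[6]
        host = urlsplit(url).hostname
        if host:  # entries whose URL has no parseable host are skipped
            visits[client].append(host)
    return dict(visits)
```

Because every user's requests pass through the same proxy, a single pass over the file yields per-user browsing sequences across many sites.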
The resulting log files contain information on all sites visited by all users of the
proxy. This is a major difference between proxy servers and web logs (Section 12.2.1).
Whereas web logs maintain access requests for a single site, proxy servers track all