
12.3  Activity-logging software  341




                  a group of users, retrieves the requested materials, and returns them to the users.
                  As all requests from a group of users are handled by the proxy server, it can collect
                  complete session data for all users. This provides a broader picture of user activities
                  than standard server logs, which contain records only for requests made to a single site.
                     Web proxies can intercept (and modify) user requests before sending them on to
                  the server. Proxies can also modify the responses from the remote servers before the
                  resulting web pages are displayed by the client software. Specifically, pages can be
                  modified to include content necessary for the collection of additional interaction data
                  (Atterer et al., 2006).
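The response-modification step described above can be sketched in a few lines. This is a minimal illustration rather than the method used by Atterer et al.: it assumes the proxy already holds the page as a string, and the script URL `TRACKER_URL` and the function name `inject_tracker` are hypothetical names introduced for this example.

```python
import re

# Hypothetical URL of an interaction-logging script hosted by the research team.
TRACKER_URL = "https://proxy.example.org/tracker.js"


def inject_tracker(html: str, script_url: str = TRACKER_URL) -> str:
    """Return the page with a <script> tag inserted just before the last </body>."""
    tag = f'<script src="{script_url}"></script>'
    # Find every closing body tag, ignoring case and optional whitespace.
    matches = list(re.finditer(r"</body\s*>", html, flags=re.IGNORECASE))
    if not matches:
        # No closing body tag found: append the script instead of dropping it.
        return html + tag
    position = matches[-1].start()
    return html[:position] + tag + html[position:]


page = "<html><body><p>Hello</p></body></html>"
modified = inject_tracker(page)
```

A real proxy would apply a step like this only to HTML responses and would also need to handle compressed or encrypted content, which this sketch ignores.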
                     The first step in using a web proxy—for any purpose, including HCI research—is
                  selecting an appropriate computing environment. As the computational demands of
                  handling web requests for a large group of users can be substantial, you probably
                  want to dedicate resources (computers, disk space, and network bandwidth) specifically
                  for this purpose. If your proxy server is not able to process web requests quickly
                  and efficiently, users will notice delays in their web browsing. This may cause some
                  users to change their browsing habits, while others may simply refuse to use the
                  proxy server. Ideally, the proxy server should not impose any performance penalties
                  on end users.
                     Many open-source, shareware, and commercial proxy servers are available for all
                  major computing platforms. The Squid proxy server (http://www.squid-cache.org)
                  is widely used on Linux and Unix systems. The popular Apache web server
                  (http://httpd.apache.org) can also be configured to act as a proxy server. The choice of
                  platform and software is likely to be dictated by your specific computing needs.
                     Once installed, proxy software must be appropriately configured and secured.
                  You need to consider who may use your proxy server (you can limit access to users
                  from certain Internet domains or IP address ranges), which sites you will allow
                  access to, and what sorts of information you might want to store in the logs. As
                  configuration options differ widely from one proxy package to the next, you should
                  carefully study your software documentation and related resources.
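The idea of limiting access to certain network addresses can be illustrated independently of any particular proxy package. The sketch below is not the configuration syntax of Squid or Apache. It only shows the underlying check, using Python's standard `ipaddress` module. The address ranges and the name `is_allowed` are assumptions made for this example.

```python
import ipaddress

# Hypothetical address ranges for study participants (reserved documentation ranges).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_allowed(client_ip: str, networks=ALLOWED_NETWORKS) -> bool:
    """Return True if client_ip falls inside any permitted network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # Malformed addresses are rejected rather than raising an error.
        return False
    return any(address in network for network in networks)
```

For example, `is_allowed("192.0.2.17")` returns `True`, while `is_allowed("203.0.113.5")` returns `False` because that address lies outside both ranges.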
                     Web browsers must be configured to use general-purpose web proxies. The
                  configuration process tells the browser to contact the appropriate proxy host for all web
                  requests. The most straightforward approach is to specify the proxy server settings
                  directly in a web browser configuration dialog (Figure 12.7), but this requires
                  manual configuration of every browser. Alternatives include proxies at the level of the
                  Internet gateway or router. Many organizations and companies use proxies or
                  similar intermediate processors to filter web content, for purposes such as blocking adult
                  content. This approach might be possible in some organizations, but would likely
                  require working with your IT support teams.
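Programmatic clients need the same kind of setting as a browser dialog: the address of the proxy host to contact for each request. As a rough sketch, Python's standard `urllib.request` module accepts this through a `ProxyHandler`. The host name below is hypothetical, and port 3128 is assumed only because it is Squid's default.

```python
import urllib.request

# Hypothetical proxy host; 3128 is the default port used by Squid.
PROXY_HOST = "proxy.example.org"
PROXY_PORT = 3128


def proxy_settings(host: str, port: int) -> dict:
    """Map each URL scheme to the proxy that should handle it."""
    proxy_url = f"http://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


def build_proxied_opener(host: str = PROXY_HOST, port: int = PROXY_PORT):
    """Create an opener that routes its requests through the given proxy."""
    handler = urllib.request.ProxyHandler(proxy_settings(host, port))
    return urllib.request.build_opener(handler)


opener = build_proxied_opener()
# opener.open("http://example.com/") would send the request via the proxy.
```

Building the opener does not contact the network. A request is sent to the proxy only when `opener.open(...)` is called.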
                     Once the proxy server and web browser have been configured, users can continue
                  to browse the web as before. Web requests are handled transparently by the proxy
                  server and noted in the log files (Figure 12.8).
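To give a sense of how such log files might be processed, the sketch below parses entries in the field order of Squid's native access log (timestamp, elapsed time, client address, result code and status, size, method, URL, and so on) and lists the sites visited by each client. The sample entries are invented for illustration and are not taken from Figure 12.8. The function names are likewise assumptions.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Invented sample entries in Squid's native log field order (not real data).
SAMPLE_LOG = """\
1700000000.123    210 192.0.2.17 TCP_MISS/200 5120 GET http://news.example.com/ - DIRECT/203.0.113.10 text/html
1700000004.456     95 192.0.2.17 TCP_MISS/200 2048 GET http://shop.example.org/cart - DIRECT/203.0.113.20 text/html
1700000009.789    130 192.0.2.42 TCP_HIT/200 1024 GET http://news.example.com/sports - NONE/- text/html
"""


def parse_line(line: str):
    """Parse one log entry into a dict, or return None if fields are missing."""
    fields = line.split()
    if len(fields) < 10:
        return None
    result_code, _, status = fields[3].partition("/")
    return {
        "time": float(fields[0]),   # seconds since the Unix epoch
        "client": fields[2],        # address of the requesting machine
        "result": result_code,      # e.g. TCP_MISS or TCP_HIT
        "status": int(status),      # HTTP status code
        "method": fields[5],
        "url": fields[6],
    }


def sites_by_client(log_text: str) -> dict:
    """Return, for each client, the distinct hosts visited in order of first visit."""
    visited = defaultdict(list)
    for line in log_text.splitlines():
        entry = parse_line(line)
        if entry is None:
            continue
        host = urlsplit(entry["url"]).hostname
        # Entries whose URL has no parseable host are skipped in this sketch.
        if host and host not in visited[entry["client"]]:
            visited[entry["client"]].append(host)
    return dict(visited)
```

Grouping by client address is only an approximation of grouping by user, since several people may share one machine or address.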
                     The resulting log files contain information on all sites visited by all users of the
                  proxy. This is a major difference between proxy servers and web logs (Section 12.2.1).
                  Whereas web logs maintain access requests for a single site, proxy servers track all