14.2 Online research
Web log analysis can be particularly useful for comparing alternative web
site designs or interactions. "A/B" testing is a widely used approach for compar-
ing alternative designs for active web sites. In an A/B test, a server is configured to
randomly select one of two alternatives (the "A" and "B" designs) to present
whenever a visitor comes to a site. Given enough visits, data can be collected to see
with which design users complete specified tasks more quickly or with fewer errors.
Such tests might also include quick surveys asking users for their impressions of a
site. By using functioning web sites to gather data on the many users who come to
a site, A/B tests enable rapid collection of usability data, without the need to conduct
a formal usability test.
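The random assignment that the server performs can be sketched in a few lines. The example below is a minimal illustration (the experiment name and visitor IDs are hypothetical, not any particular company's implementation): hashing a visitor identifier ensures that a returning visitor always sees the same variant, while the population as a whole splits roughly evenly between the two designs.

```python
import hashlib

def assign_variant(visitor_id: str, experiment: str = "homepage-test") -> str:
    """Deterministically assign a visitor to the "A" or "B" design.

    Hashing the (experiment, visitor) pair keeps each visitor's
    assignment stable across visits, while spreading the population
    roughly 50/50 between the two variants.
    """
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100          # bucket in 0..99
    return "A" if bucket < 50 else "B"

# A given visitor always receives the same design...
assert assign_variant("visitor-42") == assign_variant("visitor-42")

# ...and across many visitors the split is close to even.
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variant(f"visitor-{i}")] += 1
print(counts)
```

Stable per-visitor assignment matters because a participant who saw design A on one visit and design B on the next would contaminate both conditions.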
Building on this approach, it is also possible to conduct empirical studies online.
Just as web logs might be used to extract event times, and therefore task completion
times, in web-based studies run on a local server, an appropriately structured site
might enable easy extraction of task completion times, results, and other measures.
You can also create appropriate components of the web site to collect informed
consent (with the approval of your IRB, see Chapter 15), demographic information,
and other needed details. Such an installation has the advantage of allowing
participants to enroll in a study without your direct involvement: they can simply
go to the URL in question and follow the directions.
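Reducing a log of timestamped events to task completion times can take only a few lines of code. The sketch below is illustrative (the log format and event names are assumptions, not a standard): it pairs each participant's task-start and task-end events and reports the elapsed seconds.

```python
from datetime import datetime

# Hypothetical log lines: ISO timestamp, participant ID, event name.
LOG = """\
2023-05-01T10:00:00 p01 task_start
2023-05-01T10:02:30 p01 task_end
2023-05-01T10:01:00 p02 task_start
2023-05-01T10:04:15 p02 task_end
"""

def completion_times(log: str) -> dict:
    """Return seconds from task_start to task_end for each participant."""
    starts, durations = {}, {}
    for line in log.strip().splitlines():
        stamp, participant, event = line.split()
        t = datetime.fromisoformat(stamp)
        if event == "task_start":
            starts[participant] = t
        elif event == "task_end":
            durations[participant] = (t - starts[participant]).total_seconds()
    return durations

print(completion_times(LOG))   # {'p01': 150.0, 'p02': 195.0}
```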
Any timing data collected from either A/B testing or online empirical studies runs
the risk of being confounded by network latencies or problems. If a network prob-
lem slows the communication between participants' computers and your servers, task
completion times may be inflated, but you would have no way of knowing that this
had happened. Larger numbers of participants might help with this problem, as
extreme values in latency will be more clearly identifiable as outliers.
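One common way to flag such extreme values is Tukey's interquartile-range rule; the sketch below applies it to a set of hypothetical task times, where one trial is far slower than the rest and is plausibly a network stall rather than a genuine task time.

```python
import statistics

def iqr_outliers(times):
    """Flag values beyond 1.5 * IQR outside the quartiles (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(times, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [t for t in times if t < low or t > high]

# Hypothetical task times in seconds; the 48 s trial stands apart.
times = [12.1, 13.4, 11.8, 12.9, 14.0, 13.1, 48.0, 12.5]
print(iqr_outliers(times))   # [48.0]
```

Whether flagged trials should be discarded or investigated is a judgment call; the rule only identifies candidates for closer inspection.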
The validity of online studies relative to lab-based studies may be a concern. One
study comparing online and lab-based approaches to the empirical evaluation of
search interfaces found that the two produced comparable results (Kelly and
Gyllstrom, 2011). To ensure similar generalization to your problems of interest, you
might consider pairing a small in-person study with a larger online study. Similarities
in the results will increase confidence in the online data, but discrepancies might
indicate some difficulties in translation (Meyer and Bederson, 1998).
A/B testing has been used extensively by companies with prominent Internet
business activity: Amazon, Microsoft, and other familiar web companies are well
aware of the impact that small changes in design can have on task completion. For
sites serving millions of users, an increase of even 1% in ad views or completed sales
can mean significant increases in revenue. The importance of A/B testing has led to
significant methodological interest, from practical guidance from web usability guru
Jakob Nielsen (Nielsen, 2005, 2012, 2014) to papers on the design of A/B studies
(Kharitonov et al., 2015) and the investigation of novel statistical analysis techniques
(Deng et al., 2013, 2014; Deng, 2015). Ron Kohavi and colleagues at Microsoft have
published extensively in this area, including a survey presenting a broad overview of
the topic (Kohavi et al., 2009) and papers discussing some of the pitfalls and lessons
learned from Microsoft's extensive A/B testing (Crook et al., 2009; Kohavi et al.,
2012, 2013).
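As a concrete illustration of the kind of analysis these papers address, consider testing whether the B design converts visitors at a higher rate than the A design. The sketch below implements a standard two-proportion z-test; the visit and conversion counts are hypothetical, and real A/B platforms typically layer many refinements on top of this basic test.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical data: 1000 conversions in 50,000 visits to design A
# versus 1100 conversions in 50,000 visits to design B.
z, p = two_proportion_z_test(1000, 50_000, 1100, 50_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Note that even this modest 10% relative lift requires tens of thousands of visits per condition to detect reliably, which is one reason A/B testing is most attractive for high-traffic sites.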