420 CHAPTER 14 Online and ubiquitous HCI research
GOOGLE FLU
The history of Google's flu trend analysis tools (https://www.google.org/
flutrends/about/) illustrates some of the potential value—and some of the
pitfalls—in examining search data. Google's team analyzed a large corpus of
search queries combined with geographical information identifying the location
from which each query was issued. Noting a strong correlation between flu-
related queries and clinicians' visits potentially related to flu, they were able
to accurately predict which regions in the United States were experiencing flu
outbreaks (Ginsberg et al., 2009). The excitement generated by these results was
soon tempered by further experience demonstrating the difficulty of relating
web search activity to offline reality. A 2011 investigation of the performance
of Google Flu Trends during the 2009 H1N1 influenza pandemic found that
search behavior changed during the pandemic, as users searched for terms
for influenza and related complications (Cook et al., 2011), and the estimates
for the 2013 flu season differed sharply from those issued by the Centers for
Disease Control and Prevention (Butler, 2013). A 2014 commentary reviewed related results
and suggested that search data might be most useful when combined with
other existing data sources (Lazer et al., 2014). This commentary also raised
an important concern relevant to other studies of web search trends: as search
engines are based on proprietary algorithms subject to regular revision, results
may not be reliable or replicable (Lazer et al., 2014). Unsurprisingly, the
exploration of Twitter data for tracking flu epidemics has also been an area of
active research (Allen et al., 2016; Santillana et al., 2015).
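The correlation between flu-related query volume and flu-related clinician visits described above can be illustrated with a small sketch. It is loosely in the spirit of the log-odds linear model reported by Ginsberg et al. (2009), but it is not their implementation: the weekly figures below are invented purely for illustration, and the function and variable names are hypothetical.

```python
import math

def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of logit: maps log-odds back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    slope = cov / var_x
    return slope, my - slope * mx

# ASSUMPTION: invented weekly data for one region, not real measurements.
# query_fraction: share of all search queries that are flu-related.
# visit_fraction: share of clinician visits that are flu-related.
query_fraction = [0.0010, 0.0012, 0.0018, 0.0027, 0.0035, 0.0030, 0.0021, 0.0014]
visit_fraction = [0.010, 0.012, 0.019, 0.028, 0.037, 0.031, 0.022, 0.014]

# Work on the log-odds scale, then measure correlation and fit a line.
x = [logit(q) for q in query_fraction]
y = [logit(v) for v in visit_fraction]
r = pearson(x, y)
slope, intercept = fit_line(x, y)

def estimate_visit_fraction(q):
    """Estimate the flu-related visit share from a query share."""
    return inv_logit(intercept + slope * logit(q))

print(f"correlation on log-odds scale: {r:.3f}")
print(f"estimated visit share at q=0.002: {estimate_visit_fraction(0.002):.4f}")
```

A model of this kind is only as stable as the relationship between searching and illness. If search behavior shifts, as reported for the 2009 pandemic, a line fitted to earlier weeks can yield misleading estimates.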
Despite concerns regarding the validity of predictions generated by Google
Flu Trends, search logs continue to be a rich source of data for researchers
interested in studying the implications of health-related search terms. Some of this
work attempts to validate Flu Trends using other relevant indicators, such
as flu-related visits to emergency departments (Klembczyk et al., 2016), as
comparison points. A South Korean effort used social media data (Twitter and blog
posts) to identify potential starting points for a subsequent examination of search
terms for flu-related concepts (Woo et al., 2016), providing an example of the
utility of combining multiple sources of online behavior data. Other efforts
include flu tracking using only Twitter data (Allen et al., 2016; Santillana
et al., 2015), and the use of search logs to identify possible adverse interactions
between two drugs (White et al., 2013), to study the escalation of medical
concerns during web searches (known as “cyberchondria”)
(White and Horvitz, 2009), or to identify symptoms that might be early
indicators of cancers (Paparrizos et al., 2016). Related studies have used search
data to explore biases in the search for health-related information (White, 2013;
White and Hassan, 2014).