
420    CHAPTER 14  Online and ubiquitous HCI research





                           GOOGLE FLU
                           The history of Google's flu trend analysis tools (https://www.google.org/
                           flutrends/about/) illustrates some of the potential value—and some of the
                           pitfalls—in examining search data. Google's team analyzed a large corpus of
                           search queries combined with geographical information identifying the location
                           from which each query was issued. Noting a strong correlation between flu-
                           related queries and clinicians' visits potentially related to flu, they were able
                           to accurately predict which regions in the United States were experiencing flu
                           outbreaks (Ginsberg et al., 2009). The excitement generated by these results was
                           soon tempered by further experience demonstrating the difficulty of relating
                           web search activity to real-world conditions. A 2011 investigation of the performance
                           of Google Flu Trends during the 2009 H1N1 influenza pandemic found that
                           search behavior changed during the pandemic, as users searched for terms
                           for influenza and related complications (Cook et al., 2011), and the estimates
                           for the 2013 flu season varied radically from those issued by the Centers for
                           Disease Control (Butler, 2013). A 2014 commentary reviewed related results
                           and suggested that search data might be most useful when combined with
                           other existing data sources (Lazer et al., 2014). This commentary also raised
                           an important concern relevant to other studies of web search trends: as search
                           engines are based on proprietary algorithms subject to regular revision, results
                           may not be reliable or replicable (Lazer et al., 2014). Unsurprisingly, the
                           exploration of Twitter data for tracking flu epidemics has also been an area of
                           active research (Allen et al., 2016; Santillana et al., 2015).
                             Despite concerns regarding the validity of predictions generated by Google
                           Flu Trends, search logs continue to be a rich source of data for researchers
                           interested in studying the implications of health-related search terms. Some of this
                           work attempts to validate Flu Trends against other relevant indicators, such
                           as flu-related visits to emergency departments (Klembczyk et al., 2016). A
                           South Korean study used social media data (Twitter and blog posts) to
                           identify potential starting points for a subsequent examination of search
                           terms for flu-related concepts (Woo et al., 2016), providing an example of the
                           utility of combining multiple sources of online behavior data. Other efforts
                           include flu tracking using only Twitter data (Allen et al., 2016; Santillana
                           et al., 2015), and the use of search logs to identify possible adverse interactions
                           between two drugs (White et al., 2013), to study the increasing severity of
                           concern when searching for medical content (known as “Cyberchondria”)
                           (White and Horvitz, 2009), or to identify symptoms that might be early
                           indicators of cancers (Paparrizos et al., 2016). Related studies have used search
                           data to explore biases in the search for health-related information (White, 2013;
                           White and Hassan, 2014).
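
The kind of relationship described above, between flu-related query volume and flu-related clinician visits, can be illustrated with a simple correlation check. The sketch below is only an illustration under stated assumptions: the weekly numbers are made up, the names `pearson`, `query_share`, and `visit_rate` are hypothetical, and a plain Pearson correlation stands in for whatever modeling the Google team actually used.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up weekly values for one region (NOT real data):
# share of queries that are flu-related (%), and share of
# clinician visits that are potentially flu-related (%).
query_share = [0.8, 1.1, 1.9, 3.2, 4.0, 3.1, 1.7, 0.9]
visit_rate = [1.0, 1.3, 2.2, 3.9, 4.6, 3.8, 2.0, 1.2]

r = pearson(query_share, visit_rate)
print(f"correlation between query share and visit rate: {r:.3f}")
```

A high coefficient on historical data does not by itself guarantee good predictions later: as the H1N1 and 2013 examples show, search behavior and search algorithms can both change, so any such relationship would need to be rechecked against independent data over time.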