14.2 Online research
If your data source is inaccessible because of business concerns, the lack of an
open API, or unacceptable costs, you might consider reframing your study to match
what can be accomplished within your means. One approach is to substitute
smaller-scale studies or qualitative research for broad examinations of usage
patterns. One study used a set of interviews with Facebook users to understand how
the content, layout, and functionality of the site influenced communication of health
information (Menefee et al., 2016). Although smaller qualitative studies lack the
broad appeal of the analysis of millions of posts, they might be more economical
to complete.
If you are lucky enough to get your hands on a large dataset relevant to your
interests, your choice of analytic techniques will depend on your research
goals. Be prepared to spend some time on data cleaning and extraction,
potentially taking textual representations of tweets, posts, or other data and
formatting them in a normalized pattern suitable for querying or text searching
(Baeza-Yates and Ribeiro-Neto, 2011). Once the data is ready for analysis, a
range of techniques is available. Possibilities include natural-language
processing approaches that try to extract key concepts and relationships from
free text (Hedegaard and Simonsen, 2013), and information retrieval techniques
(Baeza-Yates and Ribeiro-Neto, 2011) to model similarities between documents
and common concepts and terms. Other approaches have used descriptive statistics
tracking types of activities and relationships (Kittur and Kraut, 2008),
relative frequencies of different types of events (White et al., 2013), and any
number of other techniques as appropriate. For social media analysis, you might
build networks indicating relationships between individuals, topics, and other
items of interest. Graph algorithms might be used to find network members who
are “hubs,” that is, outliers in terms of number of connections or presence on
important paths (Scott, 2013). The Social Media Research Foundation
(http://www.smrfoundation.org) has developed a tool known as NodeXL, which supports
the development of networks, calculation of centrality measures, and visualization,
all through spreadsheet data (Bonsignore et al., 2009; Hansen and Shneiderman,
2010).
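
As a rough illustration of several of the steps described above, the following Python sketch normalizes a handful of posts, compares two of them using cosine similarity over term counts (a basic information retrieval measure), and counts connections in a mention network to identify a hub. The sample posts, user names, and function names are hypothetical and invented for this example; they do not come from any of the studies cited, and a real project would more likely rely on dedicated tools such as NodeXL or established text-analysis libraries.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical (author, text) pairs, invented purely for illustration.
POSTS = [
    ("alice", "Flu shots are available now! @bob @carol https://example.org/flu"),
    ("bob", "Thanks @alice, getting my flu shot today #health"),
    ("carol", "@alice where are flu shots available?"),
    ("dave", "Great game last night"),
]

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@(\w+)")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def normalize(text):
    """Lowercase, drop URLs and @mentions, and split into word tokens."""
    text = URL_RE.sub(" ", text.lower())
    text = MENTION_RE.sub(" ", text)
    return TOKEN_RE.findall(text)


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the term-count vectors of two token lists."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def mention_degrees(posts):
    """Number of distinct neighbors per user in an undirected mention network."""
    neighbors = defaultdict(set)
    for author, text in posts:
        neighbors[author]  # ensure authors with no connections still appear
        for mentioned in MENTION_RE.findall(text.lower()):
            if mentioned != author:
                neighbors[author].add(mentioned)
                neighbors[mentioned].add(author)
    return {user: len(linked) for user, linked in neighbors.items()}


if __name__ == "__main__":
    tokens = [normalize(text) for _, text in POSTS]
    print("similarity, post 0 vs post 2:", cosine_similarity(tokens[0], tokens[2]))
    print("similarity, post 0 vs post 3:", cosine_similarity(tokens[0], tokens[3]))
    degrees = mention_degrees(POSTS)
    print("degrees:", degrees)
    print("most connected user:", max(degrees, key=degrees.get))
```

Degree (the number of distinct neighbors) is only the simplest of the centrality measures mentioned above; measures based on presence on important paths, such as betweenness, require shortest-path computations that are better left to a graph library or a tool such as NodeXL.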
In a refrain that should be familiar to readers who have made it this far, the
analysis of any of these data sources can be augmented with related data
collected through different modalities. Examples include the use of surveys to
understand user practices and beliefs with regard to searches for health
information (White, 2013) and the use of instrumented web pages (Chapter 12)
(Huang et al., 2012) or eye tracking (Chapter 13) (Huang et al., 2011) to capture
fine-grained data correlated with search engine interactions. Approaches like
these also open search engine interaction research to those who are not directly
working with the relevant companies, as studies using logging toolkits or eye
tracking can be conducted in usability labs that lack access to large volumes of
search interaction logs.