expert evaluations before involving any representative users. You then ask a few potential respondents about the clarity and motivation of the questions in the survey. Finally, you run a pilot study in which potential respondents complete the entire survey while the researchers note any flaws. While this three-stage process is ideal, in reality most HCI research involves either a few colleagues examining the survey tool or a few potential respondents reading it over and giving some feedback. Even at this minimal level, the pilot study is still necessary and important.
A pilot study can help the researcher identify questions that are confusing or mis-
leading. These pilot study efforts are aimed at determining the validity of the survey, that is, whether the survey measures what it claims to measure (Babbie, 1990; Ozok, 2007). A few common problems to watch for in a pilot study include questions that were left unanswered, questions where multiple answers were given when only one was expected, and questions where respondents filled in "other" (Babbie, 1990). All of these are signs that a question
might need to be reworded. A pilot study, ideally, will involve a small number of
potential respondents (people who meet the inclusion criteria) answering the survey
questions, with encouragement to provide specific feedback on the questions in the
survey. For a small survey study (say, where the goal is 200–300 responses), perhaps
5–10 people taking part in the pilot study would be sufficient. However, for larger
survey studies, where the goal is 100,000 survey responses, a correspondingly larger
number of individuals should take part in the pilot study. It is important to note that individuals who responded to the pilot study should generally not take part in the main study: the experience of participating in the pilot could bias their later responses, so their data should not be included in the main data collection.
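As a purely illustrative sketch of how a researcher might tabulate these warning signs in pilot data, the following Python snippet counts, for each question, how many respondents left it blank, selected multiple options, or chose "other." The column names, coding scheme (lists for multiple selections), and data are all invented for illustration.

```python
import pandas as pd

# Hypothetical pilot responses: None = unanswered, lists = multiple selections
pilot = pd.DataFrame({
    "q1_age_group": ["18-24", "25-34", None, "35-44", "25-34"],
    "q2_device":    [["phone"], ["phone", "tablet"], ["laptop"], ["other"], ["phone"]],
    "q3_frequency": ["daily", "other", "weekly", "other", None],
})

def screen_question(col: pd.Series) -> dict:
    """Count common warning signs for a single survey question."""
    unanswered = col.isna().sum()
    multiple = col.apply(lambda v: isinstance(v, list) and len(v) > 1).sum()
    other = col.apply(
        lambda v: v == "other" or (isinstance(v, list) and "other" in v)
    ).sum()
    return {"unanswered": unanswered, "multiple": multiple, "other": other}

report = pd.DataFrame({q: screen_question(pilot[q]) for q in pilot.columns}).T
print(report)  # questions with high counts may need rewording
```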
A different type of evaluation can take place at a later time. When a survey instrument has been used to collect data multiple times, the reliability of that survey can be established. Reliability is the determination of whether a survey measures constructs consistently across time (Babbie, 1990; Ozok, 2007). Methods for measuring the internal reliability of questions, such as asking the same question multiple times in different ways, can be used. Cronbach's alpha coefficient is often used in that situation (Ozok, 2007).
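As a minimal sketch (not from the original text), Cronbach's alpha can be computed directly from a respondents-by-items matrix using its standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The example ratings below are hypothetical.

```python
import numpy as np

def cronbachs_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) matrix of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 5 respondents answering 4 Likert-scale items
ratings = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
])
print(f"alpha = {cronbachs_alpha(ratings):.2f}")
```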
Another approach to evaluating survey questions after data is collected from
many people, especially if the survey has a large number of questions, is exploratory
factor analysis. In factor analysis, statistical software creates an artificial dimension (a factor) that correlates highly with a chosen set of survey question data (Babbie, 1990). Researchers then determine how important a specific survey question is based on its factor loading, the level of correlation between that item's data and the artificial dimension. Survey items with high factor loadings are highly correlated with the factor, are likely to be more predictive, and are therefore more relevant (Babbie, 1990). Exploratory factor analysis can thereby help to cut down the number of questions in a survey (Ozok and Salvendy, 2001).
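To make factor loadings concrete, the following hypothetical sketch uses scikit-learn's FactorAnalysis on an invented respondents-by-questions matrix. Each question's loading on the extracted factors indicates how strongly it correlates with those artificial dimensions, and items with uniformly small loadings are candidates for removal; the data, number of factors, and cutoff are all assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical pilot data: rows are respondents, columns are survey questions
rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(100, 8)).astype(float)

# Extract two latent factors (the "artificial dimensions")
fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit(responses)

# Loadings: one row per question, one column per factor
loadings = fa.components_.T
for i, row in enumerate(loadings):
    print(f"Q{i + 1}: factor loadings = {np.round(row, 2)}")

# Questions whose largest absolute loading is small contribute little to any
# factor and are candidates for removal when shortening the survey.
weak = [i + 1 for i, row in enumerate(loadings) if np.abs(row).max() < 0.3]
print("Candidate questions to drop:", weak)
```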
For instance, in one of the two research projects described in the Flickr sidebar, the survey questions were validated using an