
4.1  Preparing data for statistical analysis  73




                  after the data is collected. Under those circumstances, you need to remove the
                  problematic data items and treat them as missing values in the statistical data
                  analysis.
                     Sometimes the collected data need to be cleaned up because of inappropriate
                  formatting. Using age as an example, participants may enter age in various formats.
                  In an online survey, most respondents used numeric values such as “9” to report their
                  age (Feng et al., 2008). Some used text such as “nine” or “nine and a half.” A number
                  of participants even entered detailed text descriptions such as “He will turn nine in
                  January.” All entries in text format were converted to numeric values before the data
                  were analyzed with statistical software.
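
                  As a rough illustration of this kind of cleanup, the sketch below converts
                  free-text age entries to numbers. It is not the procedure used in the cited
                  survey; the function name parse_age, the NUMBER_WORDS lookup, and the decision
                  to return None for descriptive entries (so that they can be reviewed by hand)
                  are all assumptions made for this example.

```python
import re

# Hypothetical lookup of number words; extend it to cover the ages in your sample.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}


def parse_age(entry):
    """Return a numeric age, or None when the entry needs manual review."""
    text = entry.strip().lower().rstrip(".")
    # Plain numerals such as "9" or "9.5".
    if re.fullmatch(r"\d+(\.\d+)?", text):
        return float(text)
    # A single number word, optionally followed by "and a half".
    match = re.fullmatch(r"([a-z]+)( and a half)?", text)
    if match and match.group(1) in NUMBER_WORDS:
        half = 0.5 if match.group(2) else 0.0
        return NUMBER_WORDS[match.group(1)] + half
    # Anything else (for example a full sentence) is left for a human to code.
    return None


entries = ["9", "nine", "nine and a half", "He will turn nine in January."]
cleaned = [parse_age(e) for e in entries]
```

                  Entries that come back as None can either be converted by hand or treated
                  as missing values, depending on how much the description reveals.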


                  4.1.2   CODING DATA
                  In many studies, the original data collected need to be coded before any statistical
                  analysis can be conducted. A typical example is the demographic information of
                  your participants. Table 4.1 shows the original demographic data of three
                  participants. The information on age is numerical and does not need to be coded.
                  The information on gender, highest degree earned, and previous software experience
                  needs to be coded so that statistical software can interpret the input. In Table 4.2,
                  gender is coded using 1 to represent “male” and 0 to represent “female.” Highest
                  degree earned has more categories, with 1 representing a high school degree,
                  2 representing a college degree, and 3 representing a graduate degree. Previous
                  software experience is also coded, with 1 representing “Yes” and 0 representing
                  “No.” Usually we use codes “0” and “1” for dichotomous variables (categorical
                  variables with exactly two possible values). When coding variables with three or
                  more possible values, the codes used may vary depending on the specific context. For


                   Table 4.1  Sample Demographic Data in Its Original Form
                                                                  Previous Experience
                                  Age   Gender   Highest Degree   In Software A
                   Participant 1  34    Male     College          Yes
                   Participant 2  28    Female   Graduate         No
                   Participant 3  21    Female   High school      No


                   Table 4.2  Sample Demographic Data in Coded Form
                                                                  Previous Experience
                                  Age   Gender   Highest Degree   In Software A
                   Participant 1  34    1        2                1
                   Participant 2  28    0        3                0
                   Participant 3  21    0        1                0
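
                  The mapping from Table 4.1 to Table 4.2 can be sketched in a few lines of
                  code. This is a minimal illustration only: the dictionary names, the field
                  names, and the code_participant helper are hypothetical, while the codes
                  themselves are the ones described in the text.

```python
# Coding schemes taken from the text; the variable names are illustrative.
GENDER_CODES = {"Male": 1, "Female": 0}
DEGREE_CODES = {"High school": 1, "College": 2, "Graduate": 3}
EXPERIENCE_CODES = {"Yes": 1, "No": 0}

# Original data from Table 4.1.
participants = [
    {"age": 34, "gender": "Male", "degree": "College", "experience": "Yes"},
    {"age": 28, "gender": "Female", "degree": "Graduate", "experience": "No"},
    {"age": 21, "gender": "Female", "degree": "High school", "experience": "No"},
]


def code_participant(row):
    """Replace each categorical value with its numeric code; age stays as is."""
    return {
        "age": row["age"],
        "gender": GENDER_CODES[row["gender"]],
        "degree": DEGREE_CODES[row["degree"]],
        "experience": EXPERIENCE_CODES[row["experience"]],
    }


# Coded data, matching the rows of Table 4.2.
coded = [code_participant(p) for p in participants]
```

                  Because the lookups use plain dictionary indexing, an unexpected value such
                  as a misspelled degree raises a KeyError instead of silently receiving a
                  code, which makes formatting problems visible before the analysis.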