
72     CHAPTER 4  Statistical analysis




manually by participants, may contain errors or may be presented in inconsistent formats. If those errors or inconsistencies are not filtered out or fixed, they may contaminate the entire data set. Second, the original data collected may be too primitive, and higher-level coding may be necessary to help identify the underlying themes. Third, the specific statistical analysis method or software may require the data to be organized in a predefined layout or format before they can be processed (Delwiche and Slaughter, 2008).

4.1.1 CLEANING UP DATA
The first thing you need to do after data collection is to screen the data for possible errors. This step is necessary for any type of data collected, but it is particularly important for data entered manually by participants. To err is human: all people make mistakes (Norman, 1988). Although it is not possible to identify every error, you want to catch as many as possible to minimize the negative impact of human error. There are various ways to identify errors, depending on the nature of the data collected.

Sometimes you can identify errors by conducting a reasonableness check. For instance, if the age of a participant is entered as “223,” you can easily conclude that something is wrong. Your participant might have accidentally pressed the “2” key twice, in which case the correct age should be 23, or might have accidentally hit the “3” key after entering the correct age, 22.
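A reasonableness check like this is easy to automate. The sketch below assumes the responses are stored as a list of dictionaries with hypothetical `id` and `age` fields, and it uses an illustrative plausible age range (10 to 100) that you would adjust to your own participant population. It only flags suspicious entries rather than correcting them, because, as the “223” example shows, the intended value (22 or 23) cannot be inferred from the data alone.

```python
# Illustrative plausible age range (an assumption); adjust for your population.
MIN_AGE = 10
MAX_AGE = 100

def flag_implausible_ages(records, min_age=MIN_AGE, max_age=MAX_AGE):
    """Return (id, raw_age) for every record whose age is missing,
    non-numeric, or outside [min_age, max_age]."""
    flagged = []
    for rec in records:
        raw = rec.get("age")
        try:
            age = int(raw)
        except (TypeError, ValueError):
            flagged.append((rec["id"], raw))  # missing or not a whole number
            continue
        if not (min_age <= age <= max_age):
            flagged.append((rec["id"], raw))  # numeric but implausible
    return flagged

# Made-up records for illustration.
participants = [
    {"id": "P01", "age": "23"},
    {"id": "P02", "age": "223"},  # the keying error described in the text
    {"id": "P03", "age": ""},     # left blank
]
print(flag_implausible_ages(participants))
# [('P02', '223'), ('P03', '')]
```

Flagged records can then be checked against the original forms or, where possible, with the participants themselves.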

Sometimes you need to check multiple data fields to identify possible errors. For example, you may compare a participant's “age” and “years of computing experience” to check for an unreasonable entry, such as a participant who reports more years of computing experience than years of age.
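A cross-field check compares values that constrain each other. In this sketch the field names `age` and `computing_years` are hypothetical, the values are assumed to be numeric already (for example, after a range check), and `EARLIEST_COMPUTING_AGE` is an illustrative assumption about the youngest age at which anyone could have started using computers.

```python
# Assumption: nobody starts using computers before this age.
# Choose a value suited to your population; this one is illustrative only.
EARLIEST_COMPUTING_AGE = 3

def flag_inconsistent_experience(records, earliest=EARLIEST_COMPUTING_AGE):
    """Flag records whose years of computing experience are negative or
    exceed the number of years the participant could have been computing."""
    flagged = []
    for rec in records:
        age, experience = rec["age"], rec["computing_years"]
        max_possible = age - earliest
        if experience < 0 or experience > max_possible:
            flagged.append(rec["id"])
    return flagged

# Made-up records for illustration.
participants = [
    {"id": "P01", "age": 23, "computing_years": 12},
    {"id": "P02", "age": 25, "computing_years": 30},  # more experience than age
    {"id": "P03", "age": 40, "computing_years": 38},  # barely possible at best
]
print(flag_inconsistent_experience(participants))
# ['P02', 'P03']
```

Neither field is necessarily the wrong one; the check only tells you which records deserve a closer look.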

For automatically collected data, error checking usually comes down to time consistency, or to whether performance falls within a reasonable range. Something is obviously wrong if the logged start time of an event is later than its logged end time. You should also be alert to any unreasonably high or low performance levels in the recorded data.
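Both checks can be run over an event log in a single pass. The sketch below assumes each logged event has hypothetical `id`, `start`, and `end` fields holding ISO 8601 timestamps, and it treats task durations outside an illustrative 1 to 600 second window as implausible; in practice you would derive such bounds from pilot data or knowledge of the task.

```python
from datetime import datetime

# Illustrative bounds on task duration in seconds (assumptions, not norms).
MIN_DURATION_S = 1.0
MAX_DURATION_S = 600.0

def check_event(event, min_s=MIN_DURATION_S, max_s=MAX_DURATION_S):
    """Return a list of problems found in one logged event."""
    problems = []
    start = datetime.fromisoformat(event["start"])
    end = datetime.fromisoformat(event["end"])
    duration = (end - start).total_seconds()
    if duration < 0:
        problems.append("start time is later than end time")
    elif not (min_s <= duration <= max_s):
        problems.append(f"implausible duration: {duration:.1f} s")
    return problems

# Made-up log entries for illustration.
log = [
    {"id": "P01-T1", "start": "2024-03-01T10:00:00", "end": "2024-03-01T10:01:30"},
    {"id": "P01-T2", "start": "2024-03-01T10:05:00", "end": "2024-03-01T10:04:10"},
    {"id": "P02-T1", "start": "2024-03-01T11:00:00", "end": "2024-03-01T11:20:00"},
]
for event in log:
    for problem in check_event(event):
        print(event["id"], "->", problem)
# P01-T2 -> start time is later than end time
# P02-T1 -> implausible duration: 1200.0 s
```

An out-of-range duration is not necessarily an error (a participant may simply have been interrupted), so flagged events should be reviewed rather than deleted automatically.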

In many studies, data about the same participant are collected through multiple channels. For example, in a study investigating multiple data-entry techniques, the performance data (such as time and number of keystrokes) might be logged automatically by data-logging software, while the participants' subjective preference and satisfaction data might be collected manually via paper-based questionnaires. In this case, you need to make sure that all the data about the same participant are correctly grouped together. The results will be invalid if the performance data of one participant are grouped with the subjective data of another.
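One way to group the two sources safely is to join them on a shared participant ID and report every ID that does not match cleanly, instead of silently dropping or mispairing it. This sketch assumes both sources carry a hypothetical `participant_id` field and hold one summary row per participant; the ID normalization (trimming spaces, upper-casing) is an assumption about the kinds of inconsistencies that hand-entered IDs typically contain.

```python
from collections import Counter

def normalize_id(raw):
    """Hand-entered IDs often differ in case or stray spaces ("p01 " vs "P01")."""
    return raw.strip().upper()

def group_by_participant(performance, questionnaires):
    """Pair each participant's performance row with their questionnaire row.

    Assumes one summary row per participant in each source. Returns
    (combined, problems): rows are combined only when a normalized ID occurs
    exactly once in both sources; every other case is reported for follow-up.
    """
    perf_counts = Counter(normalize_id(r["participant_id"]) for r in performance)
    quest_counts = Counter(normalize_id(r["participant_id"]) for r in questionnaires)

    problems = []
    for pid in sorted(perf_counts.keys() | quest_counts.keys()):
        if perf_counts[pid] == 0:
            problems.append(f"{pid}: questionnaire but no performance log")
        elif quest_counts[pid] == 0:
            problems.append(f"{pid}: performance log but no questionnaire")
        elif perf_counts[pid] > 1 or quest_counts[pid] > 1:
            problems.append(f"{pid}: duplicate records")

    quest_by_id = {normalize_id(r["participant_id"]): r for r in questionnaires}
    combined = {}
    for row in performance:
        pid = normalize_id(row["participant_id"])
        if perf_counts[pid] == 1 and quest_counts[pid] == 1:
            combined[pid] = {**row, **quest_by_id[pid], "participant_id": pid}
    return combined, problems

# Made-up records for illustration.
performance = [
    {"participant_id": "P01", "time_s": 312, "keystrokes": 845},
    {"participant_id": "P02", "time_s": 287, "keystrokes": 790},
    {"participant_id": "P03", "time_s": 401, "keystrokes": 1022},
]
questionnaires = [
    {"participant_id": "p01 ", "satisfaction": 6},  # inconsistent hand entry
    {"participant_id": "P02", "satisfaction": 4},
    {"participant_id": "P04", "satisfaction": 5},   # no matching log
]
combined, problems = group_by_participant(performance, questionnaires)
print(sorted(combined))
print(problems)
```

Refusing to combine ambiguous IDs means a mismatch shows up as a reported problem rather than as one participant's performance data silently attached to another participant's questionnaire.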

After errors are identified, how should you deal with them? Ideally, you would always fix errors and replace them with accurate data. This is possible in some cases: if a participant's age is incorrect, you can contact that participant and find out the accurate information. In many cases, however, fixing errors in the preprocessing stage is impossible. In many online studies, or studies in which the participant remains anonymous, you may have no means of reaching participants