Page 28 - Statistics II for Dummies

12       Part I: Tackling Data Analysis and Model-Building Basics



                                Suppose Bill Prediction (from the previous section) decides to try to
                                predict scores on a biology exam based on study time, but this time his
                                model doesn’t fit. Not one to give in, Bill insists there must be some
                                other factors that predict biology exam scores besides study time, and he
                                sets out to find them.
                                Bill measures everything from soup to nuts. His set of 20 possible variables
                                includes study time, GPA, previous experience in statistics, math grades in
                                high school, and whether the student chews gum during the exam. After
                                running a multitude of correlation analyses, Bill finds that the variables
                                related to exam score are study time, math grades in high school, GPA, and
                                gum chewing during the exam. It turns out that this particular model fits
                                pretty well (by criteria I discuss in Chapter 5 on multiple linear
                                regression models).
                                But here’s the problem: By looking at all possible correlations between his
                                20 variables and exam score, Bill is actually doing 20 separate statistical
                                analyses. Under typical conditions that I describe in Chapter 3, each
                                statistical analysis has a 5 percent chance of producing a false alarm: a
                                result that looks significant even though no real relationship exists.
                                Across 20 analyses, that works out to about one false alarm
                                (20 × 0.05 = 1). I bet you can guess which one of Bill’s correlations
                                likely came out wrong in this case. And hopefully he didn’t rely on a
                                stick of gum to boost his grade in biology.
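                                The arithmetic behind Bill’s problem can be checked with a short Python
                                sketch. It assumes, as an idealization, that the 20 correlation tests are
                                independent, that none of the variables is truly related to exam score,
                                and that each test uses a 5 percent significance level; the function names
                                are hypothetical and chosen only for illustration. Under those assumptions
                                the expected number of false alarms is 20 × 0.05, and the chance of at
                                least one is 1 - 0.95^20. A small simulation, which treats each p-value
                                under a true null as uniform on [0, 1], should land close to that value.

```python
import random

def false_positive_summary(num_tests: int, alpha: float) -> tuple[float, float]:
    """Return (expected number of false alarms, probability of at least one)
    for num_tests independent tests whose null hypotheses are all true.
    Independence is an idealizing assumption, not a property of Bill's data."""
    expected = num_tests * alpha
    at_least_one = 1 - (1 - alpha) ** num_tests
    return expected, at_least_one

def simulate_snooping(num_tests: int, alpha: float, trials: int, seed: int = 1) -> float:
    """Monte Carlo estimate of the chance that at least one of num_tests
    null tests comes out 'significant' in a single study."""
    rng = random.Random(seed)
    studies_with_a_hit = 0
    for _ in range(trials):
        # Under a true null (continuous test statistic), each p-value is
        # uniform on [0, 1], so each test has p < alpha with probability alpha.
        if any(rng.random() < alpha for _ in range(num_tests)):
            studies_with_a_hit += 1
    return studies_with_a_hit / trials

expected, at_least_one = false_positive_summary(20, 0.05)
simulated = simulate_snooping(20, 0.05, trials=100_000)
print(f"Expected false alarms among 20 tests: {expected:.2f}")
print(f"Exact P(at least one false alarm):    {at_least_one:.3f}")
print(f"Simulated P(at least one):            {simulated:.3f}")
```

                                Under these assumptions, about one false alarm is expected among Bill’s
                                20 analyses, and the chance of at least one is roughly 64 percent, which
                                is why a lone surprising result such as gum chewing deserves suspicion.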

                                Looking at data until you find something in it is called data snooping. Data
                                snooping may give the researcher his five minutes of fame, but it then costs
                                him all credibility because no one can repeat his results.


                                No (data) fishing allowed


                                Some folks just don’t take no for an answer, and when it comes to analyzing
                                data, that can lead to trouble.

                                Sue Gonnafindit is a determined researcher. She believes that her horse can
                                count by stomping his foot. (For example, she says “2” and her horse stomps
                                twice.) Sue collects data on her horse for four weeks, recording the
                                percentage of time the horse gets the counting right. She runs the
                                appropriate statistical analysis on her data and is shocked to find no
                                significant difference between her horse’s results and those you would get
                                simply by guessing. Determined to prove that her horse really can count,
                                Sue looks for other types of analyses and plugs her data into anything and
                                everything she can find (never mind that those analyses are inappropriate
                                for her situation). Using the famous hunt-and-peck method, she eventually
                                stumbles upon a significant result. However, the result is bogus: She tried
                                so many inappropriate analyses, and she ignored the result of the
                                appropriate analysis simply because it didn’t tell her what she wanted to
                                hear.
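                                Sue’s hunt-and-peck approach can be idealized in a short Python sketch. It
                                pretends, purely as a hypothetical simplification, that each analysis she
                                tries is an independent test of a true null hypothesis with a 5 percent
                                false-alarm rate; in reality her analyses reuse the same data and many are
                                inappropriate, so their actual error rates are unknown. Under that
                                idealization, the number of analyses she runs before her first bogus
                                “significant” result follows a geometric distribution with mean
                                1 / 0.05 = 20, and the simulated average should come out close to that.
                                The function name is hypothetical.

```python
import random

def tries_until_significant(alpha: float, rng: random.Random, max_tries: int = 10_000) -> int:
    """Hypothetical model of hunt-and-peck analysis: count how many analyses
    are run until the first 'significant' result, assuming each analysis is an
    independent test of a true null hypothesis with false-alarm rate alpha."""
    for attempt in range(1, max_tries + 1):
        # Under a true null, a p-value is uniform on [0, 1], so it falls
        # below alpha with probability alpha.
        if rng.random() < alpha:
            return attempt
    return max_tries  # safety cap; essentially never reached for alpha = 0.05

rng = random.Random(2009)
runs = [tries_until_significant(0.05, rng) for _ in range(50_000)]
average_tries = sum(runs) / len(runs)
print(f"Average analyses before a bogus hit: {average_tries:.1f}")  # theory: 1 / alpha
print(f"Longest search in this simulation: {max(runs)} analyses")
```

                                The exact numbers matter less than the pattern: under this model, a
                                persistent searcher is all but guaranteed to find a “significant” result
                                eventually, even when there is nothing real to find.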










