Page 151 - Intermediate Statistics for Dummies
                                         Part II: Making Predictions by Using Regression
                                         Starting Out with Scatterplots
                                                    As with any type of data analysis, before you plunge in and select a model
                                                    that you think fits the data, or that is supposed to fit the data, you have to
                                                    step back and take a look at the data and see whether any patterns emerge.
                                                    To do this, look at a scatterplot of the data, and see whether or not you can
                                                    draw a smooth curve through the data and find that most of the points follow
                                                    along that curve.
                                                    Suppose you’re interested in modeling how quickly a rumor spreads. One
                                                    person knows a secret, tells another person, and now two know the secret;
                                                    each of them tells a person, and now four know the secret; some of those
                                                    people may pass it on, and so it goes on down the line. Pretty soon, a large
                                                    number of people know the secret (which is a secret no longer). To collect your
                                                    data, you count the number of people who know the secret by tracking who tells
                                                    whom over a six-day period. You can see a scatterplot of the data in Figure 7-1.
                                                    [Figure 7-1: A scatterplot showing the spread of a secret over a six-day
                                                    period. Horizontal axis: Day (1 to 6). Vertical axis: Number of people who
                                                    know the secret (0 to 35). Correlation r = 0.906.]
                                                    In this situation, the explanatory variable, x, is day, and the response variable,
                                                    y, is the number of people who know the secret. Looking at Figure 7-1,
                                                    you can see a pattern between the values of x and y. But this pattern isn’t
                                                    linear. It curves upwards. If you tried to fit a line to this data set anyway, how
                                                    well would it fit?
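
If you want to try that question out yourself, here is a minimal Python sketch. The daily counts are an assumption: the text doesn't list the values behind Figure 7-1, so the sketch uses a pure doubling pattern (1, 2, 4, 8, 16, 32) suggested by the rumor story. The helper name fit_line is also made up for this illustration. It fits a least-squares line to the counts, reports the correlation, and prints the residuals (observed minus predicted) for each day.

```python
from math import sqrt

# Assumed data: the text does not list the values behind Figure 7-1.
# A pure doubling pattern matches the rumor story and is used for illustration only.
days = [1, 2, 3, 4, 5, 6]
people = [1, 2, 4, 8, 16, 32]


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Return (slope, intercept, r) for the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


slope, intercept, r = fit_line(days, people)

# Residual = observed count minus the count the line predicts for that day.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(days, people)]

print(f"line: y = {intercept:.2f} + {slope:.2f}x, r = {r:.3f}")
for day, res in zip(days, residuals):
    print(f"day {day}: residual {res:+.2f}")
```

With these assumed counts, r works out to about 0.906, yet the residuals are positive on the first and last days and negative in the middle. That up-down-up pattern is what you'd expect when a straight line is laid over data that curve upward.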
                                                    To figure this out, you can look at the correlation coefficient between x and y,
                                                    which Figure 7-1 shows to be 0.906 (see Chapter 4 for more on correlation).
                                                    You can interpret this correlation as a strong, positive (uphill) linear