Page 131 - Intermediate Statistics for Dummies
                                         Part II: Making Predictions by Using Regression
                                                    Examining scatterplots and correlations
                                                    After you’ve identified a set of possible x variables, the next step is to find
                                                    out which of these variables are highly related to y in order to start trimming
                                                    down the set of possible candidates for the final model. In the punt distance
                                                    example, the goal is to see which of the six variables in Table 6-1 are strongly
                                                    related to punt distance. The two ways to look at these relationships are the
                                                    following:
                                                       Scatterplots: A graphical technique
                                                       Correlation: A one-number measure of the strength and direction of the
                                                        linear relationship between two variables
                                                    Both of these elements are important, and I discuss each of them in the
                                                    following sections.
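To make the "one-number measure" concrete, here is a minimal sketch of how a correlation can be computed by hand. The function name `pearson_r` and the hang time and distance values are made up for illustration; they are not the data from Table 6-1, and statistical software would normally do this calculation for you.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative values, NOT the data from Table 6-1.
hang_time = [3.2, 3.9, 4.1, 4.4, 4.8]
distance = [118.0, 140.0, 151.0, 160.0, 172.0]

r = pearson_r(hang_time, distance)
print(f"r = {r:.3f}")
```

A value of r near +1 or -1 suggests a strong linear relationship, and a value near 0 suggests a weak one. Because r measures only linear relationships, it works best alongside a scatterplot, not in place of one.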
                                                    Seeing relationships through scatterplots
                                                    To begin examining the relationships between the x variables and y, you use
                                                    a series of scatterplots. Figure 6-1 shows all the scatterplots, not only of each
                                                    x variable with y, but also of each x variable with every other x variable. The
                                                    scatterplots are in the form of a matrix, which is a table made of rows and columns. For example,
                                                    the first scatterplot in row two of Figure 6-1 looks at the variables of distance
                                                    (which appears in column one) and hang time (which appears in row two).
                                                    This scatterplot shows a possible positive (uphill) linear relationship
                                                    between distance and hang time.
                                                    Figure 6-1: A matrix of all scatterplots between pairs of variables in the
                                                    punting distance example.
                                                    [Figure: Matrix Plot of Distance, Hang, R_Strength, L_Strength . . . Panel labels:
                                                    Distance, Hang, R_Strength, L_Strength, R_Flexibility, L_Flexibility, O_Strength.]
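If reading a scatterplot matrix by row and column feels awkward, the lookup can be sketched in a few lines of code. This is only an illustration under two assumptions: the variables are in the order labeled in Figure 6-1, and the cells where the row and column numbers match hold a variable's label, not a plot. The names `VARIABLES` and `panel_pair` are hypothetical.

```python
from itertools import combinations

# Variable names as labeled in Figure 6-1 (order assumed to match the matrix).
VARIABLES = ["Distance", "Hang", "R_Strength", "L_Strength",
             "R_Flexibility", "L_Flexibility", "O_Strength"]

def panel_pair(row, col, names=VARIABLES):
    """Return (column variable, row variable) for a 1-based (row, col) cell."""
    if not (1 <= row <= len(names) and 1 <= col <= len(names)):
        raise IndexError("row and col must be between 1 and the number of variables")
    if row == col:
        # Assumption: a cell on the diagonal shows a label, not a scatterplot.
        raise ValueError("diagonal cells hold the variable label, not a scatterplot")
    return names[col - 1], names[row - 1]

# The first scatterplot in row two pairs the column-one and row-two variables.
print(panel_pair(2, 1))   # ('Distance', 'Hang')

# Every distinct pair of variables shows up somewhere in the matrix.
all_pairs = list(combinations(VARIABLES, 2))
print(len(all_pairs))     # 21
```

With seven variables there are 21 distinct pairs. Six of them pair an x variable with distance, and the other 15 pair two x variables with each other.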