Page 152 - Intermediate Statistics for Dummies

                                                  Chapter 7: When Data Throws You a Curve: Using Nonlinear Regression
                                                    relationship between x and y. However, in this case, the correlation is
                                                    misleading, because the scatterplot appears to be curved. As with any regression
                                                    analysis, you need to take both the scatterplot and the correlation into
                                                    account when deciding how well the model being considered would fit the
                                                    data. The contradiction in this example between the scatterplot
                                                    and the correlation is a red flag telling you that a straight-line model
                                                    isn’t the best idea.
                                                    The correlation coefficient measures only the strength and direction of the
                                                    linear relationship between x and y (see Chapter 4). However, you may run
                                                    into situations (like the one shown in Figure 7-1) where the correlation is
                                                    strong, yet the scatterplot shows that a curve would fit better. Don’t rely
                                                    on either the scatterplot or the correlation coefficient alone when you
                                                    decide whether to go ahead and fit a straight line to your data.
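
To see how a strong correlation can coexist with a curved pattern, here is a minimal sketch in Python with NumPy. The data are made up for illustration (they are not the data behind Figure 7-1), and the names x, y, r, and residuals are assumptions of this example.

```python
import numpy as np

# Hypothetical data (not from the book): y grows by 30% for each unit of x.
x = np.arange(1, 11, dtype=float)
y = 2.0 * 1.3 ** x

# Correlation measures only the linear part of the relationship.
r = np.corrcoef(x, y)[0, 1]

# Fit a straight line anyway and look at what it misses.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

print(f"correlation r = {r:.3f}")
print("residuals:", np.round(residuals, 2))
```

With these numbers the correlation comes out high, yet the residuals are positive at both ends and negative in the middle. That systematic pattern is the numerical counterpart of a scatterplot that bends away from the line.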
                                                    The bottom line here is that fitting a line to data that appears to have a curved
                                                    pattern isn’t the way to go. What you need to do in this situation is explore
                                                    models that have curved patterns themselves. In the following sections, you
                                                    see two major types of nonlinear (or curved) models that are used to model
                                                    curved data: polynomials (beyond a straight line) and exponential models
                                                    (that start out small and quickly increase, or the other way around). Because
                                                    the pattern of the data in Figure 7-1 starts low and bends upward, the correct
                                                    model to fit this data is an exponential regression model. (This model would
                                                    also be appropriate for data that starts out high and bends down low.)
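
As a rough illustration of what fitting an exponential model can look like, the sketch below uses one common approach: fit a straight line to the logarithm of y, then transform back. This is an assumption of the example, not necessarily the exact procedure the chapter uses, and the data and variable names are made up.

```python
import numpy as np

# Hypothetical data (not from the book) that starts low and bends upward.
x = np.arange(1, 11, dtype=float)
y = 2.0 * 1.3 ** x

# If y = a * g**x, then log(y) = log(a) + x * log(g), which is a line in x.
b1, b0 = np.polyfit(x, np.log(y), 1)

# Transform the fitted line back to the original scale.
a = np.exp(b0)        # estimated starting multiplier
growth = np.exp(b1)   # estimated growth factor per unit of x
y_hat = a * growth ** x

print(f"fitted model: y = {a:.3f} * {growth:.3f}^x")
```

A growth factor above 1 describes data that start low and bend upward; a factor between 0 and 1 describes data that start high and bend down.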
                                         Handling Curves in the Road
                                         with Polynomials
                                                    One major family of nonlinear models is the polynomial family. You use these
                                                    models when a polynomial function (beyond a straight line) best describes
                                                    the curve in the data. (For example, the data may follow the shape of a
                                                    parabola, which is a second-degree polynomial.) You typically use polynomial
                                                    models when the data follow a pattern of curves going up and down a certain
                                                    number of times. For example, suppose a doctor examines the occurrence of
                                                    heart problems in patients as it relates to their blood pressure. She finds that
                                                    patients with very low or very high blood pressure have a higher occurrence
                                                    of problems, while patients whose blood pressure falls in the middle,
                                                    constituting the normal range, have fewer problems. This pattern of data has a
                                                    U-shape, and a parabola would fit this data well.
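
The blood pressure example can be sketched numerically. The values below are invented purely for illustration (they are not clinical data and do not come from the book), and bp, rate, and vertex are hypothetical names. The sketch fits a second-degree polynomial with NumPy and compares it with the correlation.

```python
import numpy as np

# Made-up U-shaped data: problem rate is lowest near a blood pressure of 120.
bp = np.array([80, 90, 100, 110, 120, 130, 140, 150, 160], dtype=float)
rate = 5.0 + 0.01 * (bp - 120.0) ** 2

# Fit y = c2*x^2 + c1*x + c0 (coefficients are returned highest degree first).
c2, c1, c0 = np.polyfit(bp, rate, 2)

# The parabola turns at x = -c1 / (2*c2); here that is the lowest-risk value.
vertex = -c1 / (2 * c2)

# The correlation sees almost no linear trend in a symmetric U-shape.
r = np.corrcoef(bp, rate)[0, 1]

print(f"c2 = {c2:.4f}, vertex at bp = {vertex:.1f}, correlation r = {r:.3f}")
```

Here the correlation is essentially zero even though the relationship is strong, which is the opposite failure from a high correlation on curved data: either way, the scatterplot and the correlation have to be read together.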
                                                    In this section, you see what a polynomial regression model is, how you can
                                                    search for a good-fitting polynomial for your data, and how you can assess
                                                    polynomial models.