Page 76 - Statistics II for Dummies

60       Part II: Using Different Types of Regression to Make Predictions



                                relationship exists between x and y. A correlation without a scatterplot is
                                dangerous, too, because the relationship between x and y may be very strong
                                but just not linear.


                      Building a Simple Linear Regression Model



                                After you have a handle on which x variables may be related to y in a linear
                                way, you go about the business of finding that straight line that best fits the
                                data. You find the slope and y-intercept, put them together to make a line,
                                and you use the equation of that line to make predictions for y. All this is part
                                of building a simple linear regression model.

                                In this section, you set the foundation for regression models in general
                                (including those you can find in Chapters 5 through 8). You plot the data,
                                come up with a model that you think makes sense, assess how well it fits, and
                                use it to guesstimate the value of y given the value of another variable, x.


                                Finding the best-fitting line to model your data

                                After you’ve established that x and y have a strong linear relationship, as
                                evidenced by both the scatterplot and the correlation coefficient (close to
                                or beyond 0.7 on the positive side or –0.7 on the negative side; see the
                                previous sections), you’re ready to build a
                                model that estimates y using x. In the textbook-weight case, you want to
                                estimate average textbook weight using average student weight.
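
If you want to see the correlation check in action, here is a minimal Python sketch. The helper names (pearson_r, looks_strongly_linear) and the five data points are made up for illustration; they are not the textbook-weight data. The 0.7 cutoff is the rule of thumb from the text, not a hard law.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

def looks_strongly_linear(r, cutoff=0.7):
    """Apply the rule of thumb: |r| at or beyond the cutoff (assumed 0.7)."""
    return abs(r) >= cutoff

# Made-up numbers for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3), looks_strongly_linear(r))  # prints: 0.775 True
```

A number that passes this check still needs the scatterplot to back it up, because the correlation alone can't tell you whether the pattern is actually a straight line.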

                                The most basic of all the regression models is the simple linear regression
                                model, which comes in the general form of y = α + βx + ε. Here, α represents the
                                y-intercept of the line, β represents the slope, and ε represents the error in
                                the model due to chance.
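
To make the three pieces of y = α + βx + ε concrete, the following Python sketch generates y values from a line plus random error. Everything specific here is an assumption for illustration: the values of alpha and beta, the x values, and the choice of a normal distribution for ε (the text says only that ε is error due to chance).

```python
import random

def simulate_y(x, alpha, beta, noise_sd, rng):
    """Draw one y from y = alpha + beta*x + epsilon.

    epsilon is drawn from a normal distribution with mean 0 and standard
    deviation noise_sd; that distribution is an assumption of this sketch.
    """
    epsilon = rng.gauss(0.0, noise_sd)
    return alpha + beta * x + epsilon

rng = random.Random(0)   # fixed seed so the run is repeatable
alpha, beta = 2.0, 0.5   # hypothetical y-intercept and slope

for x in [10, 20, 30, 40]:
    on_the_line = alpha + beta * x            # the alpha + beta*x part
    y = simulate_y(x, alpha, beta, 1.0, rng)  # the line plus chance error
    print(x, on_the_line, round(y - on_the_line, 3))  # last column is epsilon
```

Each printed row shows an x value, the point on the line, and how far chance pushed the observed y away from it.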

                                A straight line that’s used in simple linear regression is just one of an entire
                                family of models (or functions) that statisticians use to express relationships
                                between variables. A model is just a general name for a function that you can
                                use to describe what outcome will occur based on some given information
                                about one or more related variables.

                                Note that you will never know the true model that describes the relationship
                                perfectly. The best you can do is estimate it based on data.
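
A small simulation can show why an estimate is the best you can do. The Python sketch below invents a "true" line, generates noisy data from it, and then estimates the line from the data alone. The estimation method used is ordinary least squares, which is an assumption of this sketch because the passage has not yet named a method. The function name fit_line and all of the numbers are hypothetical as well.

```python
import random

def fit_line(xs, ys):
    """Estimate (intercept, slope) by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# A made-up "true" model; in real life you never get to see these values.
true_alpha, true_beta = 2.0, 0.5
rng = random.Random(42)
xs = [float(i) for i in range(1, 31)]
ys = [true_alpha + true_beta * x + rng.gauss(0.0, 1.0) for x in xs]

est_alpha, est_beta = fit_line(xs, ys)
print("true:     ", true_alpha, true_beta)
print("estimated:", round(est_alpha, 3), round(est_beta, 3))
```

The estimated intercept and slope land near the true values but don't match them exactly, and a different random sample would give slightly different estimates.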

                                To find the right model for your data, the idea is to scour all possible lines
                                and choose the one that fits the data best. Thankfully, you have an algorithm






