Page 95 - Statistics II for Dummies

Chapter 4: Getting in Line with Simple Linear Regression  79


                      Knowing the Limitations of Your Regression Analysis


                                The bottom line of any data analysis is to make the correct conclusions given
                                your results. When you’re working with a simple linear regression model,
                                there’s the potential to make three major errors. This section shows you
                                those errors and tells you how to avoid them.



                                Avoiding slipping into cause-and-effect mode

                                In a simple linear regression, you investigate whether x is related to y, and if
                                you get a strong correlation and a scatterplot that shows a linear trend, then
                                you find the best-fitting line and use it to estimate the value of y for reasonable
                                values of x.

                                There’s a fine line, however (no pun intended), that you don’t want to cross
                                with your interpretation of regression results. Be careful not to automatically
                                interpret the slope in cause-and-effect terms when you’re using the regression
                                line to estimate the value of y using x. Doing so is a leap of faith that
                                can land you in hot water. Unless you used a controlled experiment to get
                                the data, you can conclude only that the variables are correlated; you can’t
                                give a stone-cold guarantee about why they’re related.

                                In the textbook-weight example, you estimate the average weight of the
                                students’ textbooks by using the students’ average weight, but that doesn’t
                                mean increasing a particular child’s weight causes his textbook weight to
                                increase. Because of the strong positive correlation, you do know that
                                students with lower weights tend to have lower total textbook weights, and
                                students with higher weights tend to have higher textbook weights. But you
                                can’t take one particular third-grade student, increase his weight, and
                                presto, suddenly his textbooks weigh more.

                                The variable underlying the relationship between a child’s weight and the
                                weight of his textbooks is the student’s grade level; as grade level
                                increases, so might the size and number of his books, as well as the
                                homework coming home. Student grade level drives both student weight and
                                textbook weight. In this situation, student grade level is what
                                statisticians call a lurking variable; it’s a variable that wasn’t included
                                in the model but is related to both the explanatory variable (x) and the
                                response (y). A lurking variable confuses the issue of what’s causing what
                                to happen.
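
                                To see how a lurking variable can produce a strong correlation without any
                                cause-and-effect link, here is a minimal simulation sketch. Everything in it
                                is an assumption made for illustration: the function names, the grade range
                                of 1 through 12, and every coefficient are made up, and none of it comes
                                from the textbook-weight data. Grade level feeds into both weights, while
                                student weight never appears in the formula for textbook weight.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate_students(seed=0, grades=range(1, 13), per_grade=50):
    """Simulate (grade, student_weight, textbook_weight) rows.

    Hypothetical model: grade level drives BOTH weights. Student weight
    never enters the textbook-weight formula. All numbers are made up.
    """
    rng = random.Random(seed)
    rows = []
    for grade in grades:
        for _ in range(per_grade):
            student_weight = 40 + 10 * grade + rng.gauss(0, 8)
            textbook_weight = 2 + 1.0 * grade + rng.gauss(0, 1.5)
            rows.append((grade, student_weight, textbook_weight))
    return rows


def overall_r(rows):
    """Correlation of student weight and textbook weight, all grades pooled."""
    return pearson_r([r[1] for r in rows], [r[2] for r in rows])


def mean_within_grade_r(rows):
    """Average of the same correlation computed separately inside each grade."""
    by_grade = {}
    for grade, student_weight, textbook_weight in rows:
        xs, ys = by_grade.setdefault(grade, ([], []))
        xs.append(student_weight)
        ys.append(textbook_weight)
    rs = [pearson_r(xs, ys) for xs, ys in by_grade.values()]
    return sum(rs) / len(rs)


rows = simulate_students()
print(f"overall r:           {overall_r(rows):.2f}")
print(f"mean within-grade r: {mean_within_grade_r(rows):.2f}")
```

                                In this sketch the pooled correlation comes out strong, while the
                                correlation inside any single grade hovers near zero. Once grade level is
                                held fixed, heavier students don’t tend to carry heavier textbooks, which
                                is what you’d expect when the lurking variable, not student weight, is
                                doing the driving.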













