Page 76 - Statistics II for Dummies
60 Part II: Using Different Types of Regression to Make Predictions
relationship exists between x and y. A correlation without a scatterplot is
dangerous, too, because the relationship between x and y may be very strong but
just not linear.
Building a Simple Linear Regression Model
After you have a handle on which x variables may be related to y in a linear
way, you go about the business of finding that straight line that best fits the
data. You find the slope and y-intercept, put them together to make a line,
and you use the equation of that line to make predictions for y. All this is part
of building a simple linear regression model.
In this section, you set the foundation for regression models in general
(including those you can find in Chapters 5 through 8). You plot the data,
come up with a model that you think makes sense, assess how well it fits, and
use it to guesstimate the value of y given another variable, x.
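As a rough illustration of that workflow, the sketch below fits a line, checks how well it fits, and makes one prediction. It assumes ordinary least squares as the fitting method and R-squared as the fit measure; neither has been introduced at this point in the text. The data values and the names fit_line and r_squared are made up for the example.

```python
# Illustrative data only: these numbers are made up, not taken from the book.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variation in y that the fitted line accounts for."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


intercept, slope = fit_line(x, y)          # build the model
fit = r_squared(x, y, intercept, slope)    # assess how well it fits
prediction = intercept + slope * 3.5       # predict y for an x inside the data range

print(f"y = {intercept:.2f} + {slope:.2f}x, R^2 = {fit:.3f}")
print(f"predicted y at x = 3.5: {prediction:.2f}")
```

The prediction is made at x = 3.5, inside the range of the observed x values, because a fitted line says little about what happens outside the data it was built from.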
Finding the best-fitting line to model your data
After you’ve established that x and y have a strong linear relationship, as
evidenced by both the scatterplot and the correlation coefficient (close to or
beyond 0.7 for a positive relationship, or –0.7 for a negative one; see the
previous sections), you’re ready to build a
model that estimates y using x. In the textbook-weight case, you want to
estimate average textbook weight using average student weight.
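The correlation check can be sketched in a few lines. The example below computes the correlation coefficient r by hand and compares its size to the 0.7 guideline. The weights are placeholder values, not the book's data set, and the helper names pearson_r and strong_linear are hypothetical. Treating "close to or beyond" as a strict cutoff is a simplification; in practice a value such as 0.68 is a judgment call, and the scatterplot still has to confirm that the pattern is linear.

```python
import math


def pearson_r(x, y):
    """Correlation coefficient r between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def strong_linear(r, cutoff=0.7):
    """True when r is at or beyond the cutoff in either direction."""
    return abs(r) >= cutoff


# Placeholder values (pounds), invented for illustration only.
student_weight = [110, 125, 140, 155, 170]
textbook_weight = [9.5, 11.0, 12.0, 14.5, 15.0]

r = pearson_r(student_weight, textbook_weight)
print(f"r = {r:.3f}, strong enough to model: {strong_linear(r)}")
```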
The most basic of all the regression models is the simple linear regression
model, which comes in the general form y = α + βx + ε. Here, α represents the
y-intercept of the line, β represents the slope, and ε represents the error in
the model due to chance.
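One way to see what each piece of the model does is to generate data from it. The sketch below builds y values as α + βx + ε for chosen values of α and β. It assumes the errors ε are normally distributed with mean 0, which is a common choice but not something stated in the text above; the function name simulate and the parameter values are made up for the example.

```python
import random


def simulate(x_values, alpha, beta, sigma, seed=0):
    """Generate y = alpha + beta * x + error for each x.

    The error is drawn from a normal distribution with mean 0 and
    standard deviation sigma (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in x_values]


x_values = [1, 2, 3, 4, 5]

# With sigma = 0 there is no chance error, so every point sits on the line.
noiseless = simulate(x_values, alpha=2.0, beta=0.5, sigma=0.0)

# With sigma > 0 the points scatter around the same line.
noisy = simulate(x_values, alpha=2.0, beta=0.5, sigma=1.0)

print("on the line:    ", noiseless)
print("with chance error:", [round(v, 2) for v in noisy])
```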
A straight line that’s used in simple linear regression is just one of an entire
family of models (or functions) that statisticians use to express relationships
between variables. A model is just a general name for a function that you can
use to describe what outcome will occur based on some given information
about one or more related variables.
Note that you will never know the true model that describes the relationship
perfectly. The best you can do is estimate it based on data.
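A small simulation can make this point concrete. In the sketch below the true α and β are known only because the data are simulated; each sample then gives a slightly different estimate of them. Least squares is assumed as the estimation method, normal errors are assumed for the chance component, and all names and values are invented for the example.

```python
import random

# Known here only because the data are simulated; with real data
# these values are never observed.
TRUE_ALPHA, TRUE_BETA = 2.0, 0.5


def draw_sample(n, seed):
    """Draw n (x, y) pairs from the true line plus normal chance error."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [TRUE_ALPHA + TRUE_BETA * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def estimate(xs, ys):
    """Least-squares estimates (intercept, slope) from one sample."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Three different samples from the same true model.
estimates = [estimate(*draw_sample(30, seed)) for seed in (1, 2, 3)]

for a, b in estimates:
    print(f"estimated line: y = {a:.2f} + {b:.2f}x")
```

Each estimated line lands near the true one without matching it exactly, and the three samples disagree with one another, which is the sense in which the true model can only be estimated.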
To find the right model for your data, the idea is to scour all possible lines
and choose the one that fits the data best. Thankfully, you have an algorithm