Page 42 - Statistics II for Dummies
26 Part I: Tackling Data Analysis and Model-Building Basics
To make a two-way table from a data set by using Minitab, first enter the data
in two columns, where column one is the row variable (in this case, gender)
and column two is the column variable (in this case, political affiliation). For
example, suppose the first person is a male Democrat. In row one of Minitab,
enter M (for male) in column one and D (Democrat) in column two. Then go to
Stat>Tables>Cross Tabulation and Chi-square. Highlight column one and click
Select to enter this variable in the For Rows line. Highlight column two and
click Select to enter this variable in the For Columns line. Click OK.
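If you don't have Minitab handy, the same counting can be sketched in a few lines of Python. This is only a rough illustration, not the Minitab procedure itself: the eight people listed, the I (Independent) code, and the function name two_way_table are all made up for the example.

```python
from collections import Counter

# Hypothetical data: one (gender, affiliation) pair per person.
# M/F = male/female; D/R/I = Democrat/Republican/Independent (assumed codes).
people = [
    ("M", "D"), ("F", "D"), ("F", "R"), ("M", "R"),
    ("F", "D"), ("M", "I"), ("F", "I"), ("M", "R"),
]

def two_way_table(pairs):
    """Count how many observations fall in each (row, column) cell."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for any combination that never occurs.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = two_way_table(people)

# Print the row variable down the side and the column variable across the top.
cols = sorted(next(iter(table.values())))
print("    " + "  ".join(cols))
for r, cells in table.items():
    print(f"{r}   " + "  ".join(str(cells[c]) for c in cols))
```

Each cell holds the number of people with that combination of gender and affiliation, which is the same thing the Minitab cross tabulation reports.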
People often use the word correlation to discuss relationships between
variables, but in statistics, correlation applies only to the relationship
between two quantitative (numerical) variables, not two categorical variables.
Correlation measures how closely the relationship between two quantitative
variables, such as height and weight, follows a straight line, and it tells you
the direction of that line as well. In short, for any two quantitative
variables, x and y, the correlation measures the strength and direction of
their linear relationship. As one increases, what does the other one do?
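To see what "strength and direction" looks like as a number, here is a minimal sketch that computes the usual (Pearson) correlation coefficient by hand. The height and weight values are invented for illustration, and the function name correlation is just a convenient label.

```python
import math

def correlation(xs, ys):
    """Pearson correlation: strength and direction of a linear relationship."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements for six people.
heights = [60, 62, 65, 68, 70, 72]         # inches
weights = [115, 120, 140, 155, 165, 180]   # pounds

r = correlation(heights, weights)
print(f"correlation between height and weight: {r:.3f}")
```

A value near +1 means the points hug an uphill line, a value near -1 means they hug a downhill line, and a value near 0 means there's no linear pattern to speak of.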
Because categorical variables such as gender don’t have a numerical order to
them, they don’t increase or decrease in value. For example, just because
male = 1 and female = 2 doesn’t mean that a female is worth twice as much as a
male (although some women may want to disagree). The numbers are only labels.
Therefore, you can’t use the word correlation to describe the relationship
between, say, gender and political affiliation. (Chapter 4 covers correlation.)
The appropriate term to describe the relationship between categorical
variables is association. You can say that political affiliation is associated
with gender and then explain how. (For full details on association, see
Chapter 13.)
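One common way to "explain how" is to compare the breakdown of one variable within each group of the other. The sketch below assumes that approach and uses made-up counts; the name row_proportions is hypothetical, and the full treatment of association is left to the chapter cited above.

```python
# Hypothetical two-way table of counts: gender (rows) by affiliation (columns).
counts = {
    "F": {"D": 30, "R": 20},
    "M": {"D": 20, "R": 30},
}

def row_proportions(table):
    """Convert each row of counts into proportions that sum to 1."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: count / total for col, count in cells.items()}
    return result

props = row_proportions(counts)
for gender, cells in props.items():
    summary = ", ".join(f"{aff}: {p:.0%}" for aff, p in cells.items())
    print(f"{gender} -> {summary}")
```

If the breakdowns differ from row to row, as they do in this invented table, that's a sign the two variables may be associated. If every row had the same breakdown, gender would tell you nothing about affiliation.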
Building models to make predictions
You can build models to predict the value of a categorical variable based on
other related information. In this case, building models is more than a lot of
little plastic pieces and some irritatingly sticky glue.
When you build a statistical model, you look for variables that help explain,
estimate, or predict some response you’re interested in; the variables that do
this are called explanatory variables. You sort through the explanatory
variables and figure out which ones do the best job of predicting the response.
Then you put them together into an equation such as y = 2x + 4, where
x = shoe size and y = estimated calf length. That equation is a model.
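Written as code, the example model is nothing more than a function that takes the explanatory variable and returns the estimated response. The function name is made up for illustration, and the coefficients 2 and 4 come straight from the example equation rather than from any real data.

```python
def estimated_calf_length(shoe_size):
    """Apply the example model y = 2x + 4, where x is shoe size."""
    return 2 * shoe_size + 4

# Plug in a few shoe sizes to see the model's estimates.
for size in (8, 9, 10):
    print(f"shoe size {size} -> estimated calf length {estimated_calf_length(size)}")
```

Each one-unit increase in shoe size raises the estimate by 2, which is exactly what the slope in the equation says.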
For example, suppose you want to know which factors or variables can help
you predict someone’s political affiliation. Is a woman without children more
likely to be a Republican or a Democrat? What about a middle-aged man who
proclaims Hinduism as his religion?