Page 44 - Statistics II for Dummies
28 Part I: Tackling Data Analysis and Model-Building Basics
There is no rule of thumb regarding how large or small the margin of error
should be for a quantitative variable; it depends on what the variable is
counting or measuring. For example, if you want to estimate the average
household income for the state of New York, a margin of error of plus or minus
$5,000 is not unreasonable. If the variable is the average number of steps from
the first floor to the second floor of a two-story home in the U.S., the margin
of error will be much smaller, because the step counts themselves vary over a
narrow range. Estimates of categorical variables, on the other hand, are
percentages; most people want the margin of error on those estimates to be no
more than plus or minus 2 to 3 percentage points.
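To make the "plus or minus" figures concrete, here is a minimal Python sketch of the usual large-sample margin-of-error formulas: z·s/√n for a mean and z·√(p̂(1 − p̂)/n) for a proportion. It assumes a 95 percent confidence level (z = 1.96) and a simple random sample; the function names are hypothetical, chosen only for illustration.

```python
import math

Z_95 = 1.96  # assumed: z* for 95% confidence (normal approximation)


def margin_of_error_mean(s, n, z=Z_95):
    """Margin of error for a sample mean: z * s / sqrt(n)."""
    return z * s / math.sqrt(n)


def margin_of_error_proportion(p_hat, n, z=Z_95):
    """Margin of error for a sample proportion: z * sqrt(p(1 - p) / n)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def sample_size_for_proportion(target_moe, p_hat=0.5, z=Z_95):
    """Smallest n whose margin of error is at most target_moe.

    p_hat = 0.5 is the conservative guess (it maximizes p(1 - p)).
    """
    return math.ceil((z / target_moe) ** 2 * p_hat * (1 - p_hat))


if __name__ == "__main__":
    # A poll of 1,000 people with 50% answering "yes": about +/- 3.1 points.
    print(round(100 * margin_of_error_proportion(0.5, 1000), 1))
    # Sample size needed to get within +/- 3 percentage points.
    print(sample_size_for_proportion(0.03))
```

Because n appears under a square root, the required sample size grows with the square of 1/E: halving the target margin of error roughly quadruples the sample you need.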
Making comparisons
Suppose you want to look at income (a quantitative variable) and how it
relates to a categorical variable, such as gender or region of the country.
Your first question may be: Do males still make more money than females?
In this case, you can compare the mean incomes of two populations — males
and females. This assessment requires a hypothesis test of two means (often
called a t-test for independent samples). I present more information on this
technique in Chapter 3.
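As a rough illustration of what a two-sample t-test computes, the Python sketch below finds the t statistic and approximate degrees of freedom for two independent samples. It uses the unequal-variance (Welch) form, which may differ from the version presented in Chapter 3, and the income figures are made up purely for illustration. Turning t into a p-value needs the t distribution (from a t-table or a statistics package), which this sketch leaves out.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, df) for a two-sample t-test without assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance uses the n - 1 (sample) denominator.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical annual incomes, in thousands of dollars.
    males = [50, 52, 54, 56, 58]
    females = [45, 47, 49, 51, 53]
    t, df = welch_t(males, females)
    print(f"t = {t:.2f}, df = {df:.1f}")
```

A large positive t means the first sample's mean is many standard errors above the second's; the sign flips if you swap the samples.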
When comparing the means of more than two groups, don’t simply run all the
possible t-tests on the pairs of means, because you have to control for an
overall error rate in your analysis. Every additional test is another chance
to get a statistically significant result by accident, and those chances add
up. For example, if you conduct 100 hypothesis tests, each at a 5 percent
significance level (error rate), then on average 5 of those 100 tests will
come out statistically significant just by chance, even if no real
relationship exists.
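The arithmetic behind this warning is short. The Python sketch below computes the expected number of false positives (m × α) and, under the simplifying assumption that the tests are independent, the chance of at least one false positive, 1 − (1 − α)^m. It also counts the pairwise t-tests needed for k groups, k(k − 1)/2. The function names are illustrative.

```python
import math


def expected_false_positives(num_tests, alpha=0.05):
    """Average number of tests significant by chance when no real effect exists."""
    return num_tests * alpha


def chance_of_any_false_positive(num_tests, alpha=0.05):
    """Family-wise error rate, assuming the tests are independent."""
    return 1 - (1 - alpha) ** num_tests


def pairwise_comparisons(num_groups):
    """Number of distinct pairs of group means: k choose 2."""
    return math.comb(num_groups, 2)


if __name__ == "__main__":
    print(expected_false_positives(100))                # 5 tests on average
    print(round(chance_of_any_false_positive(100), 3))  # about 0.994
    pairs = pairwise_comparisons(4)                     # e.g., four regions
    print(pairs, round(chance_of_any_false_positive(pairs), 3))  # 6 pairs, about 0.265
```

Even with only four groups, running all six pairwise tests at the 5 percent level gives roughly a one-in-four chance of at least one spurious "significant" difference, which is why a single overall procedure is preferred.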
If you want to compare the average wage in different regions of the country
(the East, the Midwest, the South, and the West, for example), this comparison
requires a more sophisticated analysis because you’re looking at four
groups rather than just two. The procedure for comparing more than two
means is called analysis of variance (ANOVA, for short), and I discuss this
method in detail in Chapters 9 and 10.
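For a preview of what ANOVA computes, here is a minimal Python sketch of the one-way ANOVA F statistic: the variability between group means (mean square between, MSB) divided by the variability within groups (mean square within, MSW). It stops at F and its degrees of freedom; finding the p-value requires the F distribution, which this sketch leaves out. The regional wage data are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: how far each group mean is from the grand mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ssb / df_between) / (ssw / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical hourly wages (dollars) by region.
    wages = {
        "East": [24, 27, 25, 30],
        "Midwest": [21, 23, 22, 24],
        "South": [20, 22, 19, 23],
        "West": [26, 28, 25, 29],
    }
    f_stat, df1, df2 = one_way_anova_f(list(wages.values()))
    print(f"F = {f_stat:.2f} on ({df1}, {df2}) degrees of freedom")
```

A large F means the group means differ by more than the within-group noise would lead you to expect; one test covers all four regions at once instead of six separate t-tests.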
Exploring relationships
One of the most common reasons data is collected is to look for relationships
between variables. With quantitative variables, the most common type of
relationship people look for is a linear relationship; that is, as one variable
increases, does the other tend to increase (or decrease) at a roughly constant
rate, so that the points on a scatterplot cluster around a straight line?
Relationships between variables are examined using specialized plots and
statistics. Because linear relationships are so common, they have their own
special statistic, called the correlation, which measures the strength and
direction of a linear relationship. In this section, you find out how
statisticians use graphs and statistics to explore relationships, paying
particular attention to linear relationships.
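As a small illustration of the correlation statistic, the Python sketch below computes the Pearson correlation coefficient r, which ranges from −1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship). The function name and data are illustrative; Python 3.10 and later also provide statistics.correlation, which computes the same quantity.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))  # how x and y vary together
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Points lying exactly on a rising line give r = 1.
    print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

Keep in mind that r captures only linear patterns: points on the curve y = x² with x running from −2 to 2 give r = 0 even though y depends completely on x. The sketch also divides by zero if either variable is constant, in which case r is undefined.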