Chapter 6 Mining Frequent Patterns, Associations, and Correlations
Lift is a simple correlation measure that is given as follows. The occurrence of itemset
A is independent of the occurrence of itemset B if P(A ∪ B) = P(A)P(B); otherwise,
itemsets A and B are dependent and correlated as events. This definition can easily be
extended to more than two itemsets. The lift between the occurrence of A and B can be
measured by computing
\[
\mathrm{lift}(A, B) = \frac{P(A \cup B)}{P(A)P(B)}. \tag{6.8}
\]
If the resulting value of Eq. (6.8) is less than 1, then the occurrence of A is negatively
correlated with the occurrence of B, meaning that the occurrence of one makes the
occurrence of the other less likely. If the resulting value is greater than 1, then A and B
are positively correlated, meaning that the occurrence of one makes the occurrence of the
other more likely. If the resulting value is equal to 1, then A and B are independent and
there is no correlation between them.
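The three cases above can be sketched in a few lines of Python. This is only an illustration of Eq. (6.8): the function names `lift` and `interpret_lift` and the tolerance `tol` are assumptions made for the example, not part of the text.

```python
def lift(p_a: float, p_b: float, p_ab: float) -> float:
    """Lift as in Eq. (6.8): P(A ∪ B) / (P(A) P(B)).

    p_ab is the fraction of transactions that contain both A and B.
    """
    if p_a <= 0 or p_b <= 0:
        raise ValueError("P(A) and P(B) must be positive")
    return p_ab / (p_a * p_b)


def interpret_lift(value: float, tol: float = 1e-9) -> str:
    """Map a lift value to the three cases described in the text.

    tol is a hypothetical tolerance for comparing floating-point values to 1.
    """
    if abs(value - 1.0) <= tol:
        return "independent"
    if value > 1.0:
        return "positively correlated"
    return "negatively correlated"


# Illustrative probabilities only (not taken from the text).
print(interpret_lift(lift(0.5, 0.5, 0.25)))
print(interpret_lift(lift(0.5, 0.5, 0.40)))
print(interpret_lift(lift(0.5, 0.5, 0.10)))
```

On real data, a lift that differs from 1 only slightly may reflect sampling noise rather than a real dependence, so the exact-equality case is mostly of conceptual interest.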
Equation (6.8) is equivalent to P(B|A)/P(B), or conf(A ⇒ B)/sup(B), which is also
referred to as the lift of the association (or correlation) rule A ⇒ B. In other words, it
assesses the degree to which the occurrence of one “lifts” the occurrence of the other. For
example, if A corresponds to the sale of computer games and B corresponds to the sale
of videos, then given the current market conditions, the sale of games is said to increase
or “lift” the likelihood of the sale of videos by a factor of the value returned by Eq. (6.8).
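The equivalence stated above can be written out in one line. It uses only the definition of conditional probability, P(B|A) = P(A ∪ B)/P(A), which is also the confidence of the rule A ⇒ B:

```latex
\[
\mathrm{lift}(A, B)
  = \frac{P(A \cup B)}{P(A)P(B)}
  = \frac{P(A \cup B)/P(A)}{P(B)}
  = \frac{P(B \mid A)}{P(B)}
  = \frac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{sup}(B)}.
\]
```

Because the first expression is symmetric in A and B, the rules A ⇒ B and B ⇒ A have the same lift, even though their confidences generally differ.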
Let’s go back to the computer game and video data of Example 6.7.
Example 6.8 Correlation analysis using lift. To help filter out misleading “strong” associations of
the form A ⇒ B from the data of Example 6.7, we need to study how the two itemsets,
A and B, are correlated. Let $\overline{game}$ refer to the transactions of Example 6.7 that do
not contain computer games, and $\overline{video}$ refer to those that do not contain videos. The
transactions can be summarized in a contingency table, as shown in Table 6.6.
From the table, we can see that the probability of purchasing a computer game
is P({game}) = 0.60, the probability of purchasing a video is P({video}) = 0.75, and
the probability of purchasing both is P({game,video}) = 0.40. By Eq. (6.8), the lift of
Rule (6.6) is P({game, video})/(P({game}) × P({video})) = 0.40/(0.60 × 0.75) = 0.89.
Because this value is less than 1, there is a negative correlation between the occurrence
of {game} and {video}. The numerator is the likelihood of a customer purchasing
both, while the denominator is what the likelihood would have been if the two purchases
were completely independent. Such a negative correlation cannot be identified
by a support–confidence framework.
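The arithmetic of this example can be checked with a short Python sketch. The three probabilities are the ones quoted in the example, and the variable names are chosen here for illustration. The sketch also computes the confidence of the rule, which makes the point about the support–confidence framework concrete: the confidence looks high in isolation, yet it is lower than the unconditional probability of buying a video.

```python
# Probabilities quoted in Example 6.8.
p_game = 0.60
p_video = 0.75
p_both = 0.40

# Confidence of {game} => {video}, i.e., P(video | game).
confidence = p_both / p_game

# Lift by Eq. (6.8), and by the equivalent form conf(A => B) / sup(B).
lift_value = p_both / (p_game * p_video)
lift_from_conf = confidence / p_video

print(f"confidence = {confidence:.3f}")      # confidence = 0.667
print(f"P(video)   = {p_video:.3f}")         # P(video)   = 0.750
print(f"lift       = {lift_value:.2f}")      # lift       = 0.89
```

Since P(video | game) is about 0.667 while P(video) is 0.75, knowing that a customer bought a game lowers the estimated chance of a video purchase, which is what a lift below 1 expresses.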
The second correlation measure that we study is the χ² measure, which was introduced
in Chapter 3 (Eq. 3.1). To compute the χ² value, we take the squared difference
between the observed and expected value for a slot (A and B pair) in the contingency
table, divided by the expected value. This amount is summed for all slots of the
contingency table. Let’s perform a χ² analysis of Example 6.8.
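The computation just described can be sketched as follows. The function name `chi_square` is hypothetical, and expected counts are taken as (row total × column total) / N, the usual independence assumption. The counts in the usage example are also an assumption: the cell proportions follow from the probabilities in Example 6.8 (0.40 both, 0.35 video only, 0.20 game only, 0.05 neither), but the total of 10,000 transactions is assumed here for illustration.

```python
from typing import List, Sequence


def chi_square(observed: Sequence[Sequence[float]]) -> float:
    """Sum of (observed - expected)^2 / expected over all slots of the table.

    The expected count of a slot is (row total * column total) / N,
    i.e., the count that would be seen if rows and columns were independent.
    """
    row_totals: List[float] = [sum(row) for row in observed]
    col_totals: List[float] = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("expected count of zero; statistic undefined")
            stat += (obs - expected) ** 2 / expected
    return stat


# Assumed counts: proportions from Example 6.8 scaled to N = 10,000 (assumption).
# Rows: video, no video. Columns: game, no game.
table = [
    [4000, 3500],
    [2000, 500],
]
print(f"chi-square = {chi_square(table):.1f}")
```

Unlike lift, the χ² value grows with the number of transactions for fixed proportions, so it is meaningful only relative to the assumed N. The direction of the correlation still has to be read from the table: here the observed count for the (game, video) slot, 4000, is below its expected count of 4500.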