Chapter 6 Mining Frequent Patterns, Associations, and Correlations

Similarly, in $D_3$, the four new measures correctly show that m and c are strongly
negatively associated because the mc to c ratio equals the mc to m ratio, that is,
100/1100 = 9.1%. However, lift and $\chi^2$ both contradict this incorrectly: Their
values for $D_2$ are between those for $D_1$ and $D_3$.
For data set $D_4$, both lift and $\chi^2$ indicate a highly positive association between
m and c, whereas the others indicate a “neutral” association because the ratio of $mc$ to
$\overline{m}c$ equals the ratio of $mc$ to $m\overline{c}$, which is 1. This means that if a customer buys
coffee (or milk), the probability that he or she will also purchase milk (or coffee) is
exactly 50%.
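
The 50% figure follows directly from the equal ratios. As a short check, write $mc$ for the number of transactions containing both milk and coffee, $\overline{m}c$ for those containing coffee but not milk, and $m\overline{c}$ for those containing milk but not coffee, and assume only that both ratios equal 1:

```latex
\[
mc = \overline{m}c = m\overline{c}
\quad\Longrightarrow\quad
P(c \mid m) = \frac{mc}{mc + m\overline{c}} = \frac{1}{2},
\qquad
P(m \mid c) = \frac{mc}{mc + \overline{m}c} = \frac{1}{2}.
\]
```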

“Why are lift and $\chi^2$ so poor at distinguishing pattern association relationships in
the previous transactional data sets?” To answer this, we have to consider the
null-transactions. A null-transaction is a transaction that does not contain any of the
itemsets being examined. In our example, $\overline{m}\,\overline{c}$ represents the number of null-transactions.
Lift and $\chi^2$ have difficulty distinguishing interesting pattern association relationships
because they are both strongly influenced by $\overline{m}\,\overline{c}$. Typically, the number of
null-transactions can outweigh the number of individual purchases because, for example,
many people may buy neither milk nor coffee. On the other hand, the other four
measures are good indicators of interesting pattern associations because their
definitions remove the influence of $\overline{m}\,\overline{c}$ (i.e., they are not influenced by the number of
null-transactions).
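
The effect of null-transactions can be checked numerically. The following is a minimal sketch, assuming the standard definitions of lift, the 2×2 Pearson $\chi^2$ statistic, and the Kulczynski measure (taken to match those given earlier in the chapter). The function names and the counts are hypothetical, chosen for illustration so that the two ratios equal 1; they are not data quoted from the text.

```python
def lift(mc, m_only, c_only, null):
    """lift(m, c) = P(m and c) / (P(m) * P(c)), from 2x2 contingency counts.

    mc      : transactions with both milk and coffee
    m_only  : milk but not coffee
    c_only  : coffee but not milk
    null    : neither item (the null-transactions)
    """
    n = mc + m_only + c_only + null
    p_m = (mc + m_only) / n
    p_c = (mc + c_only) / n
    return (mc / n) / (p_m * p_c)


def chi_square(mc, m_only, c_only, null):
    """Pearson chi-square for a 2x2 table: N(ad - bc)^2 over the four margins."""
    n = mc + m_only + c_only + null
    numerator = n * (mc * null - m_only * c_only) ** 2
    denominator = (
        (mc + m_only) * (c_only + null) * (mc + c_only) * (m_only + null)
    )
    return numerator / denominator


def kulczynski(mc, m_only, c_only):
    """Average of P(c | m) and P(m | c); the null count is not an input."""
    return 0.5 * (mc / (mc + m_only) + mc / (mc + c_only))


# Illustrative (assumed) counts: only the number of null-transactions varies.
for null in (1_000, 10_000, 100_000):
    print(
        null,
        round(lift(1000, 1000, 1000, null), 2),
        round(chi_square(1000, 1000, 1000, null), 1),
        kulczynski(1000, 1000, 1000),
    )
```

With these assumed counts, lift and $\chi^2$ grow as the null count grows, even though the relationship between the two items is unchanged, while the Kulczynski value stays at 0.5.
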
This discussion shows that it is highly desirable to have a measure whose value
is independent of the number of null-transactions. A measure is null-invariant if
its value is free from the influence of null-transactions. Null-invariance is an
important property for measuring association patterns in large transaction databases. Among
the six measures discussed in this subsection, only lift and $\chi^2$ are not null-invariant
measures.
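
One way to see the difference is to write a measure of each kind in terms of the four contingency counts. The following is a sketch using the standard definitions of lift and the Kulczynski measure (assumed to match those given earlier in the chapter), where $N$ is the total number of transactions, $\overline{m}\,\overline{c}$ is the number of null-transactions, and $P(m \cup c)$ denotes the probability that a transaction contains both items:

```latex
\[
N = mc + \overline{m}c + m\overline{c} + \overline{m}\,\overline{c}
\]
\[
\mathit{lift}(m,c) = \frac{P(m \cup c)}{P(m)\,P(c)}
  = \frac{mc \cdot N}{(mc + m\overline{c})(mc + \overline{m}c)}
\]
\[
\mathit{Kulc}(m,c) = \frac{1}{2}\bigl(P(m \mid c) + P(c \mid m)\bigr)
  = \frac{1}{2}\left(\frac{mc}{mc + \overline{m}c} + \frac{mc}{mc + m\overline{c}}\right)
\]
```

The null count enters lift through $N$, so lift changes whenever null-transactions are added or removed. No term of the Kulczynski measure contains it.
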
“Among the all_confidence, max_confidence, Kulczynski, and cosine measures, which
is best at indicating interesting pattern relationships?”
To answer this question, we introduce the imbalance ratio (IR), which assesses the
imbalance of two itemsets, A and B, in rule implications. It is defined as

$$
\mathit{IR}(A,B) = \frac{|\mathit{sup}(A) - \mathit{sup}(B)|}{\mathit{sup}(A) + \mathit{sup}(B) - \mathit{sup}(A \cup B)}, \tag{6.13}
$$
where the numerator is the absolute value of the difference between the support of the
itemsets A and B, and the denominator is the number of transactions containing A or
B. If the two directional implications between A and B are the same, then IR(A,B) will
be zero. Otherwise, the larger the difference between the two, the larger the imbalance
ratio. This ratio is independent of the number of null-transactions and independent of
the total number of transactions.
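
The imbalance ratio is simple to compute from three support counts. The following is a minimal sketch; the function name `imbalance_ratio` and the example counts are hypothetical, chosen only for illustration. Here `sup_ab` is the support of the combined itemset A ∪ B, that is, the number of transactions containing both A and B.

```python
def imbalance_ratio(sup_a, sup_b, sup_ab):
    """IR(A, B) = |sup(A) - sup(B)| / (sup(A) + sup(B) - sup(A ∪ B)).

    sup_a  : number of transactions containing A
    sup_b  : number of transactions containing B
    sup_ab : number of transactions containing both A and B
    """
    # The denominator counts the transactions that contain A or B.
    containing_a_or_b = sup_a + sup_b - sup_ab
    if containing_a_or_b == 0:
        raise ValueError("no transaction contains A or B")
    return abs(sup_a - sup_b) / containing_a_or_b


# Balanced case (assumed counts): equal supports give IR = 0.
print(imbalance_ratio(2000, 2000, 1000))            # 0.0

# Imbalanced case (assumed counts): B is ten times as frequent as A.
print(round(imbalance_ratio(1100, 11000, 1000), 3))  # 0.892
```

The function takes neither the null count nor the total number of transactions as input, which reflects the independence property stated above.
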
Let’s continue examining the remaining data sets in Example 6.10.

Example 6.11 Comparing null-invariant measures in pattern evaluation. Although the four
measures introduced in this section are null-invariant, they may present dramatically
