Typically, association rules are discarded as uninteresting if they do not satisfy both a
minimum support threshold and a minimum confidence threshold. Additional analysis
can be performed to uncover interesting statistical correlations between associated
attribute–value pairs.
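The two-threshold test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transactions, the item names, the threshold values, and the helper names (`count`, `support`, `confidence`, `is_interesting`) are hypothetical and do not come from the text. Support is taken here as the fraction of transactions containing every item in the rule, and confidence as the fraction of transactions containing the antecedent that also contain the consequent.

```python
from typing import FrozenSet, Sequence

Itemset = FrozenSet[str]


def count(itemset: Itemset, transactions: Sequence[Itemset]) -> int:
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(antecedent: Itemset, consequent: Itemset,
            transactions: Sequence[Itemset]) -> float:
    """Fraction of all transactions containing antecedent and consequent."""
    return count(antecedent | consequent, transactions) / len(transactions)


def confidence(antecedent: Itemset, consequent: Itemset,
               transactions: Sequence[Itemset]) -> float:
    """Fraction of antecedent-containing transactions that also contain the consequent."""
    n_antecedent = count(antecedent, transactions)
    if n_antecedent == 0:
        return 0.0
    return count(antecedent | consequent, transactions) / n_antecedent


def is_interesting(antecedent: Itemset, consequent: Itemset,
                   transactions: Sequence[Itemset],
                   min_support: float, min_confidence: float) -> bool:
    """Keep a rule only if it meets BOTH minimum thresholds."""
    return (support(antecedent, consequent, transactions) >= min_support
            and confidence(antecedent, consequent, transactions) >= min_confidence)


# Hypothetical toy data, invented for this sketch.
TRANSACTIONS = [frozenset(t) for t in (
    {"computer", "software"},
    {"computer", "software"},
    {"computer"},
    {"printer"},
    {"computer", "printer"},
)]
RULE = (frozenset({"computer"}), frozenset({"software"}))

if __name__ == "__main__":
    print(support(*RULE, TRANSACTIONS), confidence(*RULE, TRANSACTIONS))
    print(is_interesting(*RULE, TRANSACTIONS, min_support=0.3, min_confidence=0.6))
```

On this toy data the rule computer → software appears in 2 of 5 transactions (support 0.4) and in 2 of the 4 transactions that contain a computer (confidence 0.5), so it passes a 0.3 support threshold but is discarded under a 0.6 confidence threshold.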
Frequent itemset mining is a fundamental form of frequent pattern mining. The mining
of frequent patterns, associations, and correlations is discussed in Chapters 6 and 7,
where particular emphasis is placed on efficient algorithms for frequent itemset mining.
Sequential pattern mining and structured pattern mining are considered advanced
topics.
1.4.3 Classification and Regression for Predictive Analysis
Classification is the process of finding a model (or function) that describes and
distinguishes data classes or concepts. The model is derived based on the analysis of a set of
training data (i.e., data objects for which the class labels are known). The model is used
to predict the class label of objects for which the class label is unknown.
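The derive-then-predict cycle can be illustrated with a deliberately naive sketch. The "model" below is a majority-vote lookup table keyed on attribute values, used as a stand-in for a real learning algorithm; it is not a method taken from the text. The training tuples, attribute values, and function names (`train`, `predict`) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Attributes = Tuple[str, ...]
Model = Tuple[Dict[Attributes, str], str]


def train(training_data: Iterable[Tuple[Attributes, str]]) -> Model:
    """Derive a model from objects whose class labels are known.

    The model maps each observed attribute tuple to its most frequent
    label and falls back to the overall most frequent label otherwise.
    """
    votes: Dict[Attributes, Counter] = defaultdict(Counter)
    overall: Counter = Counter()
    for attributes, label in training_data:
        votes[attributes][label] += 1
        overall[label] += 1
    table = {attrs: c.most_common(1)[0][0] for attrs, c in votes.items()}
    default_label = overall.most_common(1)[0][0]
    return table, default_label


def predict(model: Model, attributes: Attributes) -> str:
    """Predict the class label of an object whose label is unknown."""
    table, default_label = model
    return table.get(attributes, default_label)


# Hypothetical labeled objects: ((age, income), class label).
TRAINING_DATA = [
    (("youth", "high"), "A"),
    (("youth", "high"), "A"),
    (("youth", "low"), "B"),
    (("middle_aged", "high"), "C"),
    (("senior", "low"), "C"),
    (("senior", "high"), "C"),
]

MODEL = train(TRAINING_DATA)

if __name__ == "__main__":
    print(predict(MODEL, ("youth", "low")))        # seen in training
    print(predict(MODEL, ("middle_aged", "low")))  # unseen: falls back to default
```

A lookup table cannot generalize beyond the exact attribute combinations it has seen, which is why it needs the fallback label; practical classifiers derive more compact models that cover unseen combinations.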
“How is the derived model presented?” The derived model may be represented in various
forms, such as classification rules (i.e., IF-THEN rules), decision trees, mathematical
formulae, or neural networks (Figure 1.9). A decision tree is a flowchart-like tree structure,
where each node denotes a test on an attribute value, each branch represents an outcome
of the test, and tree leaves represent classes or class distributions. Decision trees can easily
age(X, “youth”) AND income(X, “high”) → class(X, “A”)
age(X, “youth”) AND income(X, “low”) → class(X, “B”)
age(X, “middle_aged”) → class(X, “C”)
age(X, “senior”) → class(X, “C”)
(a)
age?
  youth: income?
    high: class A
    low: class B
  middle_aged, senior: class C
(b)
[Neural network diagram: units f1 (age) and f2 (income) on the input side; units f3, f4, f5 in the middle; units f6, f7, f8 leading to class A, class B, class C]
(c)
Figure 1.9 A classification model can be represented in various forms: (a) IF-THEN rules, (b) a decision
tree, or (c) a neural network.
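Panels (a) and (b) of Figure 1.9 can be written down directly as code, which makes it easy to check that the two representations describe the same model. The sketch below is an illustration only: the function names, the nested-dictionary encoding of the tree, and the reading of each rule as "IF conditions THEN class" are assumptions made for this example. The neural network in panel (c) is not reproduced because the figure gives no weights.

```python
from itertools import product
from typing import Dict, Union

AGES = ("youth", "middle_aged", "senior")
INCOMES = ("high", "low")


def classify_with_rules(age: str, income: str) -> str:
    """Figure 1.9(a): IF-THEN rules, checked in the order listed."""
    if age == "youth" and income == "high":
        return "A"
    if age == "youth" and income == "low":
        return "B"
    if age == "middle_aged":
        return "C"
    if age == "senior":
        return "C"
    raise ValueError(f"no rule covers age={age!r}, income={income!r}")


# Figure 1.9(b): each internal node tests one attribute, each branch is an
# outcome of that test, and each leaf (a plain string) is a class.
Node = Union[str, Dict]
TREE: Node = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "income",
            "branches": {"high": "A", "low": "B"},
        },
        "middle_aged": "C",
        "senior": "C",
    },
}


def classify_with_tree(node: Node, obj: Dict[str, str]) -> str:
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while isinstance(node, dict):
        node = node["branches"][obj[node["attribute"]]]
    return node


if __name__ == "__main__":
    for age, income in product(AGES, INCOMES):
        by_rules = classify_with_rules(age, income)
        by_tree = classify_with_tree(TREE, {"age": age, "income": income})
        print(age, income, by_rules, by_tree)
```

Each root-to-leaf path in the tree corresponds to one rule: the tests along the path form the rule's conditions and the leaf supplies its class.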