Preface

Chapter 3 introduces techniques for data preprocessing. It first presents the concept of data quality and then discusses methods for data cleaning, data integration, data reduction, data transformation, and data discretization.
Chapters 4 and 5 provide a solid introduction to data warehouses, OLAP (online analytical processing), and data cube technology. Chapter 4 introduces the basic concepts, modeling, design architectures, and general implementations of data warehouses and OLAP, as well as the relationship between data warehousing and other data generalization methods. Chapter 5 takes an in-depth look at data cube technology, presenting a detailed study of methods for data cube computation, including Star-Cubing and high-dimensional OLAP methods. It also discusses further explorations of data cube and OLAP technologies, such as sampling cubes, ranking cubes, prediction cubes, multifeature cubes for complex analysis queries, and discovery-driven cube exploration.
Chapters 6 and 7 present methods for mining frequent patterns, associations, and correlations in large data sets. Chapter 6 introduces fundamental concepts, such as market basket analysis, and presents many techniques for frequent itemset mining in an organized way. These range from the basic Apriori algorithm and its variations to more advanced methods that improve efficiency, including the frequent pattern growth approach, frequent pattern mining with vertical data format, and mining closed and max frequent itemsets. The chapter also discusses pattern evaluation methods and introduces measures for mining correlated patterns. Chapter 7 covers advanced pattern mining methods. It discusses methods for pattern mining in multilevel and multidimensional space, mining rare and negative patterns, mining colossal patterns and high-dimensional data, constraint-based pattern mining, and mining compressed or approximate patterns. It also introduces methods for pattern exploration and application, including semantic annotation of frequent patterns.
Chapters 8 and 9 describe methods for data classification. Because of the importance and diversity of classification methods, the material is partitioned into two chapters. Chapter 8 introduces basic concepts and methods for classification, including decision tree induction, Bayes classification, and rule-based classification. It also discusses model evaluation and selection methods, as well as methods for improving classification accuracy, including ensemble methods and techniques for handling imbalanced data. Chapter 9 discusses advanced methods for classification, including Bayesian belief networks, the neural network technique of backpropagation, support vector machines, classification using frequent patterns, k-nearest-neighbor classifiers, case-based reasoning, genetic algorithms, rough set theory, and fuzzy set approaches. Additional topics include multiclass classification, semi-supervised classification, active learning, and transfer learning.
Cluster analysis forms the topic of Chapters 10 and 11. Chapter 10 introduces the basic concepts and methods for data clustering, including an overview of basic cluster analysis methods, partitioning methods, hierarchical methods, density-based methods, and grid-based methods. It also introduces methods for the evaluation of clustering. Chapter 11 discusses advanced methods for clustering, including probabilistic model-based clustering, clustering high-dimensional data, clustering graph and network data, and clustering with constraints.