
                                 Chapter 3 introduces techniques for data preprocessing. It first introduces the concept
                               of data quality and then discusses methods for data cleaning, data integration, data
                               reduction, data transformation, and data discretization.
                                 Chapters 4 and 5 provide a solid introduction to data warehouses, OLAP (online analytical
                               processing), and data cube technology. Chapter 4 introduces the basic concepts,
                               modeling, design architectures, and general implementations of data warehouses and
                               OLAP, as well as the relationship between data warehousing and other data generalization
                               methods. Chapter 5 takes an in-depth look at data cube technology, presenting a
                               detailed study of methods of data cube computation, including Star-Cubing and high-dimensional
                               OLAP methods. Further explorations of data cube and OLAP technologies
                               are discussed, such as sampling cubes, ranking cubes, prediction cubes, multifeature
                               cubes for complex analysis queries, and discovery-driven cube exploration.
                                 Chapters 6 and 7 present methods for mining frequent patterns, associations, and
                               correlations in large data sets. Chapter 6 introduces fundamental concepts, such as
                               market basket analysis, with many techniques for frequent itemset mining presented
                               in an organized way. These range from the basic Apriori algorithm and its variations
                               to more advanced methods that improve efficiency, including the frequent
                               pattern growth approach, frequent pattern mining with vertical data format, and mining
                               closed and max frequent itemsets. The chapter also discusses pattern evaluation
                               methods and introduces measures for mining correlated patterns. Chapter 7 is on
                               advanced pattern mining methods. It discusses methods for pattern mining in multilevel
                               and multidimensional space, mining rare and negative patterns, mining colossal
                               patterns and high-dimensional data, constraint-based pattern mining, and mining compressed
                               or approximate patterns. It also introduces methods for pattern exploration and
                               application, including semantic annotation of frequent patterns.
                                 Chapters 8 and 9 describe methods for data classification. Due to the importance
                               and diversity of classification methods, the contents are partitioned into two chapters.
                               Chapter 8 introduces basic concepts and methods for classification, including decision
                               tree induction, Bayes classification, and rule-based classification. It also discusses model
                               evaluation and selection methods and methods for improving classification accuracy,
                               including ensemble methods and how to handle imbalanced data. Chapter 9 discusses
                               advanced methods for classification, including Bayesian belief networks, the neural
                               network technique of backpropagation, support vector machines, classification using
                               frequent patterns, k-nearest-neighbor classifiers, case-based reasoning, genetic algorithms,
                               rough set theory, and fuzzy set approaches. Additional topics include multiclass
                               classification, semi-supervised classification, active learning, and transfer learning.
                                 Cluster analysis forms the topic of Chapters 10 and 11. Chapter 10 introduces the
                               basic concepts and methods for data clustering, including an overview of basic cluster
                               analysis methods, partitioning methods, hierarchical methods, density-based methods,
                               and grid-based methods. It also introduces methods for the evaluation of clustering.
                               Chapter 11 discusses advanced methods for clustering, including probabilistic model-based
                               clustering, clustering high-dimensional data, clustering graph and network data,
                               and clustering with constraints.