                                                                                       Contents   xi



                       Chapter 3 Data Preprocessing  83
                                 3.1   Data Preprocessing: An Overview  84
                                       3.1.1  Data Quality: Why Preprocess the Data?  84
                                       3.1.2  Major Tasks in Data Preprocessing  85
                                 3.2   Data Cleaning  88
                                       3.2.1  Missing Values  88
                                       3.2.2  Noisy Data  89
                                       3.2.3  Data Cleaning as a Process  91
                                 3.3   Data Integration  93
                                       3.3.1  Entity Identification Problem  94
                                       3.3.2  Redundancy and Correlation Analysis  94
                                       3.3.3  Tuple Duplication  98
                                       3.3.4  Data Value Conflict Detection and Resolution  99
                                 3.4   Data Reduction  99
                                       3.4.1  Overview of Data Reduction Strategies  99
                                       3.4.2  Wavelet Transforms  100
                                       3.4.3  Principal Components Analysis  102
                                       3.4.4  Attribute Subset Selection  103
                                       3.4.5  Regression and Log-Linear Models: Parametric Data Reduction  105
                                       3.4.6  Histograms  106
                                       3.4.7  Clustering  108
                                       3.4.8  Sampling  108
                                       3.4.9  Data Cube Aggregation  110
                                 3.5   Data Transformation and Data Discretization  111
                                       3.5.1  Data Transformation Strategies Overview  112
                                       3.5.2  Data Transformation by Normalization  113
                                       3.5.3  Discretization by Binning  115
                                       3.5.4  Discretization by Histogram Analysis  116
                                       3.5.5  Discretization by Cluster, Decision Tree, and Correlation Analyses  116
                                       3.5.6  Concept Hierarchy Generation for Nominal Data  117
                                 3.6   Summary  120
                                 3.7   Exercises  121
                                 3.8   Bibliographic Notes  123

                       Chapter 4 Data Warehousing and Online Analytical Processing  125
                                 4.1   Data Warehouse: Basic Concepts  125
                                       4.1.1  What Is a Data Warehouse?  126
                                       4.1.2  Differences between Operational Database Systems and Data Warehouses  128
                                       4.1.3  But, Why Have a Separate Data Warehouse?  129