Page 12 -
P. 12
#3
Page xi
2011/6/1 3:32
HAN 03-toc-ix-xviii-9780123814791
Contents xi
Chapter 3 Data Preprocessing 83
3.1 Data Preprocessing: An Overview 84
3.1.1 Data Quality: Why Preprocess the Data? 84
3.1.2 Major Tasks in Data Preprocessing 85
3.2 Data Cleaning 88
3.2.1 Missing Values 88
3.2.2 Noisy Data 89
3.2.3 Data Cleaning as a Process 91
3.3 Data Integration 93
3.3.1 Entity Identification Problem 94
3.3.2 Redundancy and Correlation Analysis 94
3.3.3 Tuple Duplication 98
3.3.4 Data Value Conflict Detection and Resolution 99
3.4 Data Reduction 99
3.4.1 Overview of Data Reduction Strategies 99
3.4.2 Wavelet Transforms 100
3.4.3 Principal Components Analysis 102
3.4.4 Attribute Subset Selection 103
3.4.5 Regression and Log-Linear Models: Parametric
Data Reduction 105
3.4.6 Histograms 106
3.4.7 Clustering 108
3.4.8 Sampling 108
3.4.9 Data Cube Aggregation 110
3.5 Data Transformation and Data Discretization 111
3.5.1 Data Transformation Strategies Overview 112
3.5.2 Data Transformation by Normalization 113
3.5.3 Discretization by Binning 115
3.5.4 Discretization by Histogram Analysis 115
3.5.5 Discretization by Cluster, Decision Tree, and Correlation
Analyses 116
3.5.6 Concept Hierarchy Generation for Nominal Data 117
3.6 Summary 120
3.7 Exercises 121
3.8 Bibliographic Notes 123
Chapter 4 Data Warehousing and Online Analytical Processing 125
4.1 Data Warehouse: Basic Concepts 125
4.1.1 What Is a Data Warehouse? 126
4.1.2 Differences between Operational Database Systems
and Data Warehouses 128
4.1.3 But, Why Have a Separate Data Warehouse? 129