xii Contents
2.4 Principal Components ............................................................... 39
2.5 Feature Assessment .................................................................. 41
2.5.1 Graphic Inspection ........................................................ 42
2.5.2 Distribution Model Assessment ..................................... 43
2.5.3 Statistical Inference Tests ............................................. 44
2.6 The Dimensionality Ratio Problem ............................................. 46
Bibliography ............................................................................................ 49
Exercises ................................................................................................ 49
3 Data Clustering .................................................................................. 53
3.1 Unsupervised Classification ....................................................... 53
3.2 The Standardization Issue ...................................................... 55
3.3 Tree Clustering ........................................................................... 58
3.3.1 Linkage Rules ................................................................ 60
3.3.2 Tree Clustering Experiments ......................................... 63
3.4 Dimensional Reduction .............................................................. 65
3.5 K-Means Clustering .................................................................... 70
3.6 Cluster Validation ....................................................................... 73
Bibliography ............................................................................................ 76
Exercises ................................................................................................ 77
4 Statistical Classification .................................................................... 79
4.1 Linear Discriminants ................................................................... 79
4.1.1 Minimum Distance Classifier ........................................ 79
4.1.2 Euclidian Linear Discriminants ...................................... 82
4.1.3 Mahalanobis Linear Discriminants ................................ 85
4.1.4 Fisher's Linear Discriminant .......................................... 88
4.2 Bayesian Classification .............................................................. 90
4.2.1 Bayes Rule for Minimum Risk ....................................... 90
4.2.2 Normal Bayesian Classification ..................................... 97
4.2.3 Reject Region .............................................................. 103
4.2.4 Dimensionality Ratio and Error Estimation .................. 105
4.3 Model-Free Techniques ........................................................... 108
4.3.1 The Parzen Window Method ....................................... 110
4.3.2 The K-Nearest Neighbours Method ............................ 113
4.3.3 The ROC Curve ........................................................... 116
4.4 Feature Selection ..................................................................... 121
4.5 Classifier Evaluation ................................................................. 126
4.6 Tree Classifiers ........................................................................ 130
4.6.1 Decision Trees and Tables .......................................... 130
4.6.2 Automatic Generation of Tree Classifiers ................... 136