Page 227 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB

216                                     UNSUPERVISED LEARNING

            chapter we will discuss two main characteristics which can be explored:
            the subspace structure of data and its clustering characteristics. The first
            tries to summarize the objects using a smaller number of features than
            the original number of measurements; the second tries to summarize the
            data set using a smaller number of objects than the original number.
            Subspace structure is often interesting for visualization purposes. The
            human visual system is highly capable of finding and interpreting struc-
            ture in 2D and 3D graphical representations of data. When higher
            dimensional data is available, a transformation to 2D or 3D might
            facilitate its interpretation by humans. Clustering serves a similar
            interpretive purpose, but it also enables data reduction. When very
            large amounts of data are available, it is often more efficient to work
            with cluster representatives instead of the whole data set. In Section
            7.1 we treat feature reduction; in Section 7.2 we discuss clustering.



            7.1   FEATURE REDUCTION

            The most popular unsupervised feature reduction method is principal
            component analysis (Jolliffe, 1986). This will be discussed in Section
            7.1.1. One of the drawbacks of this method is that it is a linear method,
            so nonlinear structures in the data cannot be modelled. In Section 7.1.2
            multi-dimensional scaling is introduced, which is a nonlinear feature
            reduction method.




            7.1.1  Principal component analysis

            The purpose of principal component analysis (PCA) is to transform a
            high dimensional measurement vector z to a much lower dimensional
            feature vector y by means of an operation:

                                 y = W_D (z − z̄)                        (7.1)

            such that z can be reconstructed accurately from y. Here z̄ is the
            expectation of the random vector z. It is constant for all realiza-
            tions of z. Without loss of generality, we can assume that z̄ = 0,
            because we can always introduce a new measurement vector z̃ = z − z̄
            and apply the analysis to this vector. Hence, under that assumption,
            we have:

                                     y = W_D z                           (7.2)
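
            To make the operation y = W_D (z − z̄) concrete, the following is a
            minimal sketch in Python rather than the book's MATLAB: it estimates
            z̄ and the covariance matrix from a small made-up 2-D sample, takes
            the principal eigenvector of the 2×2 covariance in closed form as the
            single row of W_D (so D = 1), and projects the centred data onto it.
            The data values and the function name pca_1d are illustrative
            assumptions, not from the book.

```python
import math

def pca_1d(data):
    """Project 2-D samples onto their principal axis: y = W_D (z - z_bar), D = 1."""
    n = len(data)
    # sample mean z_bar
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    # centred samples z~ = z - z_bar
    centred = [(x - mx, y - my) for x, y in data]
    # 2x2 sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centred) / n
    b = sum(x * y for x, y in centred) / n
    c = sum(y * y for _, y in centred) / n
    # largest eigenvalue of a symmetric 2x2 matrix (closed form)
    lam = (a + c) / 2.0 + math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    # corresponding eigenvector; handle the (near-)diagonal case separately
    if abs(b) > 1e-12:
        vx, vy = b, lam - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    w = (vx / norm, vy / norm)      # unit row of W_D
    # project each centred vector onto the principal direction
    y = [w[0] * zx + w[1] * zy for zx, zy in centred]
    return y, w

# illustrative data only
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0)]
y, w = pca_1d(data)
```

            Because the projections are taken of centred data, the resulting
            one-dimensional features y sum to zero, which is a convenient sanity
            check on any implementation of Eq. (7.1).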