Page 261 - Classification Parameter Estimation & State Estimation An Engg Approach Using MATLAB
250 UNSUPERVISED LEARNING
7.3 REFERENCES
Bishop, C.M., Neural Networks for Pattern Recognition, Oxford University Press,
Oxford, UK, 1995.
Bishop, C.M., Svensén, M. and Williams, C.K.I., GTM: the generative topographic
mapping. Neural Computation, 10(1), 215–234, 1998.
Dempster, A.P., Laird, N.M. and Rubin, D.B., Maximum likelihood from incomplete
data via the EM algorithm. Journal of the Royal Statistical Society B, 39(1), 1–38,
1977.
Johnson, S.C., Hierarchical clustering schemes. Psychometrika, 32(3), 241–254, 1967.
Jolliffe, I.T., Principal Component Analysis, Springer-Verlag, New York, 1986.
Kohonen, T., Self-Organizing Maps, Springer-Verlag, Heidelberg, Germany, 1995.
Kruskal, J.B. and Wish, M., Multidimensional Scaling, Sage Publications, Beverly Hills,
CA, 1978.
Sammon, J.W., Jr, A nonlinear mapping for data structure analysis. IEEE Transactions
on Computers, C-18, 401–409, 1969.
Tipping, M.E. and Bishop, C.M., Mixtures of probabilistic principal component
analyzers. Neural Computation, 11(2), 443–482, 1999.
7.4 EXERCISES
1. Generate a two-dimensional data set z uniformly distributed between 0 and 1. Create a
second data set y uniformly distributed between 1 and 2. Compute the (Euclidean)
distances between z and y, and find the objects in y that have a distance smaller than 1
to an object in z. Make a scatter plot of these objects. Using a large number of objects
in z, what should be the shape of the area of objects with distance smaller than 1?
What would happen if you change the distance definition to the city-block distance
(Minkowski metric with q = 1)? And what would happen if the cosine distance is
used? (0)
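The distance computation in exercise 1 can be sketched as follows. The book's examples use MATLAB with PRTools; this NumPy version is an illustrative substitute, with arbitrary sample sizes and seed:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.uniform(0.0, 1.0, size=(100, 2))   # data set z, uniform on [0, 1]^2
y = rng.uniform(1.0, 2.0, size=(100, 2))   # data set y, uniform on [1, 2]^2

# Pairwise Euclidean distances: d[i, j] = ||y[i] - z[j]||
d = np.linalg.norm(y[:, None, :] - z[None, :, :], axis=2)

# Objects in y whose distance to the nearest object in z is smaller than 1
close = y[d.min(axis=1) < 1.0]
print(close.shape)
```

Replacing the norm with `np.abs(...).sum(axis=2)` gives the city-block variant asked about at the end of the exercise.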
2. Create a data set z = gendatd(50,50,4,2); . Make a scatter plot of the data. Is the
data separable? Predict what would happen if the data is mapped to one dimension.
Check your prediction by mapping the data using pca(z,1), and training a simple
classifier on the mapped data (such as ldc). (0)
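The effect probed in exercise 2 can be reproduced without PRTools. The sketch below assumes two classes separated along one axis but with much larger variance along another (the means and scales are illustrative assumptions, not what gendatd produces); unsupervised PCA then picks the high-variance, class-irrelevant direction:

```python
import numpy as np

rng = np.random.default_rng(1)
# Class means differ along axis 0; axis 1 carries large, uninformative variance
a = rng.normal([0.0, 0.0], [0.5, 10.0], size=(50, 2))
b = rng.normal([3.0, 0.0], [0.5, 10.0], size=(50, 2))
x = np.vstack([a, b])

# PCA to one dimension: project onto the direction of largest variance
xc = x - x.mean(axis=0)
_, _, vt = np.linalg.svd(xc, full_matrices=False)
proj = xc @ vt[0]

# The first principal component is dominated by the high-variance axis,
# so the 1-D projection mixes the two classes
print(vt[0])
```

This is why a simple classifier trained on the one-dimensional mapping performs poorly even though the original data is separable.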
3. Load the worldcities data set and experiment with using different values for q in
the MDS criterion function. What is the effect? Can you think of another way of
treating close sample pairs differently from far-away sample pairs? (0)
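The role of q can be explored with a small sketch. It assumes a stress function in which each pair (i, j) is weighted by the original distance raised to the power q (negative q emphasises close pairs, as in Sammon's mapping; q = 0 weights all pairs equally); the data and the crude "drop a coordinate" projection are illustrative:

```python
import numpy as np

def stress(delta, d, q):
    # Weighted MDS stress over all pairs i < j; the delta**q weighting
    # is an assumed form of the criterion, normalised by the weight sum
    i, j = np.triu_indices(len(delta), k=1)
    w = delta[i, j] ** q
    return np.sum(w * (delta[i, j] - d[i, j]) ** 2) / np.sum(w)

rng = np.random.default_rng(3)
x = rng.normal(size=(20, 3))
y = x[:, :2]                      # crude 2-D "projection": drop one coordinate
delta = np.linalg.norm(x[:, None] - x[None, :], axis=2)  # original distances
d = np.linalg.norm(y[:, None] - y[None, :], axis=2)      # mapped distances

for q in (-2, 0, 2):
    print(q, stress(delta, d, q))
```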
4. Derive equations (7.21), (7.23) and (7.27). ( )
5. Discuss in which data sets it can be expected that the data is distributed in a subspace,
or in clusters. In which cases will it not be useful to apply clustering or subspace
methods? ( )
6. What is a desirable property of a clustering when the same algorithm is run multiple
times on the same data set? Develop an algorithm that uses this notion to estimate the
number of clusters present in the data. ( )
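The notion behind exercise 6 can be probed numerically: run the same clustering algorithm twice with different random initializations and measure how much the two partitions agree. The plain k-means implementation, the two-cluster data set and the pairwise agreement score below are all illustrative assumptions, not the book's algorithm:

```python
import numpy as np

def kmeans(x, k, rng, iters=50):
    # Plain k-means with random initial centres
    centres = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(x[:, None] - centres[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = x[labels == j].mean(axis=0)
    return labels

def rand_index(a, b):
    # Fraction of point pairs on which two labelings agree
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    n = len(a)
    agree = np.sum(same_a == same_b) - n   # exclude the diagonal
    return agree / (n * (n - 1))

rng = np.random.default_rng(2)
# Two well-separated Gaussian clusters (illustrative data)
x = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])

# With the "right" k, repeated runs should give (near-)identical partitions
l1 = kmeans(x, 2, rng)
l2 = kmeans(x, 2, rng)
print(rand_index(l1, l2))
```

Repeating this for several values of k and picking the one with the most stable partitions is one way to attack the second part of the exercise.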
7. In terms of scatter matrices (see the previous chapter), what does the K-means
algorithm minimize? (0)