Foreword

Analyzing large amounts of data is a necessity. Even popular science books, like “Super
Crunchers,” give compelling cases where large amounts of data yield discoveries and
intuitions that surprise even experts. Every enterprise benefits from collecting and
analyzing its data: Hospitals can spot trends and anomalies in their patient records, search
engines can do better ranking and ad placement, and environmental and public health
agencies can spot patterns and abnormalities in their data. The list continues, with
cybersecurity and computer network intrusion detection; monitoring of the energy
consumption of household appliances; pattern analysis in bioinformatics and pharmaceutical
data; financial and business intelligence data; spotting trends in blogs, Twitter,
and many more. Storage is inexpensive and getting cheaper still, as are data sensors. Thus,
collecting and storing data is easier than ever before.
The problem then becomes how to analyze the data. This is exactly the focus of this
Third Edition of the book. Jiawei, Micheline, and Jian give encyclopedic coverage of all
the related methods, from the classic topics of clustering and classification, to database
methods (e.g., association rules, data cubes) to more recent and advanced topics (e.g.,
SVD/PCA, wavelets, support vector machines).
The exposition is extremely accessible to beginners and advanced readers alike. The
book gives the fundamental material first and the more advanced material in follow-up
chapters. It also has numerous rhetorical questions, which I found extremely helpful for
maintaining focus.
We have used the first two editions as textbooks in data mining courses at Carnegie
Mellon and plan to continue to do so with this Third Edition. The new version has
significant additions: Notably, it has more than 100 citations to works from 2006
onward, focusing on more recent material such as graphs and social networks, sensor
networks, and outlier detection. This book has a new section for visualization, has
expanded outlier detection into a whole chapter, and has separate chapters for advanced
