Page 42 -
P. 42
2011/6/1
3:12
Page 5
HAN 08-ch01-001-038-9780123814791
#5
1.2 What Is Data Mining? 5
How can I analyze these data?
Figure 1.2 The world is data rich but information poor.
In summary, the abundance of data, coupled with the need for powerful data analysis
tools, has been described as a data rich but information poor situation (Figure 1.2). The
fast-growing, tremendous amount of data, collected and stored in large and numerous
data repositories, has far exceeded our human ability for comprehension without power-
ful tools. As a result, data collected in large data repositories become “data tombs”—data
archives that are seldom visited. Consequently, important decisions are often made
based not on the information-rich data stored in data repositories but rather on a deci-
sion maker’s intuition, simply because the decision maker does not have the tools to
extract the valuable knowledge embedded in the vast amounts of data. Efforts have
been made to develop expert system and knowledge-based technologies, which typically
rely on users or domain experts to manually input knowledge into knowledge bases.
Unfortunately, however, the manual knowledge input procedure is prone to biases and
errors and is extremely costly and time consuming. The widening gap between data and
information calls for the systematic development of data mining tools that can turn data
tombs into “golden nuggets” of knowledge.
1.2 What Is Data Mining?
It is no surprise that data mining, as a truly interdisciplinary subject, can be defined
in many different ways. Even the term data mining does not really present all the major
components in the picture. To refer to the mining of gold from rocks or sand, we say gold
mining instead of rock or sand mining. Analogously, data mining should have been more