2011/6/1
                           HAN 08-ch01-001-038-9780123814791
Figure 1.2 The world is data rich but information poor. (Figure image not reproduced; its text reads: “How can I analyze these data?”)


In summary, the abundance of data, coupled with the need for powerful data analysis
tools, has been described as a data rich but information poor situation (Figure 1.2). The
tremendous and fast-growing amount of data, collected and stored in large and numerous
data repositories, has far exceeded our human ability to comprehend it without powerful
tools. As a result, data collected in large data repositories become “data tombs”: data
archives that are seldom visited. Consequently, important decisions are often made
based not on the information-rich data stored in data repositories but rather on a
decision maker’s intuition, simply because the decision maker does not have the tools to
extract the valuable knowledge embedded in the vast amounts of data. Efforts have
been made to develop expert systems and knowledge-based technologies, which typically
rely on users or domain experts to manually input knowledge into knowledge bases.
Unfortunately, this manual knowledge input procedure is prone to biases and errors and
is extremely costly and time consuming. The widening gap between data and
information calls for the systematic development of data mining tools that can turn data
tombs into “golden nuggets” of knowledge.

                       1.2     What Is Data Mining?


It is no surprise that data mining, as a truly interdisciplinary subject, can be defined
in many different ways. Even the term data mining does not fully capture all the major
components involved. To refer to the mining of gold from rocks or sand, we say gold
mining instead of rock or sand mining. Analogously, data mining should have been more