

                                 Invisible data mining: We cannot expect everyone in society to learn and master
                                 data mining techniques. More and more systems should have data mining func-
                                 tions built within so that people can perform data mining or use data mining results
                                 simply by mouse clicking, without any knowledge of data mining algorithms. Intelli-
                                 gent search engines and Internet-based stores perform such invisible data mining by
                                 incorporating data mining into their components to improve their functionality and
                                 performance. This is often done unbeknownst to the user. For example, when
                                 purchasing items online, users may be unaware that the store is likely collecting data on
                                 the buying patterns of its customers, which may be used to recommend other items
                                 for purchase in the future.
                               These issues and many additional ones relating to the research, development, and
                               application of data mining are discussed throughout the book.

                       1.8     Summary


                                 Necessity is the mother of invention. With the mounting growth of data in every appli-
                                 cation, data mining meets the imminent need for effective, scalable, and flexible data
                                 analysis in our society. Data mining can be considered as a natural evolution of infor-
                                 mation technology and a confluence of several related disciplines and application
                                 domains.

                                 Data mining is the process of discovering interesting patterns from massive amounts
                                 of data. As a knowledge discovery process, it typically involves data cleaning, data inte-
                                 gration, data selection, data transformation, pattern discovery, pattern evaluation,
                                 and knowledge presentation.
                                 A pattern is interesting if it is valid on test data with some degree of certainty, novel,
                                 potentially useful (e.g., can be acted on or validates a hunch about which the user was
                                 curious), and easily understood by humans. Interesting patterns represent knowl-
                                 edge. Measures of pattern interestingness, either objective or subjective, can be used
                                 to guide the discovery process.
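
As a minimal sketch of how an objective interestingness measure might guide discovery, the following assumes that a pattern is an association rule and that support and confidence serve as the objective measures. The transactions, the thresholds, and the function names are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy transactions, invented for this illustration only.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"computer", "software"}),
    frozenset({"computer", "software", "printer"}),
    frozenset({"computer"}),
    frozenset({"printer"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Estimate of P(consequent | antecedent) over the transactions."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never fires, so treat its confidence as zero
    joint = frozenset(antecedent) | frozenset(consequent)
    return support(joint, transactions) / base


def is_interesting(antecedent: Iterable[str], consequent: Iterable[str],
                   transactions: List[FrozenSet[str]],
                   min_support: float = 0.5,
                   min_confidence: float = 0.6) -> bool:
    """Keep a rule only if it meets both (assumed) thresholds."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    joint = antecedent | consequent
    return (support(joint, transactions) >= min_support
            and confidence(antecedent, consequent, transactions) >= min_confidence)


if __name__ == "__main__":
    rule = ({"computer"}, {"software"})
    print(support(rule[0] | rule[1], TRANSACTIONS))
    print(confidence(rule[0], rule[1], TRANSACTIONS))
    print(is_interesting(rule[0], rule[1], TRANSACTIONS))
```

Subjective measures, such as whether a rule confirms or contradicts a user's expectations, would need additional input from the user and are not modeled here.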
                                 We present a multidimensional view of data mining. The major dimensions are
                                 data, knowledge, technologies, and applications.
                                 Data mining can be conducted on any kind of data, as long as the data are meaningful
                                 for a target application. Examples include database data, data warehouse data, transactional
                                 data, and advanced data types. Advanced data types include time-related or sequence
                                 data, data streams, spatial and spatiotemporal data, text and multimedia data, graph
                                 and networked data, and Web data.
                                 A data warehouse is a repository for long-term storage of data from multiple sources,
                                 organized so as to facilitate management decision making. The data are stored
                                 under a unified schema and are typically summarized. Data warehouse systems pro-
                                 vide multidimensional data analysis capabilities, collectively referred to as online
                                 analytical processing.
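
To make the idea of multidimensional analysis concrete, the following sketch shows one way a roll-up (summarizing a measure over fewer dimensions) could look on a tiny in-memory fact table. The rows, the dimension names, and the `roll_up` helper are assumptions made for illustration; a real data warehouse system would provide such operations through its own OLAP engine.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical fact rows stored under one unified schema:
# three dimensions (quarter, city, item) and one measure (sales).
FACTS: List[Dict[str, object]] = [
    {"quarter": "Q1", "city": "Vancouver", "item": "computer", "sales": 100.0},
    {"quarter": "Q1", "city": "Toronto", "item": "computer", "sales": 80.0},
    {"quarter": "Q2", "city": "Vancouver", "item": "printer", "sales": 40.0},
    {"quarter": "Q2", "city": "Vancouver", "item": "computer", "sales": 60.0},
]


def roll_up(facts: List[Dict[str, object]],
            dimensions: Sequence[str],
            measure: str = "sales") -> Dict[Tuple[object, ...], float]:
    """Sum `measure` over all rows that share the same values of `dimensions`.

    Passing fewer dimensions gives a coarser summary; passing none
    gives the grand total under the empty key ().
    """
    totals: Dict[Tuple[object, ...], float] = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += float(row[measure])
    return dict(totals)


if __name__ == "__main__":
    print(roll_up(FACTS, ["quarter", "city"]))  # sales per quarter and city
    print(roll_up(FACTS, ["city"]))             # rolled up to city only
    print(roll_up(FACTS, []))                   # grand total
```

Calling `roll_up` with a longer dimension list moves in the opposite direction, toward more detailed data, which corresponds loosely to a drill-down.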