


          Chapter 1 Introduction



                         3. Data selection (where data relevant to the analysis task are retrieved from the
                           database)
                         4. Data transformation (where data are transformed and consolidated into forms
                           appropriate for mining by performing summary or aggregation operations)[4]
                         5. Data mining (an essential process where intelligent methods are applied to extract
                           data patterns)
                         6. Pattern evaluation (to identify the truly interesting patterns representing knowledge
                           based on interestingness measures—see Section 1.4.6)
                         7. Knowledge presentation (where visualization and knowledge representation
                           techniques are used to present mined knowledge to users)

                           Steps 1 through 4 are different forms of data preprocessing, where data are prepared
                         for mining. The data mining step may interact with the user or a knowledge base. The
                         interesting patterns are presented to the user and may be stored as new knowledge in the
                         knowledge base.
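
As a rough illustration of steps 3 through 7, the sketch below runs a toy pipeline in Python. The records, the function names, the region filter, and the choice of pair support as the interestingness measure are all assumptions made for this example; none of them come from the text, and a real system would differ considerably.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy "database": one row per purchased item.
RECORDS = [
    {"txn": 1, "item": "bread", "region": "east"},
    {"txn": 1, "item": "milk", "region": "east"},
    {"txn": 2, "item": "bread", "region": "east"},
    {"txn": 2, "item": "milk", "region": "east"},
    {"txn": 2, "item": "eggs", "region": "east"},
    {"txn": 3, "item": "milk", "region": "east"},
    {"txn": 3, "item": "eggs", "region": "east"},
    {"txn": 4, "item": "bread", "region": "west"},
    {"txn": 4, "item": "milk", "region": "west"},
]


def select(records, region):
    """Step 3, data selection: keep only rows relevant to the task."""
    return [r for r in records if r["region"] == region]


def transform(records):
    """Step 4, data transformation: consolidate rows into one item set per transaction."""
    baskets = {}
    for r in records:
        baskets.setdefault(r["txn"], set()).add(r["item"])
    return baskets


def mine(baskets):
    """Step 5, data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for items in baskets.values():
        counts.update(combinations(sorted(items), 2))
    return counts


def evaluate(counts, n_baskets, min_support):
    """Step 6, pattern evaluation: keep pairs whose support meets the threshold.

    Support (fraction of transactions containing the pair) is an assumed
    interestingness measure, chosen here only for simplicity.
    """
    return {
        pair: count / n_baskets
        for pair, count in counts.items()
        if count / n_baskets >= min_support
    }


def present(patterns):
    """Step 7, knowledge presentation: render patterns as readable lines."""
    ordered = sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{a} & {b}: support {s:.2f}" for (a, b), s in ordered]


def run_pipeline(records, region="east", min_support=0.6):
    baskets = transform(select(records, region))
    patterns = evaluate(mine(baskets), len(baskets), min_support)
    return present(patterns)


if __name__ == "__main__":
    for line in run_pipeline(RECORDS):
        print(line)
```

Keeping each step as its own function mirrors the separation in the list above: the selection and transformation functions only prepare data, while the mining function produces candidate patterns that the evaluation step then filters.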
                           The preceding view shows data mining as one step in the knowledge discovery
                         process, albeit an essential one because it uncovers hidden patterns for evaluation. However,
                         in industry, in media, and in the research milieu, the term data mining is often used to
                         refer to the entire knowledge discovery process (perhaps because the term is shorter
                         than knowledge discovery from data). Therefore, we adopt a broad view of data mining
                         functionality: Data mining is the process of discovering interesting patterns and
                         knowledge from large amounts of data. The data sources can include databases, data
                         warehouses, the Web, other information repositories, or data that are streamed into the
                         system dynamically.


                 1.3     What Kinds of Data Can Be Mined?


                         As a general technology, data mining can be applied to any kind of data as long as the
                         data are meaningful for a target application. The most basic forms of data for mining
                         applications are database data (Section 1.3.1), data warehouse data (Section 1.3.2),
                         and transactional data (Section 1.3.3). The concepts and techniques presented in this
                         book focus on such data. Data mining can also be applied to other forms of data (e.g.,
                         data streams, ordered/sequence data, graph or networked data, spatial data, text data,
                         multimedia data, and the WWW). We present an overview of such data in Section 1.3.4.
                         Techniques for mining these kinds of data are briefly introduced in Chapter 13.
                         In-depth treatment is considered an advanced topic. Data mining will certainly continue
                         to embrace new data types as they emerge.

                         [4] Sometimes data transformation and consolidation are performed before the data selection process,
                         particularly in the case of data warehousing. Data reduction may also be performed to obtain a smaller
                         representation of the original data without sacrificing its integrity.