
HAN 08-ch01-001-038-9780123814791


          30    Chapter 1 Introduction                       2011/6/1  3:12  Page 30  #30



                           to mine data with natural language text, it makes sense to fuse data mining methods with methods of information retrieval and natural language processing. As another example, consider the mining of software bugs in large programs. This form of mining, known as bug mining, benefits from the incorporation of software engineering knowledge into the data mining process.
                           Boosting the power of discovery in a networked environment: Most data objects reside in a linked or interconnected environment, whether it be the Web, database relations, files, or documents. Semantic links across multiple data objects can be used to advantage in data mining: knowledge derived from one set of objects can be used to boost the discovery of knowledge in a “related” or semantically linked set of objects.
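One simple way to let knowledge about some linked objects inform related ones is to propagate labels along the links. The sketch below is a minimal, hypothetical illustration chosen for this purpose, not a method given in this text: objects with known labels act as seeds, and each unlabeled object repeatedly takes the most common label among its labeled neighbors. The function name `propagate_labels`, the toy link graph, and the labels are assumptions made for illustration.

```python
from collections import Counter

def propagate_labels(edges, seed_labels, max_iters=10):
    """Label unlabeled nodes by majority vote of their labeled neighbors.

    seed_labels holds the known labels and is never overwritten. Each round
    computes all updates from the previous round's labels, and the loop stops
    early once a round changes nothing.
    """
    neighbors = {}
    for a, b in edges:  # treat links as undirected
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    labels = dict(seed_labels)
    for _ in range(max_iters):
        updates = {}
        for node, nbrs in neighbors.items():
            if node in seed_labels:
                continue  # known labels are treated as ground truth
            votes = Counter(labels[n] for n in nbrs if n in labels)
            if votes:
                updates[node] = votes.most_common(1)[0][0]
        if all(labels.get(n) == lab for n, lab in updates.items()):
            break  # no label changed in this round
        labels.update(updates)
    return labels

# Hypothetical link graph: two triangles (p1-p3 and p4-p6) joined by the link p3-p4.
links = [("p1", "p2"), ("p1", "p3"), ("p2", "p3"),
         ("p3", "p4"), ("p4", "p5"), ("p5", "p6"), ("p4", "p6")]
print(propagate_labels(links, {"p1": "sports", "p6": "politics"}))
```

Because each round's updates are computed from the previous round's labels, the result does not depend on the order in which nodes are visited, although ties between equally common labels would be broken arbitrarily.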
                           Handling uncertainty, noise, or incompleteness of data: Data often contain noise, errors, exceptions, or uncertainty, or are incomplete. Errors and noise may confuse the data mining process, leading to the derivation of erroneous patterns. Data cleaning, data preprocessing, outlier detection and removal, and uncertainty reasoning are examples of techniques that need to be integrated with the data mining process.
                           Pattern evaluation and pattern- or constraint-guided mining: Not all the patterns generated by data mining processes are interesting. What makes a pattern interesting may vary from user to user. Therefore, techniques are needed to assess the interestingness of discovered patterns based on subjective measures, which estimate the value of patterns with respect to a given user class, based on user beliefs or expectations. Moreover, by using interestingness measures or user-specified constraints to guide the discovery process, we may generate more interesting patterns and reduce the search space.
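The sketch below illustrates how a threshold and a user-specified constraint can guide discovery and shrink the search space. It uses a level-wise, Apriori-style search over a toy transaction set: because every superset of an infrequent itemset is also infrequent, candidates with an infrequent subset are skipped without being counted. The minimum-support threshold, the `must_include` item constraint, the basket data, and all function names are illustrative assumptions; the item constraint is applied only as a final filter here, whereas a miner that pushes constraints into the search could also use it to prune.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def frequent_itemsets(transactions, min_support, must_include=None):
    """Level-wise (Apriori-style) search with support-based pruning.

    must_include is a hypothetical user constraint: only itemsets containing
    that item are reported. Here it filters the output; it does not prune.
    Returns (frequent itemsets, number of candidates whose support was counted).
    """
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    frequent, counted, k = [], 0, 1
    while level:
        counted += len(level)
        survivors = [c for c in level if support(c, transactions) >= min_support]
        frequent.extend(survivors)
        k += 1
        kept = set(survivors)
        # Join surviving itemsets into size-k candidates, then skip any candidate
        # with an infrequent (k-1)-subset: it cannot be frequent (Apriori property).
        joined = {a | b for a in survivors for b in survivors if len(a | b) == k}
        level = [c for c in joined
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    if must_include is not None:
        frequent = [f for f in frequent if must_include in f]
    return frequent, counted

# Toy market-basket data (hypothetical).
baskets = [
    {"milk", "bread"},
    {"milk", "diapers", "beer"},
    {"bread", "diapers", "beer"},
    {"milk", "bread", "diapers", "beer"},
    {"milk", "bread", "diapers"},
]
found, counted = frequent_itemsets(baskets, min_support=0.6, must_include="beer")
print(sorted(sorted(f) for f in found), counted)
```

On this toy data the search counts support for 11 candidates instead of all 15 nonempty itemsets over the four items; on larger item vocabularies the savings can be much greater.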


                   1.7.2 User Interaction

                         The user plays an important role in the data mining process. Interesting areas of research include how to interact with a data mining system, how to incorporate a user’s background knowledge in mining, and how to visualize and comprehend data mining results. We introduce each of these here.

                           Interactive mining: The data mining process should be highly interactive. Thus, it is important to build flexible user interfaces and an exploratory mining environment that facilitate the user’s interaction with the system. A user may wish to first sample a set of data, explore general characteristics of the data, and estimate potential mining results. Interactive mining should allow users to dynamically change the focus of a search, to refine mining requests based on returned results, and to drill, dice, and pivot through the data and knowledge space interactively, dynamically exploring “cube space” while mining.
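A rough, hypothetical sketch of the exploratory steps just described: draw a small sample for a first look at the data, then compute totals at different levels of detail and along different dimensions. Plain group-by aggregation over in-memory records stands in for a real OLAP or data cube engine; the record fields (`region`, `city`, `quarter`, `amount`), the sales figures, and the function names are invented for illustration.

```python
import random
from collections import defaultdict

# Hypothetical sales records; a real system would query a data warehouse or cube.
SALES = [
    {"region": "East", "city": "Boston", "quarter": "Q1", "amount": 120},
    {"region": "East", "city": "Boston", "quarter": "Q2", "amount": 80},
    {"region": "East", "city": "New York", "quarter": "Q1", "amount": 200},
    {"region": "West", "city": "Seattle", "quarter": "Q1", "amount": 90},
    {"region": "West", "city": "Seattle", "quarter": "Q2", "amount": 110},
    {"region": "West", "city": "Denver", "quarter": "Q2", "amount": 60},
]

def sample_records(records, k, seed=0):
    """Return a reproducible random sample for a first look at the data."""
    return random.Random(seed).sample(records, k)

def aggregate(records, dims, where=None):
    """Sum 'amount' grouped by the dimensions listed in dims.

    where: optional {dimension: value} filter that restricts the view to a
    slice (one fixed dimension) or a dice (several fixed dimensions).
    """
    totals = defaultdict(int)
    for r in records:
        if where and any(r[d] != v for d, v in where.items()):
            continue
        totals[tuple(r[d] for d in dims)] += r["amount"]
    return dict(totals)

print(sample_records(SALES, 3))                                        # quick sample
print(aggregate(SALES, ["region"]))                                    # coarse view
print(aggregate(SALES, ["region", "city"], where={"region": "East"}))  # drill down into East
print(aggregate(SALES, ["quarter", "region"]))                         # quarters against regions
```

Each call answers a follow-up question prompted by the previous result, which is the kind of iterative refinement an interactive mining environment is meant to support.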
                           Incorporation of background knowledge: Background knowledge, constraints, rules,
                           and other information regarding the domain under study should be incorporated