Page 67 -
P. 67
HAN 08-ch01-001-038-9780123814791
30 Chapter 1 Introduction 2011/6/1 3:12 Page 30 #30
to mine data with natural language text, it makes sense to fuse data mining methods
with methods of information retrieval and natural language processing. As another
example, consider the mining of software bugs in large programs. This form of min-
ing, known as bug mining, benefits from the incorporation of software engineering
knowledge into the data mining process.
Boosting the power of discovery in a networked environment: Most data objects reside
in a linked or interconnected environment, whether it be the Web, database rela-
tions, files, or documents. Semantic links across multiple data objects can be used
to advantage in data mining. Knowledge derived in one set of objects can be used
to boost the discovery of knowledge in a “related” or semantically linked set of
objects.
Handling uncertainty, noise, or incompleteness of data: Data often contain noise,
errors, exceptions, or uncertainty, or are incomplete. Errors and noise may confuse
the data mining process, leading to the derivation of erroneous patterns. Data clean-
ing, data preprocessing, outlier detection and removal, and uncertainty reasoning are
examples of techniques that need to be integrated with the data mining process.
Pattern evaluation and pattern- or constraint-guided mining: Not all the patterns gen-
erated by data mining processes are interesting. What makes a pattern interesting
may vary from user to user. Therefore, techniques are needed to assess the inter-
estingness of discovered patterns based on subjective measures. These estimate the
value of patterns with respect to a given user class, based on user beliefs or expec-
tations. Moreover, by using interestingness measures or user-specified constraints to
guide the discovery process, we may generate more interesting patterns and reduce
the search space.
1.7.2 User Interaction
The user plays an important role in the data mining process. Interesting areas of research
include how to interact with a data mining system, how to incorporate a user’s back-
ground knowledge in mining, and how to visualize and comprehend data mining results.
We introduce each of these here.
Interactive mining: The data mining process should be highly interactive. Thus, it is
important to build flexible user interfaces and an exploratory mining environment,
facilitating the user’s interaction with the system. A user may like to first sample a
set of data, explore general characteristics of the data, and estimate potential min-
ing results. Interactive mining should allow users to dynamically change the focus
of a search, to refine mining requests based on returned results, and to drill, dice,
and pivot through the data and knowledge space interactively, dynamically exploring
“cube space” while mining.
Incorporation of background knowledge: Background knowledge, constraints, rules,
and other information regarding the domain under study should be incorporated