of items that are frequently sold together. The mining of such frequent patterns from
transactional data is discussed in Chapters 6 and 7.
1.3.4 Other Kinds of Data
Besides relational database data, data warehouse data, and transaction data, there are
many other kinds of data that have versatile forms and structures and rather different
semantic meanings. Such kinds of data can be seen in many applications: time-related
or sequence data (e.g., historical records, stock exchange data, and time-series and
biological sequence data), data streams (e.g., video surveillance and sensor data, which are
continuously transmitted), spatial data (e.g., maps), engineering design data (e.g., the
design of buildings, system components, or integrated circuits), hypertext and
multimedia data (including text, image, video, and audio data), graph and networked data
(e.g., social and information networks), and the Web (a huge, widely distributed
information repository made available by the Internet). These applications bring about new
challenges, such as how to handle data carrying special structures (e.g., sequences, trees,
graphs, and networks) and specific semantics (such as ordering, image, audio and video
contents, and connectivity), and how to mine patterns that carry rich structures and
semantics.
Various kinds of knowledge can be mined from these kinds of data. Here, we list
just a few. Regarding temporal data, for instance, we can mine banking data for
changing trends, which may aid in the scheduling of bank tellers according to the volume of
customer traffic. Stock exchange data can be mined to uncover trends that could help
you plan investment strategies (e.g., the best time to purchase AllElectronics stock). We
could mine computer network data streams to detect intrusions based on anomalies in
message flows, which may be discovered by clustering, by dynamically constructing stream
models, or by comparing the current frequent patterns with those from an earlier time.
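The last idea, comparing the frequent patterns of the current window of a stream with those of an earlier window, can be sketched roughly as follows. This is only an illustration under stated assumptions: the event attributes, the restriction to item pairs, the `min_support` threshold, and the use of Jaccard distance as a drift score are all hypothetical choices, not a method prescribed by the text.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(window, min_support):
    """Return the set of item pairs that occur in at least min_support events."""
    counts = Counter()
    for event in window:
        # Count each unordered pair at most once per event.
        for pair in combinations(sorted(set(event)), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_support}


def pattern_drift(previous_window, current_window, min_support):
    """Jaccard distance between the frequent-pair sets of two windows.

    0.0 means the two windows share exactly the same frequent pairs;
    1.0 means they share none.
    """
    prev = frequent_pairs(previous_window, min_support)
    curr = frequent_pairs(current_window, min_support)
    union = prev | curr
    if not union:
        return 0.0
    return 1.0 - len(prev & curr) / len(union)


# Hypothetical message flows: each event is a tuple of attributes seen together.
previous = [
    ("tcp", "port80", "get"),
    ("tcp", "port80", "get"),
    ("tcp", "port443", "get"),
]
current = [
    ("tcp", "port22", "login_fail"),
    ("tcp", "port22", "login_fail"),
    ("tcp", "port80", "get"),
]

drift = pattern_drift(previous, current, min_support=2)
print(f"drift = {drift:.2f}")
```

A drift score near 1 indicates that the patterns dominating the current window differ sharply from those seen earlier, which could be flagged for inspection. A real stream miner would update counts incrementally rather than recount each window.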
With spatial data, we may look for patterns that describe changes in metropolitan
poverty rates based on city distances from major highways. The relationships among
a set of spatial objects can be examined in order to discover which subsets of objects
are spatially autocorrelated or associated. By mining text data, such as literature on data
mining from the past ten years, we can identify the evolution of hot topics in the field. By
mining user comments on products (which are often submitted as short text messages),
we can assess customer sentiments and understand how well a product is embraced by
a market. From multimedia data, we can mine images to identify objects and classify
them by assigning semantic labels or tags. By mining video data of a hockey game, we
can detect video sequences corresponding to goals. Web mining can help us learn about
the distribution of information on the WWW in general, characterize and classify web
pages, and uncover web dynamics and the association and other relationships among
different web pages, users, communities, and web-based activities.
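To make the notion of spatial autocorrelation mentioned above more concrete, the following sketch computes Moran's I, one common autocorrelation statistic. The text does not name a specific measure, so the choice of Moran's I, the binary adjacency weights, and the four-location example are assumptions made purely for illustration.

```python
def morans_i(values, weights):
    """Compute Moran's I for values observed at n locations.

    weights[i][j] is the spatial weight between locations i and j
    (here 1 if they are neighbors, 0 otherwise).
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]

    total_weight = sum(sum(row) for row in weights)
    # Weighted sum of cross-products of deviations over all location pairs.
    cross = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    variance_sum = sum(d * d for d in deviations)
    return (n / total_weight) * (cross / variance_sum)


# Hypothetical layout: four locations on a line, each adjacent to its neighbors.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth = [1, 2, 3, 4]        # neighboring values are similar
alternating = [1, 4, 1, 4]   # neighboring values are dissimilar

print(morans_i(smooth, adjacency))
print(morans_i(alternating, adjacency))
```

Positive values suggest that nearby locations tend to have similar values, while negative values suggest that neighbors tend to differ; values near zero suggest no clear spatial pattern.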
It is important to keep in mind that, in many applications, multiple types of data
are present. For example, in web mining, there often exist text data and multimedia
data (e.g., pictures and videos) on web pages, graph data like web graphs, and map
data on some web sites. In bioinformatics, genomic sequences, biological networks, and