1.4 What Kinds of Patterns Can Be Mined?

be converted to classification rules. A neural network, when used for classification, is
typically a collection of neuron-like processing units with weighted connections between the
units. There are many other methods for constructing classification models, such as naïve
Bayesian classification, support vector machines, and k-nearest-neighbor classification.
Whereas classification predicts categorical (discrete, unordered) labels, regression
models continuous-valued functions. That is, regression is used to predict missing or
unavailable numerical data values rather than (discrete) class labels. The term prediction
refers to both numeric prediction and class label prediction. Regression analysis is a
statistical methodology that is most often used for numeric prediction, although other
methods exist as well. Regression also encompasses the identification of distribution
trends based on the available data.
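
The contrast between the two kinds of prediction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data, the function names nearest_neighbor_label and fit_line, and the choice of a one-nearest-neighbor classifier and a least-squares line are ours, not methods prescribed by the text.

```python
from statistics import mean


def nearest_neighbor_label(train, x):
    """Classification: return the class label of the closest training point."""
    closest = min(train, key=lambda pair: abs(pair[0] - x))
    return closest[1]


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical labeled data: (attribute value, class label).
labeled = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
print(nearest_neighbor_label(labeled, 7.5))  # a discrete class label

# Hypothetical numeric data: the output is a continuous value.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(slope * 5.0 + intercept)  # a numeric prediction
```

The classifier can only return one of the labels it has seen, whereas the fitted line can return any real number, which is the distinction drawn above.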
Classification and regression may need to be preceded by relevance analysis, which
attempts to identify attributes that are significantly relevant to the classification and
regression process. Such attributes are selected for the process; the remaining,
irrelevant attributes can then be excluded from consideration.
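
One simple way to picture relevance analysis is to score each attribute against the target and keep only those above a cutoff. The sketch below assumes numeric attributes, uses the absolute Pearson correlation as the relevance score, and picks an arbitrary threshold of 0.5; the measure, the threshold, the helper names, and the data are all illustrative assumptions rather than the procedure the text has in mind.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has no spread, so treat its correlation as zero.
    return cov / (sx * sy) if sx and sy else 0.0


def select_relevant(columns, target, threshold=0.5):
    """Keep attributes whose |correlation| with the target meets the threshold."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return [name for name, score in scores.items() if score >= threshold]


# Hypothetical attributes and target values.
columns = {
    "price": [10.0, 20.0, 30.0, 40.0],
    "shelf_id": [3.0, 1.0, 1.0, 3.0],
}
revenue = [12.0, 19.0, 33.0, 41.0]
print(select_relevant(columns, revenue))
```

Here price tracks revenue closely and is kept, while shelf_id is nearly uncorrelated with it and is dropped before any model is built.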

Example 1.8 Classification and regression. Suppose that, as a sales manager of AllElectronics, you
want to classify a large set of items in the store, based on three kinds of responses to a sales
campaign: good response, mild response, and no response. You want to derive a model for each
of these three classes based on the descriptive features of the items, such as price, brand,
place made, type, and category. The resulting classification should maximally distinguish
each class from the others, presenting an organized picture of the data set.
Suppose that the resulting classification is expressed as a decision tree. The decision
tree, for instance, may identify price as being the single factor that best distinguishes the
three classes. The tree may reveal that, in addition to price, other features that help to
further distinguish objects of each class from one another include brand and place made.
Such a decision tree may help you understand the impact of the given sales campaign
and design a more effective campaign in the future.
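
To make concrete how a tree might single out price as the best distinguishing factor, the following sketch ranks attributes by information gain, one common split criterion for decision trees. The six items, their attribute values, and the function names are hypothetical, and the data are deliberately constructed so that price separates the three response classes perfectly.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in class entropy obtained by splitting rows on attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical store items with their observed campaign response.
items = [
    {"price": "low", "brand": "A", "place_made": "domestic", "response": "good"},
    {"price": "low", "brand": "B", "place_made": "imported", "response": "good"},
    {"price": "mid", "brand": "A", "place_made": "domestic", "response": "mild"},
    {"price": "mid", "brand": "B", "place_made": "imported", "response": "mild"},
    {"price": "high", "brand": "A", "place_made": "imported", "response": "none"},
    {"price": "high", "brand": "B", "place_made": "domestic", "response": "none"},
]

gains = {a: information_gain(items, a, "response") for a in ("price", "brand", "place_made")}
best = max(gains, key=gains.get)
print(best, gains)
```

A tree builder that uses this criterion would place the attribute stored in best at the root and then repeat the same ranking within each branch to find further distinguishing features.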
Suppose instead that, rather than predicting categorical response labels for each store
item, you would like to predict the amount of revenue that each item will generate
during an upcoming sale at AllElectronics, based on the previous sales data. This is an
example of regression analysis because the regression model constructed will predict a
continuous function (or ordered value).

Chapters 8 and 9 discuss classification in further detail. Regression analysis is beyond
the scope of this book. Sources for further information are given in the bibliographic
notes.

1.4.4 Cluster Analysis
Unlike classification and regression, which analyze class-labeled (training) data sets,
clustering analyzes data objects without consulting class labels. In many cases, class-
labeled data may simply not exist at the beginning. Clustering can be used to generate
