                               1.4 What Kinds of Patterns Can Be Mined?


                               be converted to classification rules. A neural network, when used for classification, is
                               typically a collection of neuron-like processing units with weighted connections between the
                               units. There are many other methods for constructing classification models, such as naïve
                               Bayesian classification, support vector machines, and k-nearest-neighbor classification.
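
To make one of the methods named above concrete, the following is a minimal sketch of k-nearest-neighbor classification, assuming numeric feature vectors and Euclidean distance. The function name `knn_classify`, the two features, and the tiny training set are hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def knn_classify(query, training, k=3):
    """Return the majority label among the k training points nearest to query.

    training is a list of (feature_vector, label) pairs.
    """
    # Sort the training points by Euclidean distance to the query.
    by_distance = sorted(training, key=lambda pair: math.dist(pair[0], query))
    # Vote among the labels of the k closest points.
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical items described by two scaled numeric features.
training = [
    ((1.0, 1.0), "good"),
    ((1.2, 0.8), "good"),
    ((0.9, 1.1), "good"),
    ((5.0, 5.0), "none"),
    ((5.2, 4.9), "none"),
    ((4.8, 5.1), "none"),
]

label_near_first_group = knn_classify((1.1, 1.0), training)
label_near_second_group = knn_classify((5.0, 5.0), training, k=1)
```

The sketch keeps all training data and defers the work to query time; an odd k is used here so that a two-class vote cannot tie.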
                                 Whereas classification predicts categorical (discrete, unordered) labels, regression
                               models continuous-valued functions. That is, regression is used to predict missing or
                               unavailable numerical data values rather than (discrete) class labels. The term prediction
                               refers to both numeric prediction and class label prediction. Regression analysis is a
                               statistical methodology that is most often used for numeric prediction, although other
                               methods exist as well. Regression also encompasses the identification of distribution
                               trends based on the available data.
                                 Classification and regression may need to be preceded by relevance analysis, which
                               attempts to identify attributes that are significantly relevant to the classification and
                               regression process. Such attributes will be selected for the classification and regression
                               process. Other attributes, which are irrelevant, can then be excluded from consideration.
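
One possible way to carry out relevance analysis for classification is to rank attributes by information gain, that is, by how much knowing an attribute's value reduces the entropy of the class label. The text does not prescribe a particular measure, so the choice of information gain, the function names, and the four-row data set below are assumptions made for illustration.

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus its expected entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(part) / len(rows) * entropy(part)
                    for part in partitions.values())
    return base - remainder


def rank_attributes(rows, attributes, target):
    """Return (attribute, gain) pairs, most relevant first."""
    gains = {a: information_gain(rows, a, target) for a in attributes}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: price separates the classes, category does not.
rows = [
    {"price": "low", "category": "audio", "response": "good"},
    {"price": "low", "category": "video", "response": "good"},
    {"price": "high", "category": "audio", "response": "none"},
    {"price": "high", "category": "video", "response": "none"},
]

ranking = rank_attributes(rows, ["price", "category"], "response")
```

Attributes whose gain falls near zero, such as `category` in this toy data, would be candidates for exclusion before the model is built.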

                  Example 1.8 Classification and regression. Suppose as a sales manager of AllElectronics you want to
                               classify a large set of items in the store, based on three kinds of responses to a sales
                               campaign: good response, mild response, and no response. You want to derive a model for each
                               of these three classes based on the descriptive features of the items, such as price, brand,
                               place made, type, and category. The resulting classification should maximally distinguish
                               each class from the others, presenting an organized picture of the data set.
                                 Suppose that the resulting classification is expressed as a decision tree. The decision
                               tree, for instance, may identify price as being the single factor that best distinguishes the
                               three classes. The tree may reveal that, in addition to price, other features that help to
                               further distinguish objects of each class from one another include brand and place made.
                               Such a decision tree may help you understand the impact of the given sales campaign
                               and design a more effective campaign in the future.
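
As a rough illustration of such a tree, the sketch below hand-writes a small decision tree with price at the root and brand and place made further down, then classifies an item and converts the tree to IF-THEN classification rules. The tree shape, the attribute values, and the function names `classify` and `to_rules` are all hypothetical; a real tree would be induced from the campaign data rather than written by hand.

```python
# Internal nodes are (attribute, {value: subtree}); leaves are class labels.
tree = ("price", {
    "low": "good response",
    "medium": ("brand", {
        "A": "good response",
        "B": "mild response",
    }),
    "high": ("place_made", {
        "domestic": "mild response",
        "imported": "no response",
    }),
})


def classify(node, item):
    """Follow the branches matching the item's attribute values down to a leaf."""
    while not isinstance(node, str):
        attribute, branches = node
        node = branches[item[attribute]]
    return node


def to_rules(node, conditions=()):
    """Turn every root-to-leaf path into one IF-THEN rule."""
    if isinstance(node, str):
        premise = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {premise} THEN {node}"]
    attribute, branches = node
    rules = []
    for value, subtree in branches.items():
        rules.extend(to_rules(subtree, conditions + ((attribute, value),)))
    return rules


item = {"price": "high", "brand": "A", "place_made": "imported"}
predicted = classify(tree, item)
rules = to_rules(tree)
```

Each rule corresponds to one path from the root to a leaf, so this tree with five leaves yields five rules.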
                                 Suppose instead that, rather than predicting categorical response labels for each store
                               item, you would like to predict the amount of revenue that each item will generate
                               during an upcoming sale at AllElectronics, based on the previous sales data. This is an
                               example of regression analysis because the regression model constructed will predict a
                               continuous function (or ordered value).
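
A minimal sketch of numeric prediction, assuming a single predictor and a straight-line model fitted by ordinary least squares, might look as follows. The revenue figures and the names `fit_line` and `predict` are hypothetical; the text does not say which regression method or which predictors would be used.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Return the fitted line's value at x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical per-item revenue: an earlier sale (x) and a later sale (y).
previous_revenue = [100.0, 200.0, 300.0, 400.0]
later_revenue = [150.0, 250.0, 350.0, 450.0]

model = fit_line(previous_revenue, later_revenue)
forecast = predict(model, 500.0)
```

Unlike the classification sketches, the output here is a number on a continuous scale rather than one of a fixed set of labels.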


                                 Chapters 8 and 9 discuss classification in further detail. Regression analysis is beyond
                               the scope of this book. Sources for further information are given in the bibliographic
                               notes.


                         1.4.4 Cluster Analysis
                               Unlike classification and regression, which analyze class-labeled (training) data sets,
                               clustering analyzes data objects without consulting class labels. In many cases, class-
                               labeled data may simply not exist at the beginning. Clustering can be used to generate