

1. Stepwise forward selection: The procedure starts with an empty set of attributes as the reduced set. The best of the original attributes is determined and added to the reduced set. At each subsequent iteration or step, the best of the remaining original attributes is added to the set (a minimal sketch of this greedy procedure follows this list).
2. Stepwise backward elimination: The procedure starts with the full set of attributes. At each step, it removes the worst attribute remaining in the set.
3. Combination of forward selection and backward elimination: The stepwise forward selection and backward elimination methods can be combined so that, at each step, the procedure selects the best attribute and removes the worst from among the remaining attributes.
4. Decision tree induction: Decision tree algorithms (e.g., ID3, C4.5, and CART) were originally intended for classification. Decision tree induction constructs a flowchart-like structure where each internal (nonleaf) node denotes a test on an attribute, each branch corresponds to an outcome of the test, and each external (leaf) node denotes a class prediction. At each node, the algorithm chooses the “best” attribute to partition the data into individual classes.
   When decision tree induction is used for attribute subset selection, a tree is constructed from the given data. All attributes that do not appear in the tree are assumed to be irrelevant. The set of attributes appearing in the tree forms the reduced subset of attributes (a sketch of this approach follows the next paragraph).
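
As a minimal sketch of stepwise forward selection (method 1), the Python fragment below greedily grows the reduced set. The scoring function score_subset is a hypothetical placeholder (for example, the cross-validated accuracy of a classifier trained on that subset), and the threshold-based stopping rule is only one possible criterion; neither is prescribed by the text.

```python
def stepwise_forward_selection(attributes, score_subset, threshold=0.0):
    """Greedily add the attribute that most improves the score.

    `score_subset` is an assumed callable that evaluates an attribute
    subset (e.g., cross-validated accuracy of a classifier). The loop
    stops when no remaining attribute improves the score by more than
    `threshold` -- one possible stopping criterion.
    """
    selected = []                      # start with an empty reduced set
    remaining = list(attributes)
    current_score = score_subset(selected)

    while remaining:
        # Evaluate adding each remaining attribute to the current set.
        scored = [(score_subset(selected + [a]), a) for a in remaining]
        best_score, best_attr = max(scored)
        if best_score - current_score <= threshold:
            break                      # no attribute helps enough; stop
        selected.append(best_attr)     # add the best attribute
        remaining.remove(best_attr)
        current_score = best_score

    return selected
```

Stepwise backward elimination (method 2) is the mirror image: start from the full attribute set and repeatedly remove the attribute whose removal degrades the score the least.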

The stopping criteria for the methods may vary. The procedure may employ a threshold on the measure used to determine when to stop the attribute selection process.
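
To illustrate decision tree induction for attribute subset selection (method 4), here is a hedged sketch that assumes scikit-learn's DecisionTreeClassifier (the text does not mandate any particular implementation): a tree is induced from the data, and only the attributes that appear as internal test nodes are retained.

```python
from sklearn.tree import DecisionTreeClassifier

def tree_based_attribute_subset(X, y, attribute_names, max_depth=None):
    """Keep only the attributes that appear as internal test nodes
    in a decision tree induced from (X, y)."""
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(X, y)
    # tree_.feature holds the attribute index tested at each node;
    # leaf nodes are marked with -2 and are skipped here.
    used = {i for i in tree.tree_.feature if i >= 0}
    return [attribute_names[i] for i in sorted(used)]
```

Attributes absent from the returned list are treated as irrelevant and dropped from the reduced set.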
In some cases, we may want to create new attributes based on others. Such attribute construction⁶ can help improve accuracy and understanding of structure in high-dimensional data. For example, we may wish to add the attribute area based on the attributes height and width. By combining attributes, attribute construction can discover missing information about the relationships between data attributes that can be useful for knowledge discovery.
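
As a small sketch of attribute construction (assuming pandas; the data values and column names are purely illustrative), the area attribute from the example can be derived directly from height and width:

```python
import pandas as pd

# Hypothetical data with height and width attributes.
rooms = pd.DataFrame({
    "height": [2.5, 3.0, 2.8],
    "width":  [4.0, 5.5, 3.2],
})

# Construct a new attribute from the existing ones.
rooms["area"] = rooms["height"] * rooms["width"]
print(rooms)
```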

3.4.5 Regression and Log-Linear Models: Parametric Data Reduction
Regression and log-linear models can be used to approximate the given data. In (simple) linear regression, the data are modeled to fit a straight line. For example, a random variable, y (called a response variable), can be modeled as a linear function of another random variable, x (called a predictor variable), with the equation

    y = wx + b,                         (3.7)

where the variance of y is assumed to be constant. In the context of data mining, x and y are numeric database attributes. The coefficients, w and b (called regression coefficients), specify the slope of the line and the y-intercept, respectively.

⁶ In the machine learning literature, attribute construction is known as feature construction.
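
As a minimal numerical sketch of Equation (3.7), the fragment below estimates w and b from made-up data by ordinary least squares using NumPy; the data values are illustrative only, and least squares is just one common way to fit the coefficients.

```python
import numpy as np

# Hypothetical numeric attributes: x (predictor) and y (response).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares fit of y = w*x + b; np.polyfit returns [w, b] for degree 1.
w, b = np.polyfit(x, y, 1)
print(f"y ~ {w:.3f} * x + {b:.3f}")
```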