


Attribute subset selection⁴ reduces the data set size by removing irrelevant or redundant attributes (or dimensions). The goal of attribute subset selection is to find a minimum set of attributes such that the resulting probability distribution of the data classes is as close as possible to the original distribution obtained using all attributes. Mining on a reduced set of attributes has an additional benefit: It reduces the number of attributes appearing in the discovered patterns, helping to make the patterns easier to understand.
“How can we find a ‘good’ subset of the original attributes?” For n attributes, there are 2^n possible subsets. An exhaustive search for the optimal subset of attributes can be prohibitively expensive, especially as n and the number of data classes increase. Therefore, heuristic methods that explore a reduced search space are commonly used for attribute subset selection. These methods are typically greedy in that, while searching through attribute space, they always make what looks to be the best choice at the time. Their strategy is to make a locally optimal choice in the hope that this will lead to a globally optimal solution. Such greedy methods are effective in practice and may come close to estimating an optimal solution.
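
As a concrete illustration, the following is a minimal sketch of greedy (stepwise) forward selection in Python. The evaluation function score is an assumed placeholder for whatever attribute evaluation measure is in use (a significance test, information gain, and so on); higher scores are taken to be better.

def forward_selection(attributes, score):
    """Greedy stepwise forward selection: start with an empty set and,
    at each step, add the single attribute that most improves the
    (assumed) evaluation measure `score`; stop when no attribute helps."""
    remaining, selected = set(attributes), set()
    best = score(selected)
    while remaining:
        # Try each remaining attribute added to the current set.
        candidate = max(remaining, key=lambda a: score(selected | {a}))
        cand_score = score(selected | {candidate})
        if cand_score <= best:
            break  # locally optimal: no single attribute improves the score
        selected.add(candidate)
        remaining.remove(candidate)
        best = cand_score
    return selected

Stepwise backward elimination follows the mirror-image strategy: start from the full attribute set and repeatedly remove the attribute whose removal improves (or least degrades) the score.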
The “best” (and “worst”) attributes are typically determined using tests of statistical significance, which assume that the attributes are independent of one another. Many other attribute evaluation measures can be used, such as the information gain measure used in building decision trees for classification.⁵
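
To make the measure concrete, the sketch below computes the information gain of one categorical attribute with respect to the class label (the measure itself is treated in detail in Chapter 8). The parallel lists attr_values and labels are hypothetical inputs describing the training tuples.

from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(attr_values, labels):
    """Reduction in class entropy obtained by splitting on an attribute.
    attr_values[i] is the attribute value of tuple i; labels[i] its class."""
    n = len(labels)
    groups = {}  # class labels grouped by attribute value
    for v, c in zip(attr_values, labels):
        groups.setdefault(v, []).append(c)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

For example, information_gain(['y', 'y', 'n', 'n'], ['C1', 'C1', 'C2', 'C1']) measures how much knowing the attribute value reduces uncertainty about the class.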
                           Basic heuristic methods of attribute subset selection include the techniques that
                         follow, some of which are illustrated in Figure 3.6.


Forward selection:
  Initial attribute set: {A1, A2, A3, A4, A5, A6}
  Initial reduced set: {}
  => {A1}
  => {A1, A4}
  => Reduced attribute set: {A1, A4, A6}

Backward elimination:
  Initial attribute set: {A1, A2, A3, A4, A5, A6}
  => {A1, A3, A4, A5, A6}
  => {A1, A4, A5, A6}
  => Reduced attribute set: {A1, A4, A6}

Decision tree induction:
  Initial attribute set: {A1, A2, A3, A4, A5, A6}
  Induced tree: A4? (Y: A1? -> Y: Class 1, N: Class 2; N: A6? -> Y: Class 1, N: Class 2)
  => Reduced attribute set: {A1, A4, A6}

Figure 3.6 Greedy (heuristic) methods for attribute subset selection.
⁴ In machine learning, attribute subset selection is known as feature subset selection.
⁵ The information gain measure is described in detail in Chapter 8.
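
As an illustration of the decision tree induction approach in Figure 3.6, the sketch below assumes scikit-learn is available: a tree is induced over all attributes, and any attribute that never appears in a split of the tree is taken to be irrelevant and dropped.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_based_selection(X, y, attribute_names):
    """Keep only the attributes actually used by an induced decision tree."""
    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    # feature_importances_ is zero for attributes unused by any split.
    used = np.flatnonzero(tree.feature_importances_ > 0)
    return [attribute_names[i] for i in used]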