2.3 Feature selection
Feature selection (FS) is a procedure commonly employed in machine learning to address the problem of high dimensionality. It selects a subset of essential features and removes irrelevant, noisy, and redundant features, yielding a simpler and more concise data representation. In FS, a subset of features is selected from the original feature set based on feature redundancy and relevance. On this basis, Yu et al. [8] in 2004 classified features into four types: (1) noisy and irrelevant, (2) redundant and weakly relevant, (3) weakly relevant and nonredundant, and (4) strongly relevant. A feature that is not required for prediction accuracy is known as an irrelevant feature. Popular FS approaches, which fall into the filter and wrapper categories, differ in their models, search strategies, feature quality measures, and feature evaluation criteria. The set of selected features is a key factor in determining the hypothesis space of a predictive model: the number of features and the size of the hypothesis space are directly related, i.e., as the number of features increases, the hypothesis space grows as well. For example, with M binary features and a binary class label, the search space contains 2^(2^M) possible hypotheses; for M = 3 this is already 2^8 = 256.
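To make this growth concrete, the following minimal sketch (plain Python, written here purely as an illustration and not taken from the cited works) prints the number of candidate feature subsets (2^M) and the number of distinct Boolean hypotheses (2^(2^M)) for small M:

# Growth of the search space with the number of binary features M:
# there are 2**M candidate feature subsets, and 2**(2**M) distinct
# Boolean labelings (hypotheses) over M binary features.
for M in range(1, 6):
    print(f"M={M}: subsets={2**M}, hypotheses={2**(2**M)}")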
FS methods are classified into three types based on their interaction with the learning model: filter, wrapper, and embedded methods. In the filter method, features are selected based on statistical measures. It is independent of the learning algorithm and requires little computational time. Information gain, the chi-square test [9], the Fisher score, the correlation coefficient, and the variance threshold are some of the statistical measures used to assess the importance of features.
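As an illustration of the filter approach, the sketch below applies a chi-square filter using scikit-learn; the iris data set and the choice of k = 2 retained features are arbitrary assumptions made for this example, not part of the original text:

# Filter method: score each feature with the chi-square statistic
# and keep the k highest-scoring features, independently of any
# downstream classifier.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)            # 4 nonnegative features
selector = SelectKBest(score_func=chi2, k=2)
X_reduced = selector.fit_transform(X, y)     # keep the 2 best features
print(selector.scores_)                      # chi-square score per feature
print(selector.get_support(indices=True))    # indices of kept features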
The performance of the wrapper method depends on the classifier: the best subset of features is selected based on the classifier's results. Wrapper methods are computationally more expensive than filter methods because of the repeated learning steps and cross-validation; however, they are more accurate than filter methods. Examples include recursive feature elimination [10], sequential FS algorithms [11], and genetic algorithms.
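A minimal sketch of recursive feature elimination with scikit-learn is shown below; the logistic regression estimator, the breast cancer data set, and the choice of 10 retained features are illustrative assumptions, not prescribed by the chapter:

# Wrapper method: recursive feature elimination (RFE) repeatedly
# fits the classifier and drops the weakest feature until the
# requested number of features remains.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)        # scale for stable coefficients
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)
rfe.fit(X, y)
print(rfe.support_)    # boolean mask of selected features
print(rfe.ranking_)    # rank 1 = selected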
The third approach is the embedded method, which uses ensemble learning and hybrid learning methods for FS. Because it makes a collective decision, its performance is better than that of the other two approaches. Random forest is one such example. Embedded methods are computationally less intensive than wrapper methods; however, the selected features are specific to the learning model used.
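As a sketch of the embedded approach, a random forest ranks features by impurity-based importance as a by-product of training, and features can then be kept or dropped by thresholding those importances; the wine data set and the median threshold below are assumptions made for this example:

# Embedded method: feature importances come from training the random
# forest itself; SelectFromModel keeps features whose importance
# exceeds the chosen threshold.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_wine(return_X_y=True)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
selector = SelectFromModel(forest, prefit=True, threshold="median")
X_reduced = selector.transform(X)            # keep above-median features
print(selector.get_support(indices=True))    # indices of kept features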