Page 142 - Handbook of Deep Learning in Biomedical Engineering Techniques and Applications


Chapter 5 Depression discovery in cancer communities using deep learning 131

differentiate between formal and functional passages and, within them, between description and comment. They then analyze sentiment and achieve an improvement over a baseline without passage classification. Their approach has certain limitations. First, their training data consists of only 100 texts, which is very small; to improve the performance of the classifiers, the training set needs to be enlarged. Second, the authors use only words and part-of-speech (POS) tags as features. To improve performance, syntactic structures and verb properties, which cannot be captured by words and POS tags, should be taken into account.
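The following sketch shows the word-plus-POS feature setup. It builds a bag of word and POS-tag features from a sentence that is assumed to be already tagged. The tagger is not shown, and the Penn Treebank-style tags and the `word_pos_features` helper are hypothetical and not taken from the cited work.

```python
from collections import Counter

def word_pos_features(tagged_tokens):
    """Build a bag of features from (word, POS) pairs.

    Each token contributes one lowercased word feature and one POS-tag
    feature; word order is discarded.
    """
    features = Counter()
    for word, tag in tagged_tokens:
        features["w=" + word.lower()] += 1
        features["pos=" + tag] += 1
    return features

# Hypothetical pre-tagged sentence (Penn Treebank-style tags assumed).
sentence = [("The", "DT"), ("film", "NN"), ("was", "VBD"), ("not", "RB"), ("good", "JJ")]
feats = word_pos_features(sentence)
print(feats["pos=NN"], feats["w=good"])  # 1 1
```

Because the features form an unordered bag, they record that "not" and "good" both occur but not that "not" modifies "good". This is the kind of syntactic information the limitation above refers to.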
In Ref. [91], the author uses different ML classifiers, such as NB, SVM, random forests, and NNs, for SA of movie reviews and tweets. He experiments with different feature sets and preprocessing steps and concludes that recurrent neural networks (RNNs) perform slightly better than the other classifiers. Further investigation with other feature sets and preprocessing steps is needed to improve performance.
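The following is a minimal multinomial Naive Bayes sentiment classifier with add-alpha (Laplace) smoothing, which illustrates one of the classifiers listed above. It is a generic textbook sketch and not the implementation used in Ref. [91]. The toy training data and the `train_nb`/`predict_nb` names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Train a multinomial Naive Bayes model with add-alpha (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {"priors": {}, "likelihoods": {}}
    n_docs = len(labels)
    for c in class_counts:
        model["priors"][c] = math.log(class_counts[c] / n_docs)
        # Smoothed denominator: total tokens in class c plus alpha per vocabulary word.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        model["likelihoods"][c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return model

def predict_nb(model, tokens):
    """Return the class with the highest log-posterior; unseen words are skipped."""
    best, best_score = None, -math.inf
    for c, prior in model["priors"].items():
        likelihoods = model["likelihoods"][c]
        score = prior + sum(likelihoods[w] for w in tokens if w in likelihoods)
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical toy data: tokenized reviews with sentiment labels.
train_docs = [["great", "movie"], ["loved", "it"], ["awful", "plot"], ["boring", "movie"]]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_nb(train_docs, train_labels)
print(predict_nb(model, ["great", "loved"]))  # pos
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.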
In Ref. [44], the authors compare different ML classifiers for analyzing political views in Urdu tweets. The major limitation of their approach lies in translating the tweets from Urdu to English, as meaning is sometimes lost in translation.

2.2.3 Metaclassifiers
Metaclassifiers combine several ML models into one predictive model by bagging (decreasing variance), boosting (decreasing bias), or stacking (improving predictions) [45]. SA studies that use metaclassifiers are listed in Table 5.3.
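Bagging is the simplest of the three ensemble strategies to illustrate. The sketch below trains decision stumps on bootstrap resamples of a toy one-dimensional data set and combines them by majority vote. The stump base learner, the toy data, and the function names are illustrative assumptions. Boosting and stacking are not shown.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a 1-D decision stump: the threshold/polarity pair with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [1 if polarity * (x - t) >= 0 else 0 for x in xs]
            errors = sum(p != y for p, y in zip(preds, ys))
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    _, t, polarity = best
    return lambda x: 1 if polarity * (x - t) >= 0 else 0

def bagging(xs, ys, n_models=25, seed=0):
    """Bagging: train each stump on a bootstrap resample, then combine by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n points with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict

# Hypothetical toy data: one feature, binary labels.
xs = [0.1, 0.35, 0.4, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 1, 1]
ensemble = bagging(xs, ys)
print(ensemble(0.2), ensemble(0.95))
```

Each stump sees a slightly different resample, so individual stumps can disagree near the class boundary. The vote averages out this variance, which is the effect attributed to bagging above.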
The authors of Ref. [46] experiment with a large number of ML algorithms and feature sets to detect positive or negative favorability in documents. They conclude that SVM, k-NN, NB, BN, decision tree (DT), and a rule learner show promising performance. They also show that, for every classifier except NB, the samples of all classes should be equally represented in the training data for the classifier to be effective. Further work is needed to determine whether the class distribution should be balanced and to evaluate other features.
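Random undersampling of the larger classes is one common way to obtain the equal class representation discussed above. The following sketch is a generic illustration under that assumption; Ref. [46] does not necessarily balance its data this way. The `undersample` helper and the toy labels are hypothetical.

```python
import random
from collections import Counter, defaultdict

def undersample(samples, labels, seed=0):
    """Randomly undersample every class down to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    n_min = min(len(items) for items in by_class.values())
    balanced = []
    for y, items in by_class.items():
        for s in rng.sample(items, n_min):  # draw without replacement
            balanced.append((s, y))
    rng.shuffle(balanced)  # avoid blocks of identical labels in the output
    return balanced

# Hypothetical imbalanced toy set: six positive and two negative documents.
docs = ["p1", "p2", "p3", "p4", "p5", "p6", "n1", "n2"]
labels = ["pos"] * 6 + ["neg"] * 2
balanced = undersample(docs, labels)
print(sorted(Counter(y for _, y in balanced).items()))  # [('neg', 2), ('pos', 2)]
```

Undersampling discards majority-class examples. On a small training set, oversampling the minority class may therefore be preferable. The text leaves this choice open, and so does this sketch.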
The authors of Ref. [47] study the impact of word-of-mouth publicity on Twitter on motion picture sales. They use NB and SVM classifiers for SA. However, their approach cannot correctly determine the sentiment of sarcastic tweets.
