Page 142 - Handbook of Deep Learning in Biomedical Engineering Techniques and Applications
Chapter 5 Depression discovery in cancer communities using deep learning 131
differentiate between formal and functional passages and, within
them, between description and comment. Next, they analyze the
sentiment and achieve an improvement over a baseline without
passage classification. Their approach has certain limitations.
First, the training data comprise only 100 texts, which is very
few; a larger training set is needed to improve the performance
of the classifiers. Second, the authors use only words and
part-of-speech (POS) labels as features. To improve performance,
syntactic structure and verb properties, which cannot be captured
by words and POS tags alone, should also be taken into account.
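The second limitation can be illustrated with a minimal sketch. It assumes tokens that are already tagged by hand (the sentences, the tag names, and the function word_pos_features are hypothetical and are not taken from the cited study) and shows that order-free word and POS counts cannot tell apart two sentences whose syntactic roles are swapped.

```python
from collections import Counter


def word_pos_features(tagged_tokens):
    """Count each word and each POS tag, ignoring token order."""
    features = Counter()
    for word, pos in tagged_tokens:
        features["word=" + word.lower()] += 1
        features["pos=" + pos] += 1
    return features


# Hand-tagged toy sentences (illustrative only): same tokens, swapped roles.
sentence_a = [("critics", "NOUN"), ("praised", "VERB"), ("directors", "NOUN")]
sentence_b = [("directors", "NOUN"), ("praised", "VERB"), ("critics", "NOUN")]

# Both sentences map to identical features, so who praised whom is lost.
same_features = word_pos_features(sentence_a) == word_pos_features(sentence_b)
```

Here same_features is True, because the representation records which words and tags occur but not how they are arranged. Features derived from syntactic structure would be needed to separate the two sentences.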
In Ref. [91], the author uses different ML classifiers such as NB,
SVM, random forests, and NNs for SA of movie reviews and
tweets. He experiments with different feature sets and preprocessing
steps and concludes that recurrent neural networks (RNNs)
perform slightly better than the other classifiers. Further
investigation with different feature sets and preprocessing steps
is needed to improve performance.
In Ref. [44], the authors compare different ML classifiers for the
analysis of political views in Urdu tweets. The major limitation of
their approach lies in the challenges of translating from Urdu to
English, as the meaning is sometimes lost in translation.
2.2.3 Metaclassifiers
Metaclassifiers combine several ML models into one predictive
model by bagging (decreasing variance), boosting (decreasing
bias), or stacking (improving predictions) [45]. SA studies that
use metaclassifiers are listed in Table 5.3.
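Of the three strategies, bagging is the simplest to sketch. The example below is a minimal illustration, not the method of any cited study: it assumes a single numeric feature (a hypothetical sentiment score), a toy training set, and a one-threshold base learner. Each base learner is trained on a bootstrap resample, and the ensemble predicts by majority vote. The names train_stump, bagging_fit, and bagging_predict are introduced here for illustration only.

```python
import random
from collections import Counter


def train_stump(samples):
    """Fit a one-feature threshold rule: predict 1 when score >= threshold."""
    best_threshold, best_errors = None, None
    for threshold in sorted({score for score, _ in samples}):
        errors = sum(int(score >= threshold) != label for score, label in samples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(samples, n_models=25, seed=0):
    """Train one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]
        thresholds.append(train_stump(bootstrap))
    return thresholds


def bagging_predict(thresholds, score):
    """Combine the stumps by majority vote (n_models is odd, so no ties)."""
    votes = Counter(int(score >= t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Toy data (assumed): (sentiment score, label), 0 = negative, 1 = positive.
train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
         (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
ensemble = bagging_fit(train)
```

With this data, bagging_predict(ensemble, 0.95) returns 1 and bagging_predict(ensemble, 0.05) returns 0, because every learned threshold lies within the range of the training scores. Averaging the votes of learners trained on different resamples is what reduces variance. Boosting and stacking combine models differently and are not shown here.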
The authors of Ref. [46] experiment with a large number of
ML algorithms and feature sets to detect positive or negative
favorability in documents. They conclude that SVM, k-NN, NB, BN,
decision tree (DT), and a rule learner show promising performance.
They also show that, for every classifier except NB, all classes
should be equally represented in the training data for the
classifier to be effective. Further work is needed to evaluate
whether the class distribution should be balanced and to assess
other features.
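One common way to obtain an equal class representation is random undersampling of the larger classes. The sketch below is an assumption for illustration: the source does not say how Ref. [46] balanced its data, and the function undersample and the toy documents are hypothetical.

```python
import random
from collections import Counter, defaultdict


def undersample(labeled_docs, seed=0):
    """Randomly drop documents so every class keeps as many as the rarest class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in labeled_docs:
        by_label[label].append((text, label))
    target = min(len(docs) for docs in by_label.values())
    balanced = []
    for label in sorted(by_label):
        # Sample without replacement down to the size of the rarest class.
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)
    return balanced


# Toy, imbalanced training set (assumed): six positive, two negative documents.
docs = [("great coverage", "positive")] * 6 + [("poor results", "negative")] * 2
balanced = undersample(docs)
counts = Counter(label for _, label in balanced)
```

After the call, counts holds two documents per class. Undersampling discards data, so with a small training set oversampling the rarer class or weighting classes may be preferable; which option works best is an empirical question.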
The authors of Ref. [47] study the impact of word-of-mouth
publicity on Twitter on motion picture sales. They use NB and SVM
classifiers for SA. However, they are unable to correctly determine
the sentiment of sarcastic tweets.