product of both types of input labels is computed and passed
through the embedding layers, and the result is then fed to the
sigmoid function to produce the output. If the produced output
does not match the target output, the error is backpropagated
through the layers. The models are pretrained on the CLPsych
2015 Shared Task data.
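The scoring step described above reduces to a sigmoid over the dot product of the two embedding vectors. The following is a minimal NumPy sketch with illustrative names, not the chapter's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def score_pair(center_vec, context_vec):
    """Probability that the two inputs co-occur: sigmoid of their dot product.

    During training, the mismatch between this output and the target
    label is the error that gets backpropagated into the embedding layers.
    """
    return sigmoid(np.dot(center_vec, context_vec))
```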
3.1.2 Word embedding optimization
On top of the word embedding models, an optimization step is
applied to enhance the model's performance by learning a more
accurate feature representation for depression detection. In the
proposed model, this optimization averages the word embeddings
to derive a more accurate feature representation [79].
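A minimal sketch of this averaging step, assuming a pretrained lookup table mapping tokens to NumPy vectors (names and dimensionality are illustrative):

```python
import numpy as np

def average_embedding(tokens, embedding_lookup, dim=300):
    """Feature vector for a post: the mean of its word vectors.

    `embedding_lookup` maps token -> np.ndarray; out-of-vocabulary
    tokens are skipped, and an empty post maps to the zero vector.
    """
    vectors = [embedding_lookup[t] for t in tokens if t in embedding_lookup]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)
```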
We train at the sentence level and predict the surrounding
sentences [80], as well as their likely sense: depressed,
PTSD, or neither. We use multitask deep learning (MTL) [81] for
this task. This is represented pictorially in Fig. 5.4, wherein the
shared layer is positioned inside a dashed box.
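One way to wire this shared-layer arrangement is sketched below, assuming Keras; the layer sizes, vocabulary size, and head names are illustrative rather than taken from the chapter:

```python
from tensorflow import keras
from tensorflow.keras import layers

EMB_DIM = 300       # illustrative dimensionality of the input embeddings
VOCAB_SIZE = 20000  # illustrative vocabulary size for context prediction
NUM_SENSES = 3      # depressed, PTSD, or neither

inputs = keras.Input(shape=(EMB_DIM,))

# Shared layer (the dashed box in Fig. 5.4): gradients from both
# tasks flow through it, so each task regularizes the other.
shared = layers.Dense(128, activation="relu")(inputs)

# Task 1: predict the surrounding context of the current sentence.
context = layers.Dense(VOCAB_SIZE, activation="softmax", name="context")(shared)

# Task 2: classify the sense of the sentence.
sense = layers.Dense(NUM_SENSES, activation="softmax", name="sense")(shared)

model = keras.Model(inputs, [context, sense])
model.compile(optimizer="adam",
              loss={"context": "sparse_categorical_crossentropy",
                    "sense": "sparse_categorical_crossentropy"})
```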
We need to train both the word and the sense predictions. We
prepare the input word embeddings using a pretrained skip-gram
model. For the first task, we use supervised training to predict
pairs of words appearing in close proximity to one another. For
the second task, we use a rectified linear unit (ReLU) activation
feeding the final output layer. The output layer includes a label
for omitted data, as the supervised sense annotations are likely
to be incomplete. Moreover, we use a regularized L2-norm loss to
constrain the layers shared among the tasks. We use the
antirectifier activation to obtain an all-positive output
without losing any value. We propose the cosine distance metric
for calculating similarities among word representations and for
producing word probability distributions.
Figure 5.4 Word embedding optimization.
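The antirectifier and the cosine metric mentioned above can be sketched in plain NumPy as follows; the centering and L2 normalization follow the standard antirectifier formulation, and the exact variant used in the chapter's model may differ:

```python
import numpy as np

def antirectifier(x, eps=1e-8):
    """All-positive activation that does not discard negative values.

    The activations are centered and L2-normalized, then the positive
    and negative parts are concatenated: the output doubles in size
    but stays nonnegative, unlike ReLU, which zeroes negative values.
    """
    x = x - np.mean(x)
    x = x / (np.linalg.norm(x) + eps)
    return np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)])

def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two word representations."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps))
```

The cosine similarities of a word against the vocabulary can then be normalized, for example with a softmax, to yield the word probability distribution.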