product of both types of input labels is computed and passed to the embedding layers; the result is then passed to the sigmoid function to produce the output. If the produced output does not match the target output, the error is backpropagated through the layers. The models are pretrained on the CLPsych 2015 shared task data.
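As a concrete illustration, this scoring and update step can be sketched in a few lines of NumPy. This is a minimal, hypothetical rendering: the vocabulary size, embedding dimension, and learning rate are placeholder values, and the real models are pretrained on the CLPsych 2015 data rather than randomly initialized as here.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Placeholder sizes; the chapter does not specify exact dimensions.
    vocab_size, embed_dim = 10000, 100
    rng = np.random.default_rng(0)
    W_in = rng.normal(scale=0.1, size=(vocab_size, embed_dim))   # input embeddings
    W_out = rng.normal(scale=0.1, size=(vocab_size, embed_dim))  # output embeddings

    def score(center, context):
        # Dot product of the two embedding lookups, squashed by the sigmoid.
        return sigmoid(W_in[center] @ W_out[context])

    def train_step(center, context, target, lr=0.025):
        # If the output does not match the target, the error is backpropagated:
        # the cross-entropy gradient w.r.t. the logit is (prediction - target).
        err = score(center, context) - target
        W_in[center], W_out[context] = (W_in[center] - lr * err * W_out[context],
                                        W_out[context] - lr * err * W_in[center])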
               3.1.2 Word embedding optimization
On top of the word embedding models, an optimization step is applied to enhance the performance of the model so that it learns a more accurate feature representation for depression detection. In the proposed model, this optimization averages the word embeddings to derive the feature representation [79].
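A minimal sketch of this averaging step is shown below, assuming the pretrained embeddings are available as a plain token-to-vector mapping; the function name and the 100-dimension default are illustrative, not the authors' exact setup.

    import numpy as np

    def average_embedding(tokens, embeddings, dim=100):
        # Average the vectors of all in-vocabulary tokens to obtain one
        # fixed-length feature vector for the whole text.
        vectors = [embeddings[t] for t in tokens if t in embeddings]
        if not vectors:
            return np.zeros(dim)  # fallback for fully out-of-vocabulary input
        return np.mean(vectors, axis=0)

    # Usage: posts of any length map into the same feature space.
    # features = average_embedding(post.lower().split(), pretrained_vectors)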
We train at the sentence level and predict the surrounding sentences [80], as well as their likely sense: depressed, PTSD, or neither. We use multitask deep learning (MTL) [81] for this task. This is represented pictorially in Fig. 5.4, wherein the shared layer is positioned inside a dashed box.
We need to train both the word and the sense predictions. We prepare the input word embeddings using a pretrained skip-gram model. For the first task, we use supervised training to predict pairs of words appearing in close proximity to one another. For the second task, we use a rectified linear unit (ReLU) activation feeding the final output layer. This layer includes a label for omitted data, since the supervised data about senses is likely to be incomplete. Moreover, we apply a regularized l2-norm loss to constrain the layers shared among the tasks. We use an antirectifier to obtain an all-positive output without losing any values. We propose the use of the cosine distance metric for calculating the similarities among word representations and for producing word probability distributions. A sketch combining these pieces is given after Fig. 5.4 below.
Figure 5.4 Word embedding optimization.
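The pieces above can be combined into a minimal multitask sketch, shown here in PyTorch. Everything concrete in it is an assumption for illustration: the layer sizes, the four-way sense label set (depressed, PTSD, neither, plus the extra label for omitted data), an antirectifier written in the style of the well-known Keras example, and weight decay standing in for the regularized l2-norm loss on the shared layer.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Antirectifier(nn.Module):
        # Concatenates ReLU(x) and ReLU(-x): the output is all-positive,
        # yet no activation value is lost (the width doubles).
        def forward(self, x):
            x = x - x.mean(dim=1, keepdim=True)
            x = F.normalize(x, p=2, dim=1)
            return torch.cat([F.relu(x), F.relu(-x)], dim=1)

    class MultiTaskModel(nn.Module):
        def __init__(self, embed_dim=100, hidden=128, vocab=10000, n_senses=4):
            super().__init__()
            self.shared = nn.Linear(embed_dim, hidden)  # dashed box in Fig. 5.4
            self.anti = Antirectifier()
            # Task 1: word prediction, scored by cosine similarity between
            # the shared representation and every candidate word vector.
            self.word_vectors = nn.Parameter(torch.randn(vocab, 2 * hidden))
            # Task 2: sense classification with a ReLU layer before the
            # output; n_senses = 4 covers depressed, PTSD, neither, omitted.
            self.sense_head = nn.Sequential(
                nn.Linear(2 * hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_senses))

        def forward(self, x):
            h = self.anti(self.shared(x))
            # Cosine similarities become a word probability distribution
            # after the softmax.
            word_probs = F.softmax(F.cosine_similarity(
                h.unsqueeze(1), self.word_vectors.unsqueeze(0), dim=-1), dim=-1)
            return word_probs, self.sense_head(h)

    # l2-norm regularization of the shared layer via weight decay:
    # model = MultiTaskModel()
    # optim = torch.optim.Adam(
    #     [{"params": model.shared.parameters(), "weight_decay": 1e-4},
    #      {"params": [p for n, p in model.named_parameters()
    #                  if not n.startswith("shared.")]}], lr=1e-3)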