these tokens is carried out and represented as T_i, with a constant size (X); any sequence longer than X is abridged. Each text is encoded with a lexicon of constant terms W. We use three constant tokens to indicate the start, the termination, and the out-of-lexicon terms of a text. To normalize the text to a uniform length, padding and truncation are used: padding elongates short texts to the standard length, while long sequences are truncated. For word encoding, a vocabulary of 100,000 words is selected, the context window size is set to 5, and the minimum word occurrence is set to 2.
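As an illustration of this preprocessing step, the following minimal Python sketch pads short sequences and truncates long ones to a fixed length; the reserved token indices, the function name, and the variable names are assumptions for illustration, not part of the original study.

# Reserved indices for padding, start, termination, and out-of-lexicon
# tokens; the exact values are an assumed convention.
PAD, START, END, OOV = 0, 1, 2, 3

def encode(tokens, word_index, max_len):
    """Map tokens to ids, add start/end markers, then pad or truncate to max_len."""
    ids = [START] + [word_index.get(t, OOV) for t in tokens] + [END]
    ids = ids[:max_len]                        # truncation for long sequences
    return ids + [PAD] * (max_len - len(ids))  # padding for short sequences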
(c) Word embedding models
The idea of word embedding is to capture as much of the semantic, morphological, contextual, and hierarchical information in the given text as possible and convert it to vector form. A word embedding algorithm uses a large amount of text to create high-dimensional representations of words; such representations capture the relationships between words along with many of their linguistic regularities.
In this research, we use word embedding models to represent the words of a language as vectors. These models were originally devised to reduce the dimensionality of text data, but they can also learn some remarkable properties of words. We create a Word2Vec [78] word embedding model that builds a vocabulary in which words with similar contexts are placed close together, either using the pretrained vectors provided by Google or using the existing text. The generated vocabulary is a denser vector representation that is fed into the deep learning model as an embedded word matrix for training. Three models that can be used with deep learning for depression prediction are discussed in this literature: skip-gram, continuous bag of words (CBOW), and an optimized model for the creation of denser vector representations.
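A minimal sketch of building such an embedding matrix with the gensim library is shown below; the variable corpus_tokens, the vector size of 300, and the use of gensim itself are assumptions, while the window size of 5 and minimum occurrence of 2 follow the settings given above.

from gensim.models import Word2Vec

# corpus_tokens: hypothetical list of tokenized posts, e.g. [["i", "feel", "tired"], ...]
model = Word2Vec(
    sentences=corpus_tokens,
    vector_size=300,   # embedding dimensionality (assumed value)
    window=5,          # context window size, as set above
    min_count=2,       # minimum word occurrence, as set above
    sg=0,              # 0 = CBOW, 1 = skip-gram
)
embedding_matrix = model.wv.vectors  # dense matrix fed to the deep learning model

# Pretrained Google News vectors can instead be loaded with
# gensim.models.KeyedVectors.load_word2vec_format(path, binary=True).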
3.1 Continuous bag of words
In CBOW, the model predicts the current word from a window of surrounding context words, whose extent depends on the given window size. The context words are passed to an embedding layer initialized with random weights and then to an averaging layer. The averaged representation is given to a softmax function to produce the target word, without considering the arrangement of the context words, as shown in Fig. 5.2. The ordering of the words does not affect the prediction of the targeted word.
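The following is a minimal sketch of such a CBOW architecture in Keras; the layer sizes and the use of Keras are assumptions for illustration, not the authors' implementation. It stacks an embedding layer with randomly initialized weights, an averaging layer over the context positions, and a softmax output over the vocabulary.

from tensorflow.keras import layers, models

vocab_size, embed_dim = 100_000, 300   # assumed sizes, not from the original study

cbow = models.Sequential([
    layers.Embedding(vocab_size, embed_dim),         # embedding layer, random initial weights
    layers.GlobalAveragePooling1D(),                 # averaging layer over the context words
    layers.Dense(vocab_size, activation="softmax"),  # softmax over the vocabulary -> target word
])
cbow.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

Because the averaging layer collapses the context positions into a single mean vector, the prediction is unaffected by the order of the context words, matching the description above.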