these tokens is carried out and represented as Ti, having a constant size (X); any sequence longer than X is abridged. Each text is encoded with a word lexicon of constant terms W. We use three constants to indicate the start, the termination, and the out-of-lexicon terms of a text. To normalize the texts to a uniform length, padding and truncation are used: padding elongates short texts to the standard length, and truncation shortens long sequences to it. For one-hot word encoding, a lexicon of 100,000 words is selected, the context window size is set to 5, and the minimum word occurrence is set to 2.
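A minimal sketch of this normalization step in Python is shown below; the reserved indices, the lexicon lookup, and the function name are assumptions for illustration, since the text does not fix them.

```python
# Hypothetical indices for the three reserved constants plus the padding symbol.
PAD, START, END, OOV = 0, 1, 2, 3

def encode_and_normalize(tokens, lexicon, X):
    """Map tokens to lexicon indices, then pad or truncate to the constant length X."""
    ids = [lexicon.get(t, OOV) for t in tokens]   # out-of-lexicon terms map to OOV
    ids = [START] + ids[: X - 2] + [END]          # truncate any sequence longer than X
    ids += [PAD] * (X - len(ids))                 # pad short sequences up to X
    return ids
```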
               (c) Word embedding models
   The idea of word embedding is to capture as much of the semantic, morphological, contextual, and hierarchical information in the given text as possible and convert it to vector form. A word embedding algorithm uses a large amount of text to create high-dimensional representations of words, and these representations capture the relationships between words along with many linguistic regularities.
   In this research, we use word embedding models to represent words in a language as vectors. These models were designed to reduce the dimensionality of text data, but they can also learn some remarkable properties of words. We create a Word2Vec [78] word embedding model that builds a vocabulary in which each word is placed close to its contextually similar words, either using the pretrained vectors provided by Google or using the existing text. The generated vocabulary is a denser vector representation that is fed into the deep learning model as an embedded word matrix for training. Three models that can be used with deep learning for depression prediction are discussed in this literature: skip-gram, continuous bag of words (CBOW), and an optimized model for the creation of denser vector representations.
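As an illustration, a Word2Vec model of this kind could be trained with gensim roughly as follows; `tokenized_posts`, the vector size, and the worker count are assumptions, while the window size of 5 and minimum occurrence of 2 follow the preprocessing settings above (gensim 4.x argument names).

```python
from gensim.models import Word2Vec

# tokenized_posts: list of token lists produced by the preprocessing step (assumed name)
w2v = Word2Vec(
    sentences=tokenized_posts,
    vector_size=300,   # dimensionality of the dense word vectors (assumed)
    window=5,          # context window size set during preprocessing
    min_count=2,       # minimum word occurrence set during preprocessing
    sg=0,              # 0 = CBOW, 1 = skip-gram
    workers=4,
)

# Dense embedding matrix that can be fed to the deep learning model.
embedding_matrix = w2v.wv.vectors
```

Alternatively, the pretrained Google News vectors can be loaded with `gensim.models.KeyedVectors.load_word2vec_format` instead of training on the existing text.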


               3.1 Continuous bag of words
   In CBOW, the model predicts the current word from a window of surrounding context words, whose number depends on the given window size. These context words are passed to an embedding layer initialized with random weights and then to an averaging layer. The averaged representation is given to a softmax function to produce the target word, without considering the arrangement of the context words, as shown in Fig. 5.2. The ordering of the words does not affect the prediction of the target word.
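A minimal sketch of this CBOW architecture in Keras is given below; the vocabulary size follows the preprocessing settings above, while the embedding dimension, optimizer, and loss are assumptions.

```python
import tensorflow as tf

V = 100_000   # lexicon size (from the preprocessing settings above)
D = 300       # embedding dimension (assumed)
W = 5         # context window size, i.e. 2 * W surrounding words per target

context = tf.keras.Input(shape=(2 * W,), dtype="int32")        # surrounding context words
embedded = tf.keras.layers.Embedding(V, D)(context)            # embedding layer, random initial weights
averaged = tf.keras.layers.GlobalAveragePooling1D()(embedded)  # averaging layer: order-insensitive
target = tf.keras.layers.Dense(V, activation="softmax")(averaged)  # softmax over the target word

cbow = tf.keras.Model(context, target)
cbow.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

Because the averaging layer collapses the context words into a single vector, swapping their order leaves the prediction unchanged, which matches the order-insensitivity described above.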