Page 157 - Data Architecture

Chapter 4.4: Unstructured Data
           earliest attempt to contextualize text is a technology called “NLP.” NLP stands
           for natural language processing (or sometimes “natural language programming”).


           NLP has been around for a long time and has met with modest success. It has several
           inherent limitations. The first is that NLP assumes that the context of text can be
           derived from the text itself. The problem is that only a small amount of context comes
           from the text itself. In the case of the two gentlemen standing around and saying
           “She's hot,” the vast majority of the context comes from external sources, not
           textual sources. Is the lady young and attractive? Is it Houston, Texas, in the
           summertime? Is the conversation taking place in a hospital? All of these circumstances
           that provide context are external to the words being spoken.
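
           The point can be made concrete with a small sketch. It is only an illustration,
           not a description of how any NLP product works: the function name interpret, the
           setting labels, and the sense labels are all hypothetical. The sketch shows that
           the same words stay ambiguous until a setting is supplied from outside the text.

```python
# Hypothetical illustration: the words alone do not settle the meaning;
# an externally supplied setting does. All names and labels are assumptions.

# Assumed mapping from an external setting to the intended sense of "hot".
SENSE_BY_SETTING = {
    "social": "attractive",
    "weather": "overheated",
    "hospital": "feverish",
}


def interpret(text, setting=None):
    """Return the sense of 'hot' in text.

    Returns None if the word does not occur, and "ambiguous" if no
    recognized external setting is supplied.
    """
    if "hot" not in text.lower():
        return None
    # Without external context, the text by itself cannot be resolved.
    if setting is None:
        return "ambiguous"
    return SENSE_BY_SETTING.get(setting, "ambiguous")


utterance = "She's hot"
print(interpret(utterance))                      # text only
print(interpret(utterance, setting="hospital"))  # text plus external context
```

           Nothing in the utterance changes between the two calls. Only the setting, which
           comes from outside the words, changes the result.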


           The second limitation of NLP is that it does not account for emphasis. Suppose the
           words “I love you” are spoken. How are these words to be interpreted?


           If you say “I love you” with the emphasis on “I,” the meaning is that it is I, and not
           someone else, who loves you. If the emphasis is on the word “love,” the meaning is that
           the emotion I feel is strong, one of love. I don’t merely like you; I actually love you.
           If the emphasis is on the word “you,” the meaning is that it is you, and not someone
           else, that I love.


           So, the same words can have very different meanings depending on how the words are said.


           But there is a very different reason why NLP has had a hard time showing concrete
           results. To be implemented effectively, NLP must understand the logic behind words.
           The problem is that the English language has evolved over many years and under many
           circumstances, and at the end of the day, the logic behind the English language is
           very complex. Trying to map out that logic is very difficult to do. It is tortuous.


           For these reasons (and probably more), NLP has met with only modest success.


           A much more practical approach is that of textual disambiguation.


           Fig. 4.4.9 shows the two approaches toward contextualization of text.









