Page 155 - Data Architecture

Chapter 4.4: Unstructured Data
               Punctuation
               Grammar
               Proper sentence construction


           It cannot be argued that no rules govern the creation of proper text. But those
           rules are so complex that they are not obvious or apparent to the computer.
           From the computer's perspective, text is unstructured simply because the computer
           cannot understand all the rules of proper textual construction.



           Contextualization



           There are many parts of text that must be managed if text is to be turned into a form that
           is useful to the computer. But easily the most important and most complex aspect of
           text that must be mastered is finding and determining the context of the text. Stated
           differently, if you do not understand the context of text, you cannot use that text for
           any form of useful decision-making.


           Contextualization of text, then, is the single largest challenge facing the analyst who
           wishes to use nonrepetitive unstructured text in the decision-making process.
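
To make the point concrete, here is a minimal sketch, not a method taken from this chapter, of how the same word can only be interpreted once the surrounding words are considered. The cue-word lists, the labels, and the function name `guess_context` are all hypothetical and chosen purely for illustration; real contextualization is far more involved.

```python
# Hypothetical cue words for two contexts in which the word "hot" may appear.
# These lists are illustrative assumptions, not a real vocabulary.
CONTEXT_CUES = {
    "weather": {"summer", "sun", "temperature", "degrees"},
    "food": {"pepper", "sauce", "spicy", "curry"},
}


def guess_context(sentence, cues=CONTEXT_CUES):
    """Return the label whose cue words overlap the sentence most, or None."""
    # Crude tokenization: lowercase, drop periods and commas, split on spaces.
    cleaned = sentence.lower().replace(".", " ").replace(",", " ")
    words = set(cleaned.split())

    best_label, best_hits = None, 0
    for label, cue_words in cues.items():
        hits = len(words & cue_words)
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label


if __name__ == "__main__":
    for text in (
        "The sauce is hot, with extra pepper.",
        "It is hot in summer, over 30 degrees.",
        "It is hot.",
    ):
        print(repr(text), "->", guess_context(text))
```

The last sentence contains no cue words at all, so the sketch returns `None`: without surrounding context, the word "hot" alone gives the computer nothing to decide on.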


           Fig. 4.4.7 shows an example of the importance of understanding context.


               Fig. 4.4.7 Text makes no sense without understanding context.


           Two gentlemen are standing on a corner, and one gentleman says to the next as a young
