Page 154 - Data Architecture

Chapter 4.4: Unstructured Data
           unstructured records are the following:


               Very nonuniform in shape.
               Sometimes small, sometimes large, and sometimes very large.
               The records are difficult to parse because they are made up of text, and text requires an
               entirely different approach than simple parsing.
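
           To illustrate why text calls for a different approach than simple parsing, here is a
           minimal sketch. The two sample records, their field names, and both helper functions
           are hypothetical and chosen only for illustration. A record with a fixed, delimited
           layout can be split by position, while the same kind of split on free text yields
           only words, not fields.

```python
import re

# Hypothetical record with a fixed, delimited layout (assumed for illustration).
log_line = "2024-01-15,ATM-0042,withdrawal,200.00"

def parse_delimited(line):
    """Simple parsing: every field sits at a known position."""
    date, device, action, amount = line.split(",")
    return {"date": date, "device": device, "action": action, "amount": float(amount)}

# Hypothetical free-text record: no delimiters and no fixed field positions.
note = "Customer called about a $200 withdrawal; she said the ATM kept her card."

def naive_tokens(text):
    """Lowercase word tokens. This recovers words, not their meaning or roles."""
    return re.findall(r"[a-z0-9$']+", text.lower())

record = parse_delimited(log_line)
tokens = naive_tokens(note)

print(record["amount"])  # a typed value, ready for analysis
print(tokens)            # a bag of words; which word is the amount or the device is still unknown
```

           The delimited record yields named, typed fields in one step. The tokens from the note
           still need interpretation before anything comparable can be said about them.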


           There are probably more differences between these two types of data. But these
           differences alone warrant the recognition of the “great divide” between the types of
           unstructured data.


           So, what is so difficult about going in and working with text? Fig. 4.4.6 shows some
           typical text.

               Fig. 4.4.6 Some typical text.


           There are many reasons why text is so difficult to work with.


           First, there is the question of whether text is actually unstructured at all. An English
           teacher might argue that text is anything but unstructured. There are rules that govern the
           structure of all text. Some of the rules include the following:


               Spelling
