Page 176 - Data Architecture

Chapter 4.6: Textual Disambiguation

























               Fig. 4.6.11 Reformatting spreadsheet data.



           Report Decompilation



           Most textual information is found in the form of a document. When text appears in a
           document, textual disambiguation processes it linearly, from start to finish, as
           Fig. 4.6.12 shows.








               Fig. 4.6.12 Linear processing of text.


           But text in a document is not the only form of nonrepetitive unstructured data. Another
           common form is the table. Tables are found everywhere: in bank statements, in research
           papers, in corporate invoices, and so forth.


           On some occasions, it is necessary to read a table as input, just as the text of a
           document is read. This requires a specialized form of textual disambiguation called
           report decompilation.


           In report decompilation, the contents of a report are handled very differently from the
           contents of text, because the information in a report cannot be processed in a linear
           fashion.
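
           As a minimal sketch of why a report needs different handling, the Python below takes a
           small table and emits one record per cell, pairing each value with its row label and
           column header. The function name normalize_report, the record layout, and the sample
           bank statement rows are assumptions made for illustration; they are not taken from any
           textual disambiguation product.

```python
from typing import Dict, List


def normalize_report(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Turn a table (header row first, row label in column 0) into one
    record per cell, keeping the row and column context of each value.

    Hypothetical helper for illustration only.
    """
    header, *body = rows
    records = []
    for row in body:
        row_label = row[0]
        # Pair every remaining cell with the column header above it.
        for column_name, value in zip(header[1:], row[1:]):
            records.append(
                {"row": row_label, "column": column_name, "value": value}
            )
    return records


# Hypothetical fragment of a bank statement.
statement = [
    ["Date", "Description", "Amount"],
    ["2024-01-03", "Deposit", "500.00"],
    ["2024-01-05", "Check 1021", "-120.00"],
]

records = normalize_report(statement)
for record in records:
    print(record)
```

           Read word by word, the value "-120.00" would carry no meaning of its own. Each record
           in the sketch keeps the date and the column name that give the value its context.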


           Fig. 4.6.13 shows that there are different elements of a report that must be brought
           together in a normalized format. The problem is that those elements appear in a decidedly