Page 165 - Data Architecture

Chapter 4.6: Textual Disambiguation



           Abstract



           There are different definitions of big data. The definition used here is that big data
           encompasses a very large volume of data, is based on inexpensive storage, manages data by
           the “Roman census” method, and stores data in an unstructured format. There are two major
           types of big data: repetitive big data and nonrepetitive big data. Only a small fraction of
           repetitive big data has business value, whereas almost all nonrepetitive big data has
           business value. To achieve business value, the context of the data must be determined.
           Contextualization of repetitive big data is easily achieved, but contextualization of
           nonrepetitive big data requires textual disambiguation.


           Keywords



           Big data; Roman census method; Unstructured data; Repetitive data; Nonrepetitive data;
           Contextualization; Textual disambiguation


           Nonrepetitive unstructured data is contextualized by a technology known as “textual
           disambiguation” (or “textual ETL”). Textual disambiguation has an analogue in structured
           processing: “ETL,” or “extract/transform/load.” The difference is that ETL transforms
           legacy system data, whereas textual ETL transforms text. At a very high level the two are
           analogous, but in the actual details of processing they are very different.



           From Narrative Into an Analytical Database



           The purpose of textual disambiguation is to read raw text—narrative—and to turn that
           text into an analytic database. Fig. 4.6.1 shows the general flow of data in textual
           disambiguation.
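
           As a rough illustration of this flow, the following Python sketch reads a short
           narrative and loads the terms it recognizes into a queryable table. It is a toy under
           stated assumptions, not the textual ETL technology itself: the TAXONOMY mapping, the
           term_context table, and the sample note are all hypothetical, and a plain word lookup
           stands in for real contextualization logic.

```python
import re
import sqlite3

# Hypothetical taxonomy (an assumption for this sketch): maps a term
# to the context it is taken to imply.
TAXONOMY = {
    "aspirin": "medication",
    "ibuprofen": "medication",
    "headache": "symptom",
    "fever": "symptom",
}


def disambiguate(doc_id, text, taxonomy=TAXONOMY):
    """Yield (doc_id, char_pos, term, context) for each taxonomy term in text."""
    for match in re.finditer(r"[A-Za-z]+", text):
        term = match.group().lower()
        if term in taxonomy:
            yield (doc_id, match.start(), term, taxonomy[term])


def load(rows):
    """Load the rows into an in-memory SQLite table so they can be queried."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE term_context "
        "(doc_id TEXT, char_pos INTEGER, term TEXT, context TEXT)"
    )
    conn.executemany("INSERT INTO term_context VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


# Raw narrative in, analytic rows out.
narrative = "Patient reports a headache and mild fever. Took aspirin this morning."
conn = load(disambiguate("note-1", narrative))

# Once the text is in tabular form, ordinary SQL analysis applies.
counts = conn.execute(
    "SELECT context, COUNT(*) FROM term_context GROUP BY context ORDER BY context"
).fetchall()
```

           Each output row keeps the source document and character position alongside the term
           and its assigned context, so a query result can be traced back to the original text.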







