Page 179 - Data Architecture

Chapter 4.7: Taxonomies

           Abstract



           There are different definitions of big data. The definition used here is that big data
           encompasses very large volumes of data, is based on inexpensive storage, manages data by
           the “Roman census” method, and stores data in an unstructured format. There are two major
           types of big data: repetitive big data and nonrepetitive big data. Only a small fraction of
           repetitive big data has business value, whereas almost all nonrepetitive big data has
           business value. To achieve business value, the context of the data in big data must be
           determined. Contextualization of repetitive big data is easily achieved, but
           contextualization of nonrepetitive big data is done by textual disambiguation.


           Keywords



           Big data; Roman census method; Unstructured data; Repetitive data; Nonrepetitive data;
           Contextualization; Textual disambiguation


           Taxonomies are classifications of information. They play a large role in the
           disambiguation of narrative information. Fig. 4.7.1 shows that taxonomies are
           to unstructured data what the data model is to structured data.
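
           As a minimal sketch of how a taxonomy can supply context to narrative text, the
           Python example below classifies words in a sentence by looking them up in a small
           taxonomy. The categories, the terms, and the names TAXONOMY, TERM_TO_CATEGORY, and
           tag_terms are hypothetical and chosen only for illustration; real textual
           disambiguation involves far more than a word lookup.

```python
import re

# Hypothetical taxonomy for illustration: each category lists the terms classified under it.
TAXONOMY = {
    "car": ["porsche", "honda", "ford"],
    "tree": ["oak", "elm", "pine"],
}

# Invert the taxonomy into a term -> category lookup table.
TERM_TO_CATEGORY = {
    term: category
    for category, terms in TAXONOMY.items()
    for term in terms
}


def tag_terms(text):
    """Return (term, category) pairs for every taxonomy term found in text, in order."""
    words = re.findall(r"[a-z]+", text.lower())
    return [(word, TERM_TO_CATEGORY[word]) for word in words if word in TERM_TO_CATEGORY]


print(tag_terms("She parked the Honda under an old oak."))
# -> [('honda', 'car'), ('oak', 'tree')]
```

           In this sketch the taxonomy plays the role for raw text that a data model plays for
           structured records: it tells the reader, or a downstream query, what kind of thing each
           recognized word refers to.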





























