           Chapter 4.3



           Parallel Processing



           Abstract



           There are different definitions of big data. The definition used here is that big data
           encompasses very large volumes of data, is based on inexpensive storage, manages data by
           the “Roman census” method, and stores data in an unstructured format. There are two major
           types of big data: repetitive big data and nonrepetitive big data. Only a small fraction of
           repetitive big data has business value, whereas almost all nonrepetitive big data has
           business value. To achieve business value, the context of the data in big data must be
           determined. Contextualization of repetitive big data is easily achieved, but
           contextualization of nonrepetitive big data is done by means of textual disambiguation.


           Keywords



           Big data; Roman census method; Unstructured data; Repetitive data; Nonrepetitive data;
           Contextualization; Textual disambiguation


           The very essence of big data is the ability to handle very large volumes of data. Fig. 4.3.1
           symbolically depicts a lot of data.

           FIG. 4.3.1 A lot of data.
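
           The “Roman census” method named in the abstract (and developed later in this chapter)
           amounts to sending the processing out to where the data resides and bringing back only
           small results, rather than shipping all of the data to one central location. The sketch
           below is not from the book; it is a minimal illustration of that idea under stated
           assumptions: the partition file names (part-0000.txt and so on) and the record-counting
           task are hypothetical, and Python's multiprocessing module stands in for whatever
           parallel infrastructure actually holds the data.

# A minimal sketch (not from the book) of "Roman census"-style parallel processing:
# each worker processes its own local partition, and only small per-partition
# results are gathered and combined at the end.
from multiprocessing import Pool


def count_records(partition_path: str) -> int:
    """Process one partition where it lives and return only a small result."""
    with open(partition_path, "r", encoding="utf-8") as f:
        return sum(1 for _ in f)


if __name__ == "__main__":
    # Hypothetical partitions of a very large data set, spread across storage.
    partitions = ["part-0000.txt", "part-0001.txt", "part-0002.txt"]

    # Each partition is handled by its own worker in parallel; only the counts
    # travel back to the coordinating process.
    with Pool(processes=len(partitions)) as pool:
        counts = pool.map(count_records, partitions)

    print("total records:", sum(counts))

           Because only the per-partition counts are returned, the volume of data that moves across
           the system stays small no matter how large the underlying data set grows, which is what
           allows this style of processing to scale.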