Page 139 - Data Architecture

Chapter 4.3: Parallel Processing
Fig. 4.3.1 A lot of data.

There is so much data to be handled in a big data environment that simply loading,
accessing, and manipulating it is a real challenge. It is safe to say that no single computer
is capable of handling all the data that can accumulate in the big data environment.

The only possible strategy is to use multiple processors to handle the volume of data
found in big data. To understand why multiple processors are mandatory, consider the
(old) story about the farmer who drives his crop to the marketplace in a wagon. When the
farmer is first starting out, he doesn’t have much of a crop, so he uses a donkey to pull
the wagon. But as the years pass, the farmer raises bigger crops. Soon he needs a bigger
wagon, and he needs a horse to pull it. Then one day the crop that goes into the wagon
becomes immense, and an ordinary horse is no longer enough. The farmer needs a large
Clydesdale horse.

Time passes, the farmer prospers even more, and the crop continues to grow. One day,
even a Clydesdale horse is not strong enough to pull the wagon. The day comes when
multiple horses are required. Now the farmer has a whole new set of problems. A new
rigging is required, and a trained driver is needed to coordinate the team of horses that
pull the wagon.

The same phenomenon occurs with data. When the volume of data becomes as large as it
does in big data, multiple processors are required to load and manipulate it, and, like the
team of horses, those processors must be coordinated.

A previous chapter discussed the “Roman census” method, which is one way to parallelize
the processing needed to manage large amounts of data.

Fig. 4.3.2 depicts the parallelization that occurs in the Roman census approach.
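The split-and-combine idea behind the Roman census approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method as defined in the earlier chapter: it assumes the approach means running the same processing separately on each portion of the data and then combining the small partial results in one place. The record layout (a `region` field), the round-robin `partition` helper, and the function names are all hypothetical. Threads stand in for separate processors here; for CPU-bound work in Python, `concurrent.futures.ProcessPoolExecutor` would be the usual substitute.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Hypothetical helper: split records into n_parts slices, round-robin.

    In a real system the data would already live on separate nodes;
    this step only simulates that distribution.
    """
    return [records[i::n_parts] for i in range(n_parts)]


def local_census(part):
    """Process one partition where it lives: count records per region."""
    return Counter(rec["region"] for rec in part)


def roman_census(records, n_workers=4):
    """Count records per region by processing partitions in parallel."""
    parts = partition(records, n_workers)
    # Each worker handles one partition independently of the others.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(local_census, parts))
    # Only the small per-partition summaries are brought back and combined.
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


if __name__ == "__main__":
    sample = [{"region": r} for r in ["Gaul", "Judea", "Gaul", "Egypt", "Judea", "Gaul"]]
    print(roman_census(sample, n_workers=3))
```

Only the per-partition counts travel to the combining step, not the records themselves, which keeps that central step small even when the raw data is very large.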
