The complexity design for data applications
A new methodology can be adopted for understanding complexity and designing an
architecture to handle it. The methodology combines steps carried over from
traditional data processing techniques with new steps. Here is a view of the
methodology.
The steps for managing complexity are shown in Fig. 5.2. These steps can be phased
and sequenced depending on the infrastructure and the type of data we need to work
with. In the world of big data, the complexity of the data stems from its formats
and the speed at which the data is produced. Let us start by understanding the
sequencing of these steps with an example: streaming data from either an airplane
or a critical patient, which arrives with several metadata elements embedded in it
as it originates from the source. We need to understand how to handle this data and
learn, from what we discover, which aspects of it to use in computation and insight
delivery.
The first step in this process is to understand the myriad complexities of the data
we receive. There are two ways to understand the data: one is to collect sample data
sets and study them; the other is to stream the data and set up algorithms to
understand it as it flows. The second technique is the one big data algorithms help
us facilitate.
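As a minimal sketch of the second technique, the following Python fragment profiles
events as they stream in, tallying which fields appear and what types they carry.
The event fields and the event_stream source are hypothetical placeholders, not a
specific product API.

# Sketch only: profiling a stream of dict-shaped events (hypothetical format).
from collections import Counter

def profile_stream(event_stream, max_events=10_000):
    """Incrementally learn the shape of streaming data: which fields
    appear, which types they carry, and how often."""
    field_counts = Counter()
    type_counts = Counter()
    for i, event in enumerate(event_stream):
        if i >= max_events:      # bound the work for an initial profile
            break
        for field, value in event.items():
            field_counts[field] += 1
            type_counts[(field, type(value).__name__)] += 1
    return field_counts, type_counts

# Usage with a stand-in stream of sensor readings:
sample = iter([
    {"sensor_id": "engine-1", "temp_c": 512.4},
    {"sensor_id": "engine-2", "temp_c": 498.1, "alert": True},
])
fields, types = profile_stream(sample)
print(fields)   # Counter({'sensor_id': 2, 'temp_c': 2, 'alert': 1})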
The big data algorithms that manage and process streaming data are designed and
implemented to process, tag, and analyze large and constantly moving volumes of
data. There are many different ways to accomplish this, each with its own advantages
and disadvantages. One approach is native stream processing, also called
tuple-at-a-time processing. In this technique, every event is processed as it comes
in, one after the other, resulting in the lowest possible latency. Unfortunately,
processing every incoming event is also computationally expensive, especially if we
are doing this in memory and the data is complex by nature. The other technique is
called micro-batch processing, and in this approach we make the opposite tradeoff,
dividing the incoming events into small batches and processing each batch as a unit,
gaining throughput at the cost of added latency.
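A minimal sketch of the two approaches, side by side, appears below. The handler
functions and the event source are stand-ins; real engines (for example,
tuple-at-a-time processing in Apache Flink or micro-batching in Spark Streaming)
layer distribution, fault tolerance, and backpressure on top of this basic idea.

# Sketch only: both handlers and the event source are hypothetical stand-ins.
def tuple_at_a_time(events, handle):
    """Native stream processing: act on every event as it arrives,
    giving the lowest possible latency at a higher per-event cost."""
    for event in events:
        handle(event)

def micro_batch(events, handle_batch, batch_size=100):
    """Micro-batch processing: the opposite tradeoff, amortizing cost
    over small groups of events at the price of added latency."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) >= batch_size:
            handle_batch(batch)
            batch = []
    if batch:                     # flush the final partial batch
        handle_batch(batch)

# Usage with a stand-in event source of 250 events:
events = ({"seq": i} for i in range(250))
micro_batch(events, lambda b: print(f"processed batch of {len(b)}"))
# processed batch of 100
# processed batch of 100
# processed batch of 50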
FIGURE 5.2 Data complexity methodology. The steps shown are: Data Acquisition, Data
Discovery, Data Exploration, Data Attribution & Definition, Data Analysis, Data
Tagging, Data Segmentation & Classification, Data Storyboarding, Data Lake Design,
Data Hub Design, and Analytics.
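To make the phasing and sequencing concrete, here is a minimal sketch that treats
the Fig. 5.2 steps as an ordered, re-sequenceable pipeline. The stage functions are
hypothetical placeholders; the point is only that stages can be reordered, dropped,
or extended to suit the infrastructure and the data.

# Sketch only: hypothetical stage functions for three of the Fig. 5.2 steps.
def acquire(record):   # Data Acquisition: take the record in as-is
    return record

def discover(record):  # Data Discovery: infer a simple schema (field names)
    return {**record, "schema": sorted(record)}

def tag(record):       # Data Tagging: mark the record's origin
    return {**record, "tags": ["stream"]}

PIPELINE = [acquire, discover, tag]   # reorder or extend per deployment

def run(record, stages=PIPELINE):
    for stage in stages:
        record = stage(record)
    return record

print(run({"payload": "telemetry"}))
# {'payload': 'telemetry', 'schema': ['payload'], 'tags': ['stream']}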