Page 200 - Building Big Data Applications
availability was always having latencies built-in and, in most cases, caused business teams to
walk away from a central repository to create their own solutions. This problem has led to
silos of data, which I think of as islands, where the foundation looks similar, but the
structure and outcomes are completely different. These systems became legacy over a period
of time, and several have been discarded and forgotten as we have evolved better tools and
access paths. These islands are strewn around all corporations and government agencies,
and the issue that concerns us is cybersecurity: hackers try to penetrate these
systems at will and succeed in breaching data (Fig. 11.2).
With the passage of time and the evolution of self-service BI tools, including
Microsoft’s Power BI, Tableau, and Qlik, people in the business teams have become
self-driven to perform analysis work. That motivation has extended to a desire to introduce
their own data sets and perform similar analysis on them, which has created newer
segments of these islands; the only difference is that these can be interconnected, and
that creates a bigger problem. The issue at heart is this: how many copies of the same data
exist? Who owns them? Who governs them? Who assigns metadata? What is their
lifecycle? Are they being removed according to compliance rules? Who audits these silos
for assurance of clean data? (Fig. 11.3).
These increased usage patterns of data and analytics have led to growth in volumes and
the variety of data sources, which, in turn, has led to increased computing and storage
requirements and driven increases in cost. The new demands and associated cost
increases have strained the ability of legacy data warehouse systems to meet the required
FIGURE 11.2 The islands.