Page 19 - Building Big Data Applications
Chapter 1 Big Data introduction 13
1. Acquire data from all sources. These sources include automobiles, devices,
machines, mobile devices, networks, sensors, wearable devices, and anything that
produces data.
2. Ingest all the acquired data into a data swamp. The key to the ingestion process
is to tag the source of the data. Streaming data can be processed as a stream as it
arrives and can also be saved as files. Ingestion also includes sensor and machine
data.
3. Discover data and perform initial analysis. This process requires tagging and
classifying the data based on its source, attributes, significance, and need for
analytics and visualization.
4. Create a data lake after data discovery is complete. This process involves
extracting the data from the swamp, enriching it with metadata, semantic data, and
taxonomy, and improving its quality as far as is feasible. This data is then ready
to be used for operational analytics.
5. Create data hubs for analytics. This step can enrich the data with master data and
other reference data, creating an ecosystem to integrate this data into the database,
enterprise data warehouse, and analytical systems. The data at this stage is ready
for deep analytics and visualization.
The key point to note here is that steps 3, 4, and 5 all help create data lineage,
data readiness (with enrichment at each stage), and a data availability index for usage.
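The flow through steps 2 to 5 can be illustrated with a minimal Python sketch. It is not taken from the book: the Record class, the function names, the classification rule, and the sample taxonomy and master data are all hypothetical, chosen only to show how a source tag applied at ingestion and a growing lineage list travel with the data from the swamp to the hub.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Record:
    """One piece of data plus the metadata it gathers along the way (hypothetical)."""
    payload: dict[str, Any]
    tags: dict[str, str] = field(default_factory=dict)
    lineage: list[str] = field(default_factory=list)  # stages passed so far


def ingest(payload: dict[str, Any], source: str) -> Record:
    # Step 2: land the raw data in the swamp, tagged with its source.
    record = Record(payload=dict(payload))
    record.tags["source"] = source
    record.lineage.append("swamp")
    return record


def discover(record: Record) -> Record:
    # Step 3: classify the data. The rule is a toy assumption:
    # anything carrying a numeric "value" is treated as a measurement.
    is_measurement = isinstance(record.payload.get("value"), (int, float))
    record.tags["category"] = "measurement" if is_measurement else "other"
    record.lineage.append("discovered")
    return record


def to_lake(record: Record, taxonomy: dict[str, str]) -> Record:
    # Step 4: enrich with taxonomy on the way into the lake.
    record.tags["taxonomy"] = taxonomy.get(record.tags["category"], "unclassified")
    record.lineage.append("lake")
    return record


def to_hub(record: Record, master_data: dict[str, dict[str, Any]]) -> Record:
    # Step 5: join with master/reference data, keyed by source in this sketch.
    record.payload.update(master_data.get(record.tags["source"], {}))
    record.lineage.append("hub")
    return record


def run_pipeline(payload, source, taxonomy, master_data) -> Record:
    # Chain the stages in order; each one appends to the lineage.
    record = ingest(payload, source)
    record = discover(record)
    record = to_lake(record, taxonomy)
    return to_hub(record, master_data)


# Example run with made-up sensor data.
record = run_pipeline(
    {"value": 21.5},
    source="sensor-17",
    taxonomy={"measurement": "telemetry"},
    master_data={"sensor-17": {"site": "plant-a"}},
)
print(record.lineage)  # ['swamp', 'discovered', 'lake', 'hub']
```

Because every stage appends to the same lineage list, a consumer of the hub can see which stages a record has passed through. A real system would typically keep this information in a metadata catalog, not on the record itself.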
Critical factors for success
While the steps for processing data are similar to what we do in the world of Big Data, the
data here can be big, small, wide, fat, or thin, and it can still be ingested and qualified
for usage. Several factors are critical to the success of this journey:
Data: You need to acquire, ingest, collect, discover, analyze, and implement
analytics on the data. This data needs to be defined and governed across the process.
You also need to be able to handle greater volume, velocity, and variety, as well as
format, availability, and ambiguity problems with the data.
Business Goals: The most critical success factor is defining business goals. Without
the right goals, neither the data nor the analytics and outcomes derived from it are
useful.
Sponsors: Executive sponsorship is needed for the new age of innovation to be
successful. Without sponsorship, the analytical outcomes, the lineage and linking
of data, and the associated dashboards will not happen and will remain a pipe
dream.
Subject Matter Experts: The people and teams who are experts in the subject matter
need to be involved in the Internet of Things journey; they are key to the success
of the data analytics and to putting that analysis to use.