file analysis, and the requirements process are all included, and the outcomes of those prior steps will carry forward into the next section.
Designing the big data application is a multistep process. We need to do the following
in the design:
The success or failure of a big data project revolves around employees' ability to tinker
with information. One challenge is translating a large volume of complex data into
simple, actionable business information. "The designer of the application needs to be
sure that the application algorithms are sound and that the system is easy to use."
In the design process, architects and developers will work with data scientists to fine-
tune complex mathematical formulas. In the foreground is the user, who consumes
the outcomes and the insights, which means the application has to filter the data
and present it in an easy-to-follow manner so that users can probe further. This is a
risk area where we often fail and do not understand why.
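As a minimal sketch of this filter-and-present idea, the snippet below reduces a large set of scored outcomes to a short, readable summary a user can probe further. The record fields (segment, score, drivers) and thresholds are illustrative assumptions, not from the book.

```python
def summarize_outcomes(outcomes, top_n=5, min_score=0.7):
    """Keep only high-confidence outcomes and return the top N by score."""
    filtered = [o for o in outcomes if o["score"] >= min_score]
    ranked = sorted(filtered, key=lambda o: o["score"], reverse=True)
    return ranked[:top_n]

def render(insights):
    """Present each insight as one plain-language line with its drivers."""
    for i, o in enumerate(insights, 1):
        print(f"{i}. {o['segment']}: score {o['score']:.2f} "
              f"(drivers: {', '.join(o['drivers'])})")

outcomes = [
    {"segment": "Churn risk - mobile", "score": 0.91,
     "drivers": ["late payments", "support calls"]},
    {"segment": "Upsell - broadband", "score": 0.64,
     "drivers": ["usage growth"]},
    {"segment": "Churn risk - retail", "score": 0.78,
     "drivers": ["competitor promo"]},
]
render(summarize_outcomes(outcomes))
```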
We need to include user interface designers as key members of the big data applica-
tion development team. These team members are experts at understanding how end
users interact with information, and they therefore help design interfaces that remove
potential clutter and present sleek, meaningful views to users.
Formulas and transformations that have evolved in the research and analysis
segments will need to be documented, and the architects and designers will need
to incorporate them into the algorithms and other processes.
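One way to keep a documented formula and its implementation together, sketched below under the assumption of a simple in-process registry: each transformation is registered by name with its description, so the pipeline code and the design documentation live in one place. The registry, decorator, and the normalization formula are all illustrative.

```python
TRANSFORMS = {}

def transform(name, doc):
    """Decorator that registers a documented transformation by name."""
    def wrap(fn):
        fn.__doc__ = doc
        TRANSFORMS[name] = fn
        return fn
    return wrap

@transform("normalized_spend", "Spend scaled to 0-1 over the observed range.")
def normalized_spend(value, lo, hi):
    return (value - lo) / (hi - lo) if hi > lo else 0.0

# The application looks transformations up by name; the docstring doubles
# as the design documentation for that formula.
fn = TRANSFORMS["normalized_spend"]
print(fn.__doc__, "->", fn(250.0, 0.0, 1000.0))
```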
Errors and abends need to be managed in the design process. Errors within the
application process need separate messages, and other errors, including data access,
storage, and compute errors, need their own messages. For any error state, recovery
from the point of failure should be thought through at the design stage. This works
especially well when the processing is implemented as microservices.
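A sketch of both design points follows: distinct exception types separate application errors from data access, storage, and compute errors, and a checkpoint lets a rerun resume from the point of failure. The step names, exception classes, and checkpoint file are assumptions for illustration.

```python
import json, os

class ApplicationError(Exception):
    """Errors raised by the application's own processing logic."""

class InfrastructureError(Exception):
    """Data access, storage, or compute errors from the platform."""

CHECKPOINT = "pipeline.checkpoint.json"

def run_pipeline(steps):
    done = set()
    if os.path.exists(CHECKPOINT):
        done = set(json.load(open(CHECKPOINT)))  # resume: skip finished steps
    for name, fn in steps:
        if name in done:
            continue
        try:
            fn()
        except InfrastructureError as e:
            raise SystemExit(f"platform failure at step '{name}': {e}")
        except ApplicationError as e:
            raise SystemExit(f"application failure at step '{name}': {e}")
        done.add(name)
        json.dump(sorted(done), open(CHECKPOINT, "w"))  # record progress

run_pipeline([("extract", lambda: None), ("score", lambda: None)])
```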
Storage is another area that impacts performance. As datasets become larger, the
challenge of processing them also increases. A conventional database design may
partition data, separating older or "almost stale" data from newer information. In the
big data infrastructure, the better option is to segment directories by ranges of dates
or months. Data can be migrated in a weekly maintenance window, space can be
managed with compression, and very old data can be archived with metadata as
needed.
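Below is a minimal sketch of that layout: data lands in year/month partition directories, and a weekly maintenance pass compresses partitions older than a cutoff into an archive tree. The directory layout and paths are assumptions, not a prescribed structure.

```python
import gzip, shutil
from pathlib import Path
from datetime import date

ROOT = Path("data/events")      # e.g. data/events/2023/07/part-0.csv
ARCHIVE = Path("archive/events")

def archive_older_than(cutoff: date):
    """Compress and move month partitions older than the cutoff month."""
    for month_dir in sorted(ROOT.glob("*/*")):
        if not month_dir.is_dir():
            continue
        year, month = int(month_dir.parent.name), int(month_dir.name)
        if date(year, month, 1) >= date(cutoff.year, cutoff.month, 1):
            continue  # partition is still current; leave it in place
        dest = ARCHIVE / month_dir.parent.name / month_dir.name
        dest.mkdir(parents=True, exist_ok=True)
        for f in month_dir.iterdir():
            if not f.is_file():
                continue
            with open(f, "rb") as src, gzip.open(dest / (f.name + ".gz"), "wb") as out:
                shutil.copyfileobj(src, out)  # compress into the archive
            f.unlink()  # reclaim space in the hot partition

archive_older_than(date.today())
```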
Data quality and cleansing take a large amount of time and effort in the database
world; this issue is more easily solved in the big data world. We can build applications
near the raw data layer and let users analyze the data in its raw, dirty state. This will
surface meaningful cleansing requirements, along with obfuscation and masking
needs and decisions about what data should be exposed to end users. The biggest
benefit is that the applications will perform and deliver insights as designed.
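One way to serve the raw layer while still meeting obfuscation and masking needs is to apply the rules at read time, so the stored data stays untouched and each audience sees an appropriately masked view. The field names and masking rules below are assumptions for illustration.

```python
import hashlib

def mask_email(value: str) -> str:
    """Keep the first character and domain; hide the rest of the local part."""
    user, _, domain = value.partition("@")
    return user[:1] + "***@" + domain

def pseudonymize(value: str) -> str:
    """Replace an identifier with a stable hash-based pseudonym."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

MASKING_RULES = {"email": mask_email, "customer_id": pseudonymize}

def view_for_end_user(raw_record: dict) -> dict:
    """Return a copy of the raw record with sensitive fields masked."""
    return {k: MASKING_RULES.get(k, lambda v: v)(v)
            for k, v in raw_record.items()}

raw = {"customer_id": "C-10492", "email": "jane.doe@example.com", "spend": 120.5}
print(view_for_end_user(raw))
```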
Metadata, semantic data, and taxonomies will need to be added to the architecture
in the design phase. The taxonomies and the rules to process data need to be