file analysis, and the requirements process are all included, and the outcomes of those prior steps carry forward into the next section.
Designing the big data application is a multistep process. We need to do the following in the design:

- The success or failure of a big data project revolves around employees' ability to tinker with information. One challenge is translating a large volume of complex data into simple, actionable business information. "The designer of the application needs to be sure that the application algorithms are sound and that the system is easy to use."
- In the design process, architects and developers will work with data scientists to fine-tune complex mathematical formulas. In the foreground is the user, who consumes the outcomes and insights, which means the application has to filter the data and present it in an easy-to-follow manner so users can probe further. This is a risk area where we often fail without understanding why.
- We need to include user interface designers as key members of the big data application development team. These team members are experts at understanding how end users interact with information and can therefore help design interfaces that eliminate potential clutter and present sleek, meaningful views to users.
- Formulas and transformations that evolved in the research and analysis segments will need to be documented, and the architects and designers will need to incorporate them into the algorithms and other processes.
- Errors and abends need to be managed in the design process. Errors within the application logic need their own messages, and other errors, including data access, storage, and compute errors, need separate messages of their own. For any error state, recovery from the point of failure should be thought through at the design stage; this is especially efficient when implemented as microservices in the development process (see the sketch after this item).
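
As a rough illustration of this point, the Python sketch below separates application errors from data access, storage, and compute errors, gives each its own message, and uses a simple checkpoint file so that processing resumes from the point of failure rather than from the beginning. The class and function names (ApplicationError, DataAccessError, process_partition, and so on) are illustrative assumptions, not part of any particular framework.

    # Sketch only: error categories with separate messages, plus checkpointed
    # recovery from the point of failure. All names here are hypothetical.
    import json
    import logging
    from pathlib import Path

    log = logging.getLogger("bda.pipeline")

    class ApplicationError(Exception):
        """Errors raised by the application's own business logic."""

    class DataAccessError(Exception):
        """Errors reading or writing data (access, storage, compute)."""

    CHECKPOINT = Path("checkpoint.json")

    def load_checkpoint() -> int:
        # Resume from the last successfully processed partition, or start at 0.
        return json.loads(CHECKPOINT.read_text())["next"] if CHECKPOINT.exists() else 0

    def save_checkpoint(next_partition: int) -> None:
        CHECKPOINT.write_text(json.dumps({"next": next_partition}))

    def process_partition(i: int) -> None:
        # Placeholder for the real per-partition work.
        log.info("processing partition %d", i)

    def run(total_partitions: int) -> None:
        start = load_checkpoint()
        for i in range(start, total_partitions):
            try:
                process_partition(i)
            except ApplicationError as err:
                log.error("application error in partition %d: %s", i, err)
                raise  # separate message; the checkpoint marks the restart point
            except DataAccessError as err:
                log.error("data access/storage error in partition %d: %s", i, err)
                raise
            save_checkpoint(i + 1)  # recovery restarts here, not from the beginning

Because the checkpoint is updated only after a partition succeeds, a restart re-runs at most the failed partition, which is the recovery behavior described above.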
- Storage is another area that impacts performance. As datasets grow larger, the processing challenge grows with them. A conventional database design may partition data, separating older or "almost stale" data from newer information. In a big data infrastructure, the better option is to segment directories by ranges of dates or months. The data can be migrated during a weekly maintenance window, space can be managed with compression, and very old data can be archived along with its metadata as needed (a sketch of this maintenance step follows this item).
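
The weekly maintenance step can be sketched as follows, assuming raw files live under month-named directories such as /data/events/2019-04. The paths, the 180-day cutoff, and the gzip compression are assumptions made for illustration only.

    # Sketch only: compress and archive month directories older than a cutoff.
    import gzip
    import shutil
    from datetime import date, timedelta
    from pathlib import Path

    RAW_ROOT = Path("/data/events")        # directories named by month, e.g. 2019-04
    ARCHIVE_ROOT = Path("/archive/events")
    CUTOFF = date.today() - timedelta(days=180)

    def archive_old_months() -> None:
        for month_dir in sorted(RAW_ROOT.glob("[0-9][0-9][0-9][0-9]-[0-9][0-9]")):
            year, month = map(int, month_dir.name.split("-"))
            if date(year, month, 1) >= CUTOFF.replace(day=1):
                continue                    # still "warm" data, leave it in place
            target = ARCHIVE_ROOT / month_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for f in month_dir.iterdir():
                if f.is_file():
                    # Compress into the archive tree, then drop the raw copy.
                    with f.open("rb") as src, gzip.open(target / (f.name + ".gz"), "wb") as dst:
                        shutil.copyfileobj(src, dst)
                    f.unlink()
            if not any(month_dir.iterdir()):
                month_dir.rmdir()           # remove the month directory once emptied

    if __name__ == "__main__":
        archive_old_months()

Run weekly, this keeps recent months uncompressed for fast access while older ranges are compressed and moved aside, mirroring the directory segmentation described above.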
- Data quality and cleansing take considerable time and effort in the database world; this issue is solved more easily in the big data world. We can build applications near the raw data layer and let users analyze the data in its raw, dirty state. This surfaces meaningful cleansing requirements along with others such as obfuscation, masking, and which data should be exposed to end users (a masking sketch follows this item). The biggest benefit is that the applications will perform and deliver insights as designed.
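
A minimal sketch of masking and obfuscation applied on top of the raw layer might look like the following. The field names (email, ssn, phone) and the hashing scheme are assumptions chosen for illustration; the raw records themselves are left untouched.

    # Sketch only: obfuscate and mask sensitive fields at read time,
    # leaving the raw layer intact.
    import hashlib
    from typing import Dict, Iterable, Iterator

    SENSITIVE_FIELDS = {"email", "ssn"}   # hash these beyond recovery
    MASKED_FIELDS = {"phone"}             # keep only the last four characters

    def obfuscate(record: Dict[str, str]) -> Dict[str, str]:
        out = dict(record)                # never mutate the raw record
        for field in SENSITIVE_FIELDS & out.keys():
            out[field] = hashlib.sha256(out[field].encode()).hexdigest()[:12]
        for field in MASKED_FIELDS & out.keys():
            out[field] = "*" * max(len(out[field]) - 4, 0) + out[field][-4:]
        return out

    def serve(raw_records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
        # The raw, "dirty" data stays on disk; masking happens as records are served.
        return (obfuscate(r) for r in raw_records)

    if __name__ == "__main__":
        sample = [{"email": "a@b.com", "phone": "555-123-4567", "city": "Austin"}]
        print(list(serve(sample)))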
- Metadata, semantic data, and taxonomies will need to be added to the architecture in the design phase. The taxonomies and the rules to process data need to be