Page 103 - Building Big Data Applications
5
Pharmacy industry applications and usage
Torture the data, and it will confess to anything.
Ronald Coase, winner of the Nobel Prize in Economics
Pharmaceutical companies run extremely complex mathematical and analytical computations across all of their processes. The interesting viewpoint here is that they dwell in a world of data complexity, which needs to be understood across different layers with the appropriate blending of insights. Incorrect application of formulas and calculations will only lead us to incorrect conclusions. Data is something that you have to manipulate to get at key truths, and how you decide to treat your data can vastly affect the conclusions you draw.
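As a hypothetical illustration of this point (not drawn from the Novartis case study itself), consider Simpson's paradox: the same made-up trial data can support opposite conclusions depending solely on whether it is aggregated by patient subgroup or pooled together. All numbers below are invented for illustration.

```python
# Hypothetical recovery counts for an imaginary drug trial, split by severity.
# Format: severity -> (recovered_with_drug, total_with_drug,
#                      recovered_control,  total_control)
trial = {
    "mild":   (81, 87, 234, 270),
    "severe": (192, 263, 55, 80),
}

def rate(recovered, total):
    """Simple recovery rate."""
    return recovered / total

# Within each severity group, the drug shows a higher recovery rate...
for group, (rd, td, rc, tc) in trial.items():
    print(group, "drug better:", rate(rd, td) > rate(rc, tc))

# ...but pooled across groups, the control arm shows the higher rate.
rd = sum(v[0] for v in trial.values())
td = sum(v[1] for v in trial.values())
rc = sum(v[2] for v in trial.values())
tc = sum(v[3] for v in trial.values())
print("pooled, drug better:", rate(rd, td) > rate(rc, tc))
```

Here the per-group comparison favors the drug in both groups, while the pooled comparison favors the control, because the drug was given disproportionately to severe cases. The "treatment" of the data, grouped versus pooled, flips the conclusion.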
In the use case analysis in this chapter, we will discuss the implementation of Hadoop
by Novartis, and their approach to overcome challenges faced in the traditional data
worlds for complexity of compute and multi-user access of the underlying data across the
same time for other analytics and reporting. We will discuss the facets of big data appli-
cations looking into accessing streaming data sets, data computations using in-memory
architectures, distributed data processing for creating data lakes and analytical hubs, in-
process visualizations and decision support, the data science team and how the change
needs to happen for successful creation of big data applications. We will discuss the usage
and compliance requirements for the data, the security, encryption, storage, compression,
and retention specific topics as related to pharmaceutical industry.
Complexity is a very sensitive subject in the world of data and analytics. By definition, it deals with processes that are interconnected and have dependencies that may be visible or hidden, often leading to chaos in the processing of the data. These systems exhibit characteristics including, but not limited to, the following:
- The number of parts (and types of parts) in the system, and the number of relations between the parts, is nontrivial. There is no general rule to separate "trivial" from "nontrivial"; it is up to the owner of the data to define these rules and document them with flows and dependencies. This issue is essential to deal with, as large systems can become complex and therefore less used, which is both a cost and a productivity loss.
- The system has memory or includes feedback loops, which need to be defined, and exit strategies need to be validated for each operation. Scientific compute falls into this category, and there are several case studies of experiments
Building Big Data Applications. https://doi.org/10.1016/B978-0-12-815746-6.00005-3 99
Copyright © 2020 Elsevier Inc. All rights reserved.