Page 12 - Building Big Data Applications
media, forums, and hosted sites (for example, WebMD) along with machine data. In
healthcare, there are three characteristics of Big Data:
1. Volume: Data sizes vary widely, ranging from megabytes to multiple terabytes
2. Velocity: Machine data, doctors’ notes, nurses’ notes, and clinical trial data
are produced at different speeds, and the rate of production is highly unpredictable
3. Variety: Data is produced in a variety of formats, and not all formats are
based on similar standards
Over the past 5 years, there have been a number of technology innovations to handle
Web 2.0-based data environments, including Hadoop, NoSQL, data warehouse appliances
(iteration 3.0 and more), and columnar databases. Several analytical models have
become available, and late last year the Apache Software Foundation released Mahout,
a collection of statistical algorithms. With so many innovations, there is potential
to create a powerful information processing architecture that addresses multiple
issues facing data processing in healthcare today:
Solving complexity
Reducing latencies
Agile analytics
Scalable and available systems
Usefulness (getting the right information to the right resource at the right time)
Improving collaboration
Potential solutions
How can Big Data solutions fix healthcare? A prototype solution flow is shown here.
While this is not a complete production system flow, there are several organizations
working on such models in small and large environments (Fig. 1.1).
An integrated system can intelligently harness different types of data, using
architectures like those of Facebook or Amazon to create a scalable solution. A textual
processing engine like FRT Textual ETL (extract, transform, load) enables small and
medium enterprises to write business rules in English. The textual data, images, and
video data can be processed using any of the open source foundation tools. The output
of these integrated processors is a rich data set together with enriched
column-value pairs. We can use this output along with existing enterprise
data warehouse (EDW) and analytical platforms to create a strong set of models, utilizing
analytical tools and leveraging Mahout algorithms.
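As a rough illustration of the column-value pair idea, the sketch below pulls a few values out of free-text notes with simple rules and merges them into rows that stand in for EDW records. It is only a minimal sketch under stated assumptions: the rule names, the regular expressions, the `patient_id` key, and the sample notes are all hypothetical, and it does not represent how FRT Textual ETL or any other product works.

```python
import re
from collections import defaultdict

# Hypothetical extraction rules: each maps an output column name to a pattern
# whose first capture group holds the value. These are illustrative only.
RULES = {
    "systolic_bp": re.compile(r"\bBP\s+(\d{2,3})/\d{2,3}\b"),
    "diastolic_bp": re.compile(r"\bBP\s+\d{2,3}/(\d{2,3})\b"),
    "temperature_f": re.compile(
        r"\btemp(?:erature)?\s+(\d{2,3}(?:\.\d)?)\s*F\b", re.IGNORECASE
    ),
}


def extract_pairs(note):
    """Return the column-value pairs found in one free-text note."""
    pairs = {}
    for column, pattern in RULES.items():
        match = pattern.search(note)
        if match:
            pairs[column] = float(match.group(1))
    return pairs


def enrich(edw_rows, notes):
    """Merge extracted pairs into EDW-style rows, joined on patient_id.

    Later notes overwrite earlier values for the same column.
    """
    extracted = defaultdict(dict)
    for patient_id, note in notes:
        extracted[patient_id].update(extract_pairs(note))
    return [{**row, **extracted.get(row["patient_id"], {})} for row in edw_rows]


# Hypothetical sample data standing in for EDW rows and clinical notes.
edw_rows = [
    {"patient_id": "p1", "age": 54},
    {"patient_id": "p2", "age": 61},
]
notes = [
    ("p1", "Pt stable. BP 132/85, temp 98.6 F."),
    ("p1", "Follow-up: BP 128/82."),
]
enriched = enrich(edw_rows, notes)
```

Here `enriched` keeps every original row and adds whatever columns the rules found, so a patient with no notes passes through unchanged. The enriched rows could then feed whatever analytical models are built on top of the warehouse.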
Using metadata-based integration of data and creating different types of
solutions (including evidence-based statistics, clinical trial versus clinical diagnosis
types of insights, patient dashboards for disease state management based on machine
output, and so on) lets us generate information that is rich, auditable, and reliable. This
information can be used to provide better care, reduce errors, and create more
confidence in sharing data with physicians in a social media outlet, thus providing more