Page 12 - Building Big Data Applications

             media, forums, and hosted sites (for example, WebMD) along with machine data. In
             healthcare, there are three characteristics of Big Data:
             1. Volume: Data sizes vary widely, ranging from megabytes to multiple terabytes.
             2. Velocity: Data from machines, doctors’ notes, nurses’ notes, and clinical
                trials is produced at different speeds and is highly unpredictable.
             3. Variety: Data is available or produced in a variety of formats, but not all
                formats are based on similar standards.

                Over the past 5 years, there have been a number of technology innovations to handle
             Web 2.0-based data environments, including Hadoop, NoSQL, data warehouse
             appliances (iteration 3.0 and more), and columnar databases. Several analytical
             models have become available, and late last year the Apache Software Foundation
             released a collection of statistical algorithms called Mahout. With so many innovations,
             the potential is there to create a powerful information processing architecture that
             addresses multiple issues facing data processing in healthcare today:

               Solving complexity
               Reducing latencies
               Agile analytics
               Scalable and available systems
               Usefulness (getting the right information to the right resource at the right time)
               Improving collaboration

             Potential solutions
             How can Big Data solutions fix healthcare? A prototype solution flow is shown in
             Fig. 1.1. While this is not a complete production system flow, several organizations
             are working on such models in small and large environments.
                An integrated system can intelligently harness different types of data using
             architectures like those of Facebook or Amazon to create a scalable solution. A textual
             processing engine such as FRT Textual ETL (extract, transform, load) enables small and
             medium enterprises to write business rules in English. Textual, image, and video
             data can be processed using any of the open source foundation tools. The output
             from all these integrated processors forms a rich data set and also yields an
             enriched column-value pair output. We can use this output along with existing enterprise
             data warehouse (EDW) and analytical platforms to create a strong set of models utilizing
             analytical tools and leveraging Mahout algorithms.
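                To make the idea of a column-value pair output concrete, here is a minimal
             sketch in Python. It is not FRT Textual ETL or its rule language. The rule set,
             the column names, the sample note, and the functions extract_pairs and enrich
             are all hypothetical, and simple regular expressions stand in for business rules
             written in English.

```python
import re

# Hypothetical rules (assumption, not from any real product): each maps an
# output column name to a regular expression with exactly one capture group.
RULES = {
    "blood_pressure": re.compile(r"\bBP\s*(\d{2,3}/\d{2,3})\b", re.IGNORECASE),
    "heart_rate_bpm": re.compile(r"\b(?:HR|heart rate)\s*(\d{2,3})\b", re.IGNORECASE),
    "medication": re.compile(r"\bprescribed\s+([A-Za-z]+)", re.IGNORECASE),
}


def extract_pairs(note):
    """Return (column, value) pairs for every rule match found in the note."""
    pairs = []
    for column, pattern in RULES.items():
        for match in pattern.finditer(note):
            pairs.append((column, match.group(1)))
    return pairs


def enrich(record, note):
    """Merge extracted pairs into a copy of an existing structured record."""
    enriched = dict(record)
    for column, value in extract_pairs(note):
        # Collect values in a list, since a note may mention a column twice.
        enriched.setdefault(column, []).append(value)
    return enriched


# Made-up sample data for illustration only.
note = "Patient seen today. BP 130/85, HR 72. Prescribed lisinopril."
record = {"patient_id": "P-001"}
print(enrich(record, note))
```

                In this sketch the enriched record keeps the structured key (patient_id)
             alongside the values pulled from free text, which is the shape that could then
             be joined with EDW tables or passed to an analytical model.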
                Using metadata-based integration of data and creating different types of
             solutions (including evidence-based statistics, clinical trial versus clinical diagnosis
             types of insights, patient dashboards for disease state management based on machine
             output, and so on) lets us generate information that is rich, auditable, and reliable. This
             information can be used to provide better care, reduce errors, and create more
             confidence in sharing data with physicians in a social media outlet, thus providing more