Page 62 - Big Data Analytics for Intelligent Healthcare Management
54 CHAPTER 3 BIG DATA ANALYTICS IN HEALTHCARE: A CRITICAL ANALYSIS
(j) Oozie:
Apache Oozie is a server-based workflow scheduling system to handle Hadoop jobs.
Here, workflows are described as a group of control flow and action nodes in a directed acyclic graph.
The main benefit of Oozie is scalability: its capacity to launch workflows grows as the cluster grows.
(k) Mahout:
Apache Mahout is a project of the Apache Software Foundation that creates free implementations
of distributed or otherwise scalable machine learning algorithms, focused primarily on
collaborative filtering, clustering, and classification, to support big data analytics on
the Hadoop platform.
(l) Avro:
Avro is a data serialization system. It offers some additional features such as versioning.
Its compact serialized size is one of the greatest benefits of using Avro. The disadvantage
is that it is a schema-based system and needs a schema for reading and writing data.
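The trade-off described for Avro in item (l), compact output at the cost of needing a schema, can be illustrated with a minimal sketch. It is not the Avro library: it hand-codes only two primitive encodings from Avro's binary format (zigzag variable-length integers for `long`, length-prefixed UTF-8 for `string`), and the schema representation (a list of name/type pairs) and the record fields are hypothetical, chosen for illustration only.

```python
import json

# Hypothetical schema: field order and types are fixed in advance.
# Real Avro schemas are written in JSON; this list is a simplification.
SCHEMA = [("patient_id", "long"), ("name", "string"), ("heart_rate", "long")]


def write_long(n: int) -> bytes:
    """Encode a signed 64-bit integer as a zigzag variable-length integer."""
    z = (n << 1) ^ (n >> 63)          # zigzag: small magnitudes -> small codes
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_long(buf: bytes, pos: int):
    """Decode a zigzag variable-length integer; return (value, next position)."""
    shift = 0
    z = 0
    while True:
        byte = buf[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def encode(record: dict, schema) -> bytes:
    """Write field values in schema order; no field names are stored."""
    out = bytearray()
    for name, kind in schema:
        if kind == "long":
            out += write_long(record[name])
        elif kind == "string":
            data = record[name].encode("utf-8")
            out += write_long(len(data)) + data
        else:
            raise ValueError(f"unsupported type: {kind}")
    return bytes(out)


def decode(buf: bytes, schema) -> dict:
    """Read values back; impossible without the same schema."""
    record, pos = {}, 0
    for name, kind in schema:
        if kind == "long":
            record[name], pos = read_long(buf, pos)
        elif kind == "string":
            length, pos = read_long(buf, pos)
            record[name] = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unsupported type: {kind}")
    return record


record = {"patient_id": 1024, "name": "Ann", "heart_rate": 72}
encoded = encode(record, SCHEMA)
as_json = json.dumps(record).encode("utf-8")
```

Because the encoded bytes carry no field names or type tags, `encoded` is much shorter than `as_json`, but `decode` can only interpret it when given the same schema that was used for writing.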
3.7 CHALLENGES FACED DURING BIG DATA ANALYTICS IN HEALTHCARE
The major challenges that are faced during big data analytics in healthcare are discussed below:
(a) The first challenge is that healthcare data are not in a standardized format; they are
often found in fragmented form or in incompatible formats [35]. Healthcare systems and data
should therefore be standardized before further processing.
(b) The second major challenge is real-time processing. Real-time big data analytics
is an important requirement in the healthcare industry [36]. To address this issue, the delay between
data acquisition and data processing should be minimized.
(c) The possible time effect is another big challenge: the results of big data analytics
may differ from time to time. The reasons may include changes in technology or the
adoption of high-end technology, and genetic changes that might occur over time in the
patient population [37].
(d) The adoption of cloud technology in the healthcare industry is progressing rapidly. For
example, during the diagnosis process for a patient, the expert may need to access the
electronic medical record (EMR), which contains large volumes of multimedia data including X-rays,
ultrasounds, CT scans, and MRI reports. For easy accessibility, the EMR is often stored in
the cloud. But how secure are these sensitive data in the cloud? This is an emerging
challenge that needs to be addressed. Emphasis should be placed on some security models
that may be implemented during clinical data sharing or any healthcare-based data sharing
over the cloud. Also, the optimality of these models in real-time data sharing should be
properly tested and verified. The literature offers a number of good models that may be
adopted to address this issue. The cloud service provider should be selected very
carefully; trust values calculated through history-based reputation rating might
be adopted for this purpose [38]. A fog computing facility with pairing-based cryptography
may also be beneficial for maintaining the privacy of sensitive healthcare data in the
cloud [39].
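As an illustration of the standardization step suggested in challenge (a), the following minimal sketch maps records arriving in incompatible layouts onto one common form. The field names, aliases, and date formats are all hypothetical and chosen only for illustration; they do not come from any healthcare standard or from the cited works.

```python
from datetime import datetime

# Hypothetical mapping: target field -> names it may carry in source systems.
FIELD_ALIASES = {
    "patient_id": ("patient_id", "pt_id", "patientId"),
    "heart_rate": ("heart_rate", "hr_bpm", "heartRate"),
    "visit_date": ("visit_date", "date", "visitDate"),
}

# Hypothetical date layouts seen in the source systems, tried in order.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text: str) -> str:
    """Return the date as an ISO 8601 string, trying each known layout."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def standardize(record: dict) -> dict:
    """Map a source record onto the common field names and value types."""
    out = {}
    for target, aliases in FIELD_ALIASES.items():
        # Take the first alias present; missing fields become None.
        out[target] = next((record[a] for a in aliases if a in record), None)
    if out["patient_id"] is not None:
        out["patient_id"] = str(out["patient_id"])
    if out["heart_rate"] is not None:
        out["heart_rate"] = int(out["heart_rate"])
    if out["visit_date"] is not None:
        out["visit_date"] = parse_date(out["visit_date"])
    return out


# Two made-up records describing the same visit in different layouts.
from_system_a = {"pt_id": 17, "hr_bpm": "72", "date": "03/04/2018"}
from_system_b = {"patientId": "17", "heartRate": 72, "visitDate": "2018-04-03"}
```

Both sample records standardize to the same dictionary, so downstream analytics can treat them uniformly. A real deployment would rely on an agreed interoperability standard rather than an ad hoc alias table like this one.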