Page 61 - Big Data Analytics for Intelligent Healthcare Management
3.6 ARCHITECTURAL FRAMEWORK AND DIFFERENT TOOLS
each server/node. The main drawback of MapReduce is that it is not suitable for interactive
processing.
(c) Hive:
Hive is a runtime support architecture for Hadoop. It makes Structured Query Language (SQL)
usable on the Hadoop platform and is mainly used to analyze structured and semistructured data.
It allows SQL programmers to write Hive Query Language (HQL) statements akin to typical SQL
statements [35]. Its main advantage is speed. Its major drawbacks include the lack of real-time
access to data and a complicated mechanism for updating data.
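To illustrate the kind of SQL-like statement that HQL is modeled on, the sketch below runs an aggregate query over a small structured table. It uses Python's built-in sqlite3 module purely as a stand-in so that the example is self-contained; it is not Hive, and the table name patient_visits and its columns are hypothetical. The SELECT statement uses only constructs (GROUP BY, COUNT, AVG, ORDER BY) that HQL also provides.

```python
import sqlite3

# Stand-in for a Hive table: sqlite3 is used only so the example runs locally.
# The table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient_visits (patient_id INTEGER, department TEXT, cost REAL)"
)
conn.executemany(
    "INSERT INTO patient_visits VALUES (?, ?, ?)",
    [
        (1, "cardiology", 200.0),
        (2, "cardiology", 300.0),
        (3, "oncology", 500.0),
    ],
)

# A SQL-style aggregate query of the kind an HQL statement would express.
query = """
    SELECT department, COUNT(*) AS visits, AVG(cost) AS avg_cost
    FROM patient_visits
    GROUP BY department
    ORDER BY department
"""
rows = conn.execute(query).fetchall()
for department, visits, avg_cost in rows:
    print(department, visits, avg_cost)
```

In Hive, a statement like this would be compiled into batch jobs over data stored in Hadoop rather than executed against a local database, which is consistent with the lack of real-time access noted above.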
(d) Pig and PigLatin:
The Pig programming language is designed to handle all types of data (structured/unstructured).
It has two main modules: PigLatin (i.e., the language) and the runtime environment in which
PigLatin code is executed [35]. The biggest advantage of Pig is its short development time. Its
main disadvantage is weak error reporting: for example, even a simple syntax error is reported
only as an “exec error”.
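PigLatin programs are written as a sequence of data-flow steps (typically LOAD, FILTER, GROUP, and FOREACH ... GENERATE). The sketch below mimics that step-by-step style in plain Python so that it can run without a Hadoop cluster. It is not PigLatin, and the records and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical input records, standing in for data a Pig script would LOAD.
records = [
    {"patient": "p1", "ward": "A", "age": 70},
    {"patient": "p2", "ward": "A", "age": 34},
    {"patient": "p3", "ward": "B", "age": 81},
    {"patient": "p4", "ward": "B", "age": 66},
]

# Step 1 (like FILTER ... BY age >= 65): keep only matching records.
seniors = [r for r in records if r["age"] >= 65]

# Step 2 (like GROUP ... BY ward): collect records under each key.
grouped = defaultdict(list)
for r in seniors:
    grouped[r["ward"]].append(r)

# Step 3 (like FOREACH ... GENERATE group, COUNT(...)): one output row per group.
counts = {ward: len(rows) for ward, rows in grouped.items()}
print(counts)
```

Each step names its intermediate result, which is the property that tends to keep such scripts short to write.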
(e) ZooKeeper:
ZooKeeper is a centralized service for maintaining configuration information, naming,
providing distributed synchronization, and providing group services [35]. It synchronizes a
cluster of servers, which big data analytics uses to coordinate parallel processing across
large clusters. High scalability is its main advantage. Limited support for cross-cluster
scenarios is one of its major drawbacks.
(f) Jaql:
Jaql is a functional, declarative query language for processing JSON data on big data
platforms. One of its main tasks is to convert high-level queries into low-level queries
consisting of MapReduce tasks; in this way, it supports parallel processing. Poor error
handling is a major drawback.
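The translation from a high-level query to MapReduce tasks can be pictured with a small example. The sketch below takes a query of the form "count the JSON records per value of one field" and expresses it as an explicit map phase and reduce phase in plain Python. It illustrates the idea only; it is not Jaql syntax or Jaql's actual compiler, and the field names and the helper names map_phase and reduce_phase are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical JSON input, one record per line.
raw = """
{"patient": "p1", "diagnosis": "diabetes"}
{"patient": "p2", "diagnosis": "asthma"}
{"patient": "p3", "diagnosis": "diabetes"}
"""
records = [json.loads(line) for line in raw.strip().splitlines()]

def map_phase(record):
    # Emit one (key, 1) pair per record. Records are independent of each
    # other, so this step could run in parallel on different nodes.
    yield record["diagnosis"], 1

def reduce_phase(key, values):
    # Combine all values emitted for one key.
    return key, sum(values)

# Shuffle: group the mapped pairs by key before reducing.
shuffled = defaultdict(list)
for record in records:
    for key, value in map_phase(record):
        shuffled[key].append(value)

result = dict(reduce_phase(k, v) for k, v in shuffled.items())
print(result)
```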
(g) HBase:
HBase is an open-source, nonrelational, distributed database. It can be described as a
column-oriented database management system that sits on top of HDFS and uses a NoSQL
approach. Linear and modular scalability is its main advantage. The lack of exception handling
is a major disadvantage of HBase.
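The column-oriented layout can be pictured as nested maps: a row key leads to column families, each family holds column qualifiers, and each qualifier holds a value. The sketch below models that layout with plain Python dictionaries. It is a toy model and not the HBase client API; the helper names put and get, the row keys, and the stored values are hypothetical, and the timestamped cell versions that HBase keeps are omitted for brevity.

```python
from collections import defaultdict

# table[row_key][column_family][qualifier] = value
# (A toy model; real HBase also keeps a timestamped version per cell.)
table = defaultdict(lambda: defaultdict(dict))

def put(row_key, family, qualifier, value):
    """Store one cell; rows need not share the same set of columns."""
    table[row_key][family][qualifier] = value

def get(row_key, family, qualifier):
    """Return one cell value, or None if the cell was never written."""
    return table.get(row_key, {}).get(family, {}).get(qualifier)

put("patient-001", "info", "name", "example name 1")
put("patient-001", "vitals", "heart_rate", 72)
put("patient-002", "info", "name", "example name 2")  # no vitals for this row

print(get("patient-001", "vitals", "heart_rate"))
print(get("patient-002", "vitals", "heart_rate"))
```

Because each row stores only the cells actually written, sparse data does not require empty placeholder columns.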
(h) Cassandra:
Apache Cassandra is a free and open-source, distributed, wide-column store NoSQL
database management system [35]. It is designed to handle large amounts of data across
many commodity servers, providing high availability with no single point of failure, which
is its main advantage. Its disadvantages include no support for ad hoc queries or
aggregations.
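One way to see why replication removes the single point of failure is the following sketch: each key is hashed to a starting node and stored on several consecutive nodes, so a read can still be served when one node is down. This is a deliberately simplified model, not Cassandra's actual partitioner or driver API; the node names, the replication factor of 3, and the helper names replicas_for and readable are assumptions made for illustration.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical cluster
REPLICATION_FACTOR = 3  # assumed number of copies kept per key

def replicas_for(key):
    """Pick REPLICATION_FACTOR consecutive nodes, starting from the key's hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

def readable(key, failed_nodes):
    """A key stays readable while at least one of its replicas is alive."""
    return any(node not in failed_nodes for node in replicas_for(key))

owners = replicas_for("patient-001")
print(owners)
# Even if the first replica fails, the remaining replicas can serve the key.
print(readable("patient-001", failed_nodes={owners[0]}))
```

In this model a key becomes unavailable only if every node holding one of its replicas fails at the same time.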
(i) Lucene:
Lucene is used mainly for text analytics and search, and it has already been integrated into
several open-source projects. Its scope comprises full-text indexing and search, provided as a
library for use within a Java application. Lucene is available for free as open source under
the liberal Apache Software License. Its other advantages are fast, high-performance indexing
and efficient, accurate search algorithms.
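Full-text indexing of the kind Lucene provides is commonly built on an inverted index, which maps each term to the documents that contain it. The sketch below builds a tiny inverted index in plain Python to show the idea. It is not the Lucene API (Lucene is a Java library) and it omits ranking, stemming, and similar features; the documents and the function names tokenize and search are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical documents, keyed by document id.
documents = {
    1: "Patient reports chest pain and shortness of breath",
    2: "Follow-up visit: chest X-ray is clear",
    3: "Patient reports mild headache",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Indexing: map each term to the set of documents that contain it.
index = defaultdict(set)
for doc_id, text in documents.items():
    for term in tokenize(text):
        index[term].add(doc_id)

def search(query):
    """Return the ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

print(sorted(search("chest")))
print(sorted(search("patient reports")))
```

Searching consults the index rather than rescanning every document, which is the main reason indexed search stays fast as the collection grows.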