Page 356 - From Smart Grid to Internet of Energy
assessment section is responsible for providing data quality services to users and
applications, and supplies metadata on demand [16].
Two scaling approaches, vertical and horizontal, are used to structure big
data platforms. Vertical scaling adds processing resources such as memory,
processing units, and other computing devices to a single machine running
one operating system; the entire workload can then be shared among several
processing units and parallel computing devices. Horizontal scaling instead
increases the number of computing nodes, which can be extended as required.
Its main drawback is the need to manage multiple operating systems across the
parallel computing infrastructure. Two featured platforms used in vertical and
horizontal scaling environments are high-performance computing clusters and
Apache Hadoop [17].
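The vertical scaling idea described above, one machine and one operating system sharing a workload across several local processing units, can be sketched in Python. This is a toy chunked-sum workload with hypothetical chunk and worker counts, not a production configuration:

```python
from concurrent.futures import ThreadPoolExecutor

# Vertical scaling sketch: a single machine partitions its workload
# and shares it among several local workers under one operating system.
data = list(range(1_000))
chunks = [data[i:i + 250] for i in range(0, len(data), 250)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Each worker computes a partial result over its own chunk.
    partial_sums = list(pool.map(sum, chunks))

# The partial results are then combined into the final answer.
total = sum(partial_sums)
```

Horizontal scaling would instead distribute the chunks to separate nodes, each with its own operating system, which is exactly the coordination burden the text identifies as its main drawback.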
8.3.1 Data mining methods
It is clear that distributed computing methods can cope with massive databases
and can handle analytics on overloaded data stacks. A distributed computing
architecture provides high capacity in terms of storage and processing speed.
Beyond pooling resources, distributed computing also allows the use of
different machine learning algorithms and soft computing methods; Iosifidis
et al. refer to kernel-based learning algorithms as novel big data analytics
approaches in [18]. On the other hand, distributed analyses are performed with
programming models such as MapReduce, which was introduced by Google to cope
with massive databases. Its name reflects the two data processing steps:
mapping the input data in the first step and reducing the intermediate results
in the second. Hadoop-based Apache Spark clustering is another approach for
handling complex structures and increasing processing speed. The power and
robustness of Hadoop come from two components: the Hadoop Distributed File
System (HDFS) and the MapReduce framework. In addition to these components,
the Directed Acyclic Graph (DAG) scheduling structure improves the capability
of the Hadoop architecture [7, 18].
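The map-then-reduce flow described above can be sketched in plain Python rather than on a Hadoop cluster. The word-count workload, the record contents, and the `shuffle` helper are illustrative assumptions; a real framework distributes each phase across many nodes:

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    # Map step: emit (key, value) pairs -- here, one pair per word.
    for word in record.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Between the phases the framework groups intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce step: combine all values for one key into a single result.
    return (key, sum(values))

records = ["smart grid data", "grid data analytics", "data"]
pairs = chain.from_iterable(map_phase(r) for r in records)
result = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
# e.g. result["data"] == 3 and result["grid"] == 2
```

Because each map call sees one record and each reduce call sees one key, both phases can run in parallel over partitions of the data, which is what makes the model fit the distributed infrastructures discussed here.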
Data mining provides solutions to many difficulties of Big Data analytics,
such as data searching, capturing, management, and result generation. The
conventional data management steps, namely data cleaning, aggregation,
encoding, data storage, and data access, remain applicable to Big Data stacks.
The main challenge, however, is how to manage the complexity of big data
caused by the 4Vs, and how to process it in a distributed processing
infrastructure. Data scientists and researchers therefore focus on acquiring,
integrating, storing, and processing massive data stacks with limited hardware
and software resources. Big data management refers to obtaining clean data for
reliable outcomes, aggregating data from various resources, and encoding it to
provide security and privacy during processing. Therefore, data management
should be performed in accessible, manageable, and secure ways.
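The cleaning, aggregation, and encoding steps listed above can be sketched as a minimal pipeline. The meter-reading records and the base64 stand-in for the encoding layer are illustrative assumptions; a real deployment would encrypt the payload rather than merely encode it:

```python
import base64

def clean(records):
    # Data cleaning: drop malformed readings (missing or non-numeric values).
    return [r for r in records if isinstance(r.get("kwh"), (int, float))]

def aggregate(records):
    # Aggregation: total consumption per meter across the cleaned records.
    totals = {}
    for r in records:
        totals[r["meter"]] = totals.get(r["meter"], 0) + r["kwh"]
    return totals

def encode(payload):
    # Encoding: stand-in for the security/privacy layer before storage
    # (base64 here for illustration; real systems would encrypt).
    return base64.b64encode(str(payload).encode())

raw = [{"meter": "m1", "kwh": 2.5}, {"meter": "m1", "kwh": None},
       {"meter": "m2", "kwh": 1.0}]
stored = encode(aggregate(clean(raw)))
```

Each stage consumes the previous stage's output, so the same pipeline shape scales from a single script to the distributed processing infrastructure discussed in this section.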