Page 364 - From Smart Grid to Internet of Energy
FIG. 8.8 Big Data analysis framework for smart grid applications [26].
as data preprocessing for integration of the inherited data stacks. The unified
data stacks are stored in comma-separated value (CSV) files with several
identifying fields such as timestamp, device ID number, generating source,
and location data. Apache Flume is one of the most widely used distributed data
collection tools; it collects, aggregates, and transfers large amounts of data
to a Hadoop node. Once the Flume server receives the data stacks, it creates
several channels according to the data size and transmits the data to HDFS,
which authorizes Flume to write the data. The hierarchical organization of the
seven-step processing is shown in Fig. 8.8, which was originally presented
in [26]. HDFS stores the
received smart grid data in clusters composed of DataNodes. The metadata are
managed by the NameNode, and the datasets are prepared for computational
processing handled by Hadoop YARN, which runs alongside HDFS on the same
nodes. MapReduce can also operate with HDFS and the other fundamental
components at the processing stage. The data querying stage comprises
different tools, the most widely used of which are Impala and Hive; these are
convenient for selecting data from the HDFS repository, analyzing the data,
and generating the required data selections [26].
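The processing stage described above can be illustrated with a minimal sketch that runs the map, shuffle, and reduce steps in a single Python process rather than on a Hadoop cluster. The identifying columns follow the fields named in the text; the `kwh` reading column, the sample values, and all function names are assumptions introduced only for illustration and are not taken from [26].

```python
import csv
import io
from collections import defaultdict

# Hypothetical unified CSV layout: the identifying fields follow the text
# (timestamp, device ID, generating source, location); the "kwh" reading
# column and all sample values are assumptions for illustration only.
SAMPLE_CSV = """timestamp,device_id,source,location,kwh
2020-01-01T00:00:00,meter-001,smart_meter,feeder-A,1.5
2020-01-01T00:15:00,meter-001,smart_meter,feeder-A,2.0
2020-01-01T00:00:00,meter-002,smart_meter,feeder-B,0.5
2020-01-01T00:15:00,meter-002,smart_meter,feeder-B,0.25
"""


def map_phase(rows):
    """Map step: emit one (device_id, kwh) pair per CSV record."""
    for row in rows:
        yield row["device_id"], float(row["kwh"])


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the readings collected for each device."""
    return {key: sum(values) for key, values in groups.items()}


def total_energy_per_device(csv_text):
    """Run the three steps in-process over CSV text."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return reduce_phase(shuffle_phase(map_phase(rows)))


totals = total_energy_per_device(SAMPLE_CSV)
print(totals)  # {'meter-001': 3.5, 'meter-002': 0.75}
```

On a real cluster the same per-device aggregation would be distributed across nodes by the framework; the sketch only shows the shape of the computation.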
The acquired, processed, and queried data stacks are prepared for data
analytics in the next step. This stage comprises several analysis methods,
including the Big Data analytics algorithms presented above, for visual
analysis, data mining, prediction, and forecasting purposes. Data sharing at
this level requires sophisticated security and privacy protections.
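As one possible example of a privacy protection applied before sharing, the sketch below replaces the device ID with a keyed hash and drops the location field. This is an assumed, simplified technique (pseudonymization with HMAC-SHA256), not the protection scheme used in [26]; the record layout, the key, and the function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held by the data owner; in practice it would come
# from a key store, not from source code.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymize(device_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace a device ID with its keyed SHA-256 digest (HMAC), hex-encoded."""
    return hmac.new(key, device_id.encode("utf-8"), hashlib.sha256).hexdigest()


def share_record(record: dict) -> dict:
    """Return a copy for sharing: ID pseudonymized, location removed."""
    shared = dict(record)  # shallow copy, the original record is untouched
    shared["device_id"] = pseudonymize(record["device_id"])
    shared.pop("location", None)
    return shared


record = {
    "timestamp": "2020-01-01T00:00:00",
    "device_id": "meter-001",
    "location": "feeder-A",
    "kwh": 1.5,
}
shared = share_record(record)
```

Because the hash is keyed and deterministic, records from the same device can still be linked for analysis, while the original ID cannot be recovered without the key. Pseudonymization alone does not guarantee privacy, since consumption patterns may still identify a household.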
Data analysis based on data mining is not new to power network systems, but
the methods used over the past years have evolved from SQL-based analysis to
more sophisticated algorithms. Smart grid applications clearly
require more efficient and effective methods and tools for dealing with rapidly

