Page 364 - From Smart Grid to Internet of Energy

























            FIG. 8.8 Big Data analysis framework for smart grid applications [26].



            as data preprocessing for integration of the inherited data stacks. The unified
            data stacks are stored in comma-separated value (CSV) files with several
            identifying fields such as timestamp, device ID number, generating source,
            and location data. Apache Flume is one of the most widely used distributed data
            collection tools; it collects, aggregates, and transfers huge amounts of data to a
            Hadoop node. Once the Flume server receives the data stacks, it creates a few
            channels according to the data sizes and transmits the data to HDFS, which
            authorizes Flume for the data write process. The hierarchical organization of the
            seven-step processing is
            shown in Fig. 8.8, which was originally depicted in [26]. HDFS stores the
            received smart grid data in clusters of DataNodes, while the metadata are
            managed by the NameNode. HDFS prepares the datasets for computational
            processing, which is handled by Hadoop YARN operating alongside HDFS on the
            same nodes. MapReduce can also operate with
            HDFS and the other fundamental components at the processing stage. The data
            querying stage comprises different tools, the most widely used of which are
            Impala and Hive; these are convenient for selecting data from the HDFS
            repository, analyzing the data, and generating the required data selections [26].
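               The unified record format and the map/reduce style of processing described
            above can be illustrated with a minimal sketch. It uses plain Python rather
            than Flume, HDFS, or Hadoop MapReduce, and runs in a single process. The column
            names, the extra value_kwh column, and the sample readings are assumptions made
            for illustration; they are not taken from [26].

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names: the text only states that each record carries a
# timestamp, a device ID number, the generating source, and location data.
# "value_kwh" is an assumed measurement column added for this example.
FIELDS = ["timestamp", "device_id", "source", "location", "value_kwh"]

# Made-up sample readings, for illustration only.
SAMPLE_ROWS = [
    {"timestamp": "2019-01-01T00:00:00", "device_id": "1001",
     "source": "smart_meter", "location": "feeder_A", "value_kwh": "1.5"},
    {"timestamp": "2019-01-01T00:00:00", "device_id": "1002",
     "source": "smart_meter", "location": "feeder_B", "value_kwh": "2.25"},
    {"timestamp": "2019-01-01T00:15:00", "device_id": "1001",
     "source": "smart_meter", "location": "feeder_A", "value_kwh": "0.5"},
]


def write_unified_csv(rows):
    """Serialize the records into one CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def map_phase(csv_text):
    """Map step: emit one (device_id, value) pair per record."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for record in reader:
        yield record["device_id"], float(record["value_kwh"])


def reduce_phase(pairs):
    """Reduce step: sum the values that share the same device_id key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


csv_text = write_unified_csv(SAMPLE_ROWS)
totals = reduce_phase(map_phase(csv_text))
print(totals)
```

            On a real cluster the map and reduce steps would run in parallel over the
            data blocks held in HDFS; the sketch only shows how keyed records flow from
            the unified CSV file into an aggregated result.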
               The acquired, processed, and queried data stacks are prepared for data
            analytics in the next step. This stage comprises several analysis methods,
            including the Big Data analytics algorithms presented above, for visual
            analysis, data mining, prediction, and forecasting purposes. The data sharing
            operation at this level requires sophisticated security and privacy protections.
               Data analysis based on data mining is not new to power network systems,
            but the methods used over the past years have evolved from SQL-based analysis
            to more sophisticated algorithms. Smart grid applications clearly
            require more efficient and effective methods and tools for dealing with rapidly