Page 356 - From Smart Grid to Internet of Energy
            assessment section is responsible for providing data quality services for users
            and applications, and supplies metadata on demand [16].
               Two scaling approaches, vertical and horizontal, are used to determine the
            structure of big data platforms. Vertical scaling adds processing resources such
            as memory, processing units, and other computing devices to a single machine
            running one operating system; the entire workload can thus be shared among
            several processing units and parallel computing devices. Horizontal scaling,
            by contrast, increases the number of computing nodes, which can be extended
            as required. The main drawback of horizontal scaling is that its parallel
            computing infrastructure spans multiple operating systems. The two featured
            platforms used in vertical and horizontal scaling environments are high
            performance computing clusters and Apache Hadoop, respectively [17].
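The vertical scaling idea above can be sketched with a minimal Python example: one machine, one operating system, and a workload shared among several processing units via a process pool. The task function and input range are invented for the illustration; they are not part of any platform discussed here.

```python
from multiprocessing import Pool

def square(x):
    # A stand-in CPU-bound task, invented for this sketch.
    return x * x

def run_vertical(workers=4, n=8):
    # Vertical scaling: a single machine with one operating system,
    # sharing the workload among several processing units.
    with Pool(processes=workers) as pool:
        return pool.map(square, range(n))

if __name__ == "__main__":
    print(run_vertical())  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Horizontal scaling would instead distribute the same inputs across separate computing nodes, each running its own operating system, as Hadoop does.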



            8.3.1 Data mining methods

            Distributed computing methods can cope with massive databases and can
            handle analytics on heavily loaded data stacks. The distributed computing
            architecture provides high capability in terms of storage and processing
            speed. Beyond resource sharing, distributed computing allows different
            machine learning algorithms and soft computing methods to be applied;
            Iosifidis et al. refer to kernel-based learning algorithms as novel big data
            analytics approaches in [18]. On the other hand, distributed analyses are
            performed by programming models such as MapReduce, which was introduced by
            Google to cope with massive databases. Its name reflects the two data
            processing steps: mapping the input in the first step and reducing the
            intermediate results in the second. The Hadoop-based Apache Spark cluster
            framework is another approach that decreases structural complexity and
            increases processing speed. The power and robustness of Hadoop come from
            two components: the Hadoop Distributed File System (HDFS) and the MapReduce
            framework. In addition to these components, a Directed Acyclic Graph (DAG)
            scheduling structure improves the capability of the Hadoop architecture [7, 18].
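The two MapReduce steps described above can be sketched with a minimal word-count example in plain Python. The documents and helper names (`map_phase`, `reduce_phase`) are invented for the illustration and are not part of the Hadoop API.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map step: emit a (key, value) pair for every word in the document.
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    # Reduce step: aggregate all values that share the same key.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

documents = ["smart grid data", "big data analytics", "grid data"]
mapped = chain.from_iterable(map_phase(d) for d in documents)
result = reduce_phase(mapped)
print(result)  # {'smart': 1, 'grid': 2, 'data': 3, 'big': 1, 'analytics': 1}
```

In a real Hadoop cluster the mapped pairs would be partitioned across nodes and the reducers run in parallel; here both phases run in one process purely to show the data flow.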
               Data mining provides solutions to many difficulties of Big Data analytics
            such as data searching, capturing, management, and result generation. The
            conventional data management steps, listed as data cleaning, aggregation,
            encoding, data storage, and data access, remain applicable to Big Data stacks.
            However, the main challenge is how to manage the complexity of big data
            caused by the 4Vs, and how to process it on a distributed processing
            infrastructure. Data scientists and researchers therefore focus on acquiring,
            integrating, storing, and processing massive data stacks with limited hardware
            and software resources. Big data management refers to obtaining clean data for
            reliable outcomes, aggregating data from various resources, and providing
            encoding capability for security and privacy during processing. Therefore,
            data management should be performed in accessible, manageable, and secure ways.
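A minimal sketch of three of the management steps named above (cleaning, aggregation, and encoding), assuming hypothetical smart-meter readings; the record fields and the choice of SHA-256 pseudonymization are illustrative only.

```python
import hashlib

readings = [
    {"meter": "A", "kwh": 1.2},
    {"meter": "A", "kwh": None},  # incomplete record, removed during cleaning
    {"meter": "B", "kwh": 0.8},
    {"meter": "A", "kwh": 2.0},
]

# Cleaning: drop records with missing measurements for reliable outcomes.
clean = [r for r in readings if r["kwh"] is not None]

# Aggregation: combine readings that come from the same resource (meter).
totals = {}
for r in clean:
    totals[r["meter"]] = totals.get(r["meter"], 0.0) + r["kwh"]

# Encoding: pseudonymize meter identifiers to support privacy in processing.
encoded = {hashlib.sha256(m.encode()).hexdigest()[:8]: v
           for m, v in totals.items()}
```

Storage and access, the remaining steps, would then operate on the encoded records rather than on raw identifiers.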