Page 349 - From Smart Grid to Internet of Energy
Big data, privacy and security in smart grids Chapter 8 313
Despite these challenges and technical deficiencies, decision-making methods
based on inherited data are widely accepted by authorities. Data are a
strategic asset generated by technical and natural resources. Collected data
are usually not ready for processing; they are regarded as raw data that must
be located, identified, understood, and prepared before they can be processed
effectively. As a first step, data integration and cleaning are required to
convert the inherited raw data into a form suitable for storage. Big data
differs from the data handled by conventional data management systems in its
heterogeneous formats, which include structured, unstructured, and
semi-structured data sets, as seen in Fig. 8.2. Reports note that nearly 85%
of inherited data are semi-structured or unstructured and are handled by
nonrelational analytic technologies such as MapReduce or Hadoop. Among the
characteristics of big data, the three Vs (volume, velocity, and variety) are
particularly important for data analytics. Data volume is growing enormously
year by year and is expected to reach up to 40 zettabytes (ZB) by 2020.
Therefore, the velocity of data acquisition and processing must keep pace with
the growth in data volume. Variety, in turn, is another focus of big data
research, since data types and databases differ in being structured or
unstructured, public or private, and shared or confidential [6, 7].
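The cleaning step and the MapReduce idea mentioned above can be illustrated
with a minimal sketch in plain Python. It is not Hadoop and not the method of
any particular utility; the record layout (a meter identifier and an energy
value stored as text), the helper names clean_record, map_phase, and
reduce_phase, and the sample readings are all assumptions introduced here for
illustration.

```python
from collections import defaultdict

# Hypothetical raw readings: (meter_id, energy in kWh stored as text).
# The values are illustrative and do not come from a real data set.
RAW_READINGS = [
    ("m1", "1.5"),
    ("m2", "2.0"),
    ("m1", "bad"),   # corrupted value, dropped during cleaning
    ("m1", "2.5"),
    ("m2", ""),      # missing value, dropped during cleaning
    ("m3", "4.0"),
]


def clean_record(record):
    """Return (meter_id, kwh) with kwh as a float, or None if unusable."""
    meter_id, raw_value = record
    try:
        return meter_id, float(raw_value)
    except ValueError:
        return None


def map_phase(records):
    """Map step: emit one (key, value) pair for each valid record."""
    for record in records:
        cleaned = clean_record(record)
        if cleaned is not None:
            yield cleaned


def reduce_phase(pairs):
    """Reduce step: group the values by key and sum them."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


totals = reduce_phase(map_phase(RAW_READINGS))
print(totals)
```

In a real MapReduce deployment the map and reduce steps would run in parallel
across many nodes; here they run sequentially in one process, which is enough
to show how unusable raw records are filtered out before aggregation.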
In addition to the challenges in data acquisition, big data applications raise
several problems in generating correct metadata, which is needed for
processing the acquired and stored data. Data analysis challenges are tackled
with sophisticated data mining techniques that help to discover integrated,
meaningful, clear, and accessible data stacks. Steadily increasing data sizes
and volumes force researchers to improve computational methods for efficient
data processing. Big data analytics requires the integration of massive data
types through data correlation procedures, reliable and rapid processing
models, real-time processing and sampling capabilities in processors, and
interactive user interfaces for managing the data processing ecosystem. Data
processing operations are based on linear equation solvers, optimization
algorithms, linear and nonlinear prediction procedures such as Wiener and
Kalman filters, canonical correlation analysis, linear discriminant analysis,
and adaptive sampling processes such as belief propagation, sensing, and
k-nearest neighbor algorithms [6]. The stages of big data processing are
presented in the following sections, covering data generation, data
acquisition and storage, machine learning methods, and Internet of Things
(IoT) applications in the big data ecosystem.
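As an illustration of one of the prediction procedures named above, the
following is a minimal sketch of a scalar Kalman filter smoothing a noisy
sensor signal. It assumes a simple random-walk model of the measured quantity;
the function name kalman_1d, the noise variances, and the sample voltage
readings are hypothetical choices made for this example, not values taken from
the chapter or from reference [6].

```python
def kalman_1d(measurements, process_var, measurement_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state model.

    measurements    : sequence of noisy readings
    process_var     : variance of the assumed process noise
    measurement_var : variance of the assumed measurement noise
    x0, p0          : initial state estimate and its variance
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, so only uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + measurement_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates


# Hypothetical noisy voltage readings (in volts) around a nominal level.
readings = [229.1, 231.2, 230.4, 228.9, 230.7, 229.8]
smoothed = kalman_1d(readings, process_var=0.01, measurement_var=1.0,
                     x0=readings[0], p0=1.0)
print(smoothed)
```

A small process variance relative to the measurement variance makes the filter
trust its own prediction more, which yields a smoother estimate; a larger
process variance makes it follow the measurements more closely.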
8.2.1 Big data generation
Data generation is the preliminary step of big data operations. Critical
applications, measurement and control devices, ICT interfaces used in the
smart grid, and smart sensors generate the largest share of big data in smart
grid applications.
The big data of the smart grid is a combination of all the inherited data from smart