Page 360 - From Smart Grid to Internet of Energy
The training data are evaluated by three algorithms that act together as a
voting strategy, as seen in Fig. 8.6. The predicted labels and the true labels
are compared to identify noisy instances, which are filtered out to generate
clean data. This filtering and cleaning procedure is repeated until the last
fold is reached.
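
The fold-by-fold voting filter described above can be sketched in a few lines. This is a minimal illustration and not the procedure of Fig. 8.6 itself: the three toy classifiers, the function name `voting_filter`, the majority threshold `min_votes`, and the one-dimensional sample data are all assumptions made for the example.

```python
from collections import Counter

# Three simple classifiers on (feature, label) pairs; hypothetical stand-ins
# for the three voting algorithms mentioned in the text.
def nearest_neighbor(train, x):
    """1-NN: label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def nearest_centroid(train, x):
    """Label of the class whose mean feature value is closest to x."""
    sums = {}
    for value, label in train:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + value, count + 1)
    return min(sums, key=lambda lab: abs(sums[lab][0] / sums[lab][1] - x))

def three_nn(train, x):
    """3-NN: majority label among the three closest training points."""
    near = sorted(train, key=lambda p: abs(p[0] - x))[:3]
    return Counter(label for _, label in near).most_common(1)[0][0]

CLASSIFIERS = (nearest_neighbor, nearest_centroid, three_nn)

def voting_filter(data, n_folds=5, min_votes=2):
    """Split data into (clean, noisy) with a cross-validated majority vote.

    Each fold is held out in turn; a held-out instance is flagged as noisy
    when at least `min_votes` classifiers disagree with its given label.
    """
    clean, noisy = [], []
    for k in range(n_folds):
        held_out = data[k::n_folds]
        train = [p for i, p in enumerate(data) if i % n_folds != k]
        for x, y in held_out:
            wrong = sum(clf(train, x) != y for clf in CLASSIFIERS)
            (noisy if wrong >= min_votes else clean).append((x, y))
    return clean, noisy

# Two well-separated classes plus two deliberately mislabeled points.
data = [(i / 10, 0) for i in range(1, 9)] + [(5 + i / 10, 1) for i in range(1, 9)]
data += [(0.45, 1), (5.45, 0)]

clean, noisy = voting_filter(data)
print("flagged as noisy:", sorted(noisy))
print("kept as clean:", len(clean), "instances")
```

Raising `min_votes` to 3 turns the majority vote into a consensus vote, which removes fewer instances but risks keeping more noise.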
Recent progress has produced several analysis approaches, such as data
mining, visualization, statistical methods, deep learning, and machine
learning. Machine learning methods facilitate knowledge discovery and
intelligent decision-making from massive databases. Machine learning is
divided into three categories according to its learning basis: supervised,
unsupervised, and reinforcement learning. Conventional data mining methods
such as association mining, clustering, and classification lack efficiency, and
they cannot provide scalable and accurate outcomes when applied to Big Data
stacks. The size, speed, and variety of data streams prevent conventional
data mining methods from analyzing data stacks continuously. Therefore,
researchers have developed new optimization methods and analytical
approaches to improve processing capability with limited resources.
8.3.2 Machine learning in big data analytics
Machine learning is a research area of computer science and an application
area of artificial intelligence that is based on inductive models trained on
limited input data. It has evolved from pattern recognition and computational
learning systems. The input data, called the training set or samples, provide
patterns from which the learning algorithm defines relationships among the
parameters of the database. A machine learning system follows one of three
learning approaches: supervised, unsupervised, or reinforcement. Supervised
learning predicts an output vector by using the knowledge inherited from a
training set of input vectors and their corresponding outputs. It comprises
classification and regression methods, where classification predicts
categorical variables and regression predicts numerical variables. Unsupervised
learning, on the other hand, does not use a labeled training set, and no
labeling is required for predicting the variables. Typical examples of these
learning structures are clustering algorithms and recommender systems.
Reinforcement learning addresses the problem of learning a particular action
or a set of actions that improves the reliability of outcomes for a predefined
situation. The most widely used machine learning algorithms and data
processing methods are presented in Table 8.1 [20–22].
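
The difference between supervised and unsupervised learning can be made concrete with a small sketch. The example below is illustrative only and assumes made-up data: `fit_line` is a hypothetical helper that performs supervised regression by ordinary least squares on labeled pairs, while `kmeans_1d` is a hypothetical helper that performs unsupervised clustering with a basic k-means loop on unlabeled values.

```python
def fit_line(xs, ys):
    """Supervised regression: least-squares slope and intercept
    learned from labeled (input, output) pairs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

def kmeans_1d(values, k=2, iters=20):
    """Unsupervised clustering: group unlabeled values around k centroids."""
    # Simple deterministic initialization: evenly spaced sorted values.
    centroids = sorted(values)[:: max(1, len(values) // k)][:k]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Keep the old centroid if a group happens to be empty.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, groups

# Supervised: the outputs ys are known for every input in the training set.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print("prediction for x = 5:", slope * 5 + intercept)

# Unsupervised: only raw values are given, with no labels.
centroids, groups = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5])
print("cluster centres:", centroids)
```

The regression needs the known outputs `ys` to learn from, whereas the clustering step receives only the raw values and discovers the two groups on its own.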
The machine learning process is performed in several steps: data acquisition,
preprocessing, feature selection, feature extraction, model selection, and
validation. Different datasets and inputs are combined at the data acquisition
and preprocessing steps, where data cleaning is also performed. The predefined
features are selected and extracted in the next step, which is followed by the
model selection step. All the selected and processed data