Page 361 - From Smart Grid to Internet of Energy

Big data, privacy and security in smart grids Chapter  8 325



               TABLE 8.1 Most commonly used machine learning algorithms in big data analytics

               Algorithm type                         Data processing method
               Naive Bayes                            Classification
               K-nearest neighbor                     Classification
               Support vector machine (SVM)           Classification
               Linear regression                      Classification/regression
               Support vector regression              Classification/regression
               Classification and regression trees    Classification/regression
               Random forest                          Classification/regression
               Bagging                                Classification/regression
               Artificial neural network              Clustering/classification/regression
               Feed forward neural network            Clustering/classification/regression
               K-means                                Clustering
               Density-based spatial clustering       Clustering




             are validated at the last step, where classification, regression, and evaluation
             of the processed data are performed. Classification and prediction are the most
             important initial processing steps, since they involve filtering, cleaning,
             validation, and model selection on the input databases. Model selection makes it
             possible to exploit the learning datasets with a suitable machine learning
             algorithm. The models cover various tasks such as classification, regression,
             detection, sampling, and noise filtering. Support vector machines (SVMs) and
             artificial neural networks (ANNs) are the most widely used models in machine
             learning systems. Conventional SVMs are binary classifiers that search the
             training set for the decision boundary with the maximum margin between the two
             classes. This boundary is a hyperplane defined as a linear function of the input
             data. Another important feature concerns the number of training points required:
             an SVM needs only a few points, called support vectors, to classify subsequent
             data points. SVMs are regarded as among the best supervised learning models
             because they handle high-volume datasets efficiently with limited memory
             resources. However, a drawback of SVMs is that they cannot provide direct
             probability estimates [20, 22].
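
The SVM properties described above can be illustrated with a minimal sketch. It does not train a model: the toy dataset, the hyperplane parameters w and b, and the function names are all assumptions chosen for illustration, with the maximum-margin hyperplane for these six points worked out by hand. The sketch shows that the decision is a linear function of the input, that only the points lying on the margin act as support vectors, and that the output is a signed score rather than a probability.

```python
# Toy 2-D training set (hypothetical data); labels are +1 or -1.
X = [(2.0, 2.0), (3.0, 3.0), (4.0, 2.0), (0.0, 0.0), (-1.0, 0.0), (0.0, -1.0)]
y = [1, 1, 1, -1, -1, -1]

# Maximum-margin hyperplane for this toy set, derived by hand
# (assumed here, not produced by a training routine).
w = (0.5, 0.5)
b = -1.0


def decision_value(x):
    """Linear function of the input: f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(x):
    """Binary decision: which side of the hyperplane the point lies on."""
    return 1 if decision_value(x) >= 0 else -1


def support_vectors(points, labels, tol=1e-9):
    """Training points lying on the margin, i.e. label * f(x) equals 1."""
    return [p for p, t in zip(points, labels)
            if abs(t * decision_value(p) - 1.0) <= tol]


svs = support_vectors(X, y)
print("support vectors:", svs)

for new_point in [(3.0, 0.0), (0.5, 0.5)]:
    # The score is a signed value, not a probability estimate.
    print(new_point, decision_value(new_point), predict(new_point))
```

In this sketch only two of the six training points lie on the margin, so moving any of the other four (without crossing the margin) would leave the hyperplane unchanged. The score returned by decision_value can be any real number, which is why a separate calibration step is needed whenever probabilities are required.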
                The ANN builds on the processing of larger datasets, improved initialization
             algorithms, robust learning models, and a multilayered structure, which is known
             as deep learning. The complex structure of ANNs, which is formed by hidden layers
             and intermediate layers, is simplified by feedforward architectures that are