Page 357 - From Smart Grid to Internet of Energy
P. 357

Big data, privacy and security in smart grids Chapter  8 321


















             FIG. 8.5 A sample noise filtering for data cleaning application with MapReduce [19].

                The data cleaning which is also known as noise filtering can be performed by
             using some algorithms based on ensembles. An example homogenous ensemble
             filtering approach is illustrated in Fig. 8.5 that is proposed in [19]. Garcia-Gil
             et al. have implemented the filtering algorithm by using Apache Spark expres-
             sions that provide extending the MapReduce operation as shown below. Apache
             Spark structure is located on top of resilient distributed dataset which is denoted
             as RDD in table. The input parameters are defined as data, partitions, and
             number of trees for random data forest.

               Algorithm 8.1: HME-BD Algorithm
                  1: Input: data a RDD of tuples (label, features)
                  2: Input: P the number of partitions
                  3: Input: nTrees the number of trees for Random Forest
                  4: Output: the filtered RDD without noise
                  5: partitions   kFold(data, P)
                  6: filteredData   ∅
                  7: for all train, test in partitions do
                  8:    rfModel   randomForest(train, nTrees)
                  9:    rfPred   predict(r fModel, test)
                  10:    joinedData   join(zipWithIndex(test), zipWithIndex(r f Pred))
                  11:    markedData
                  12:    map original, prediction 2 joinedData
                  13:      if label(original) ¼ label(prediction) then
                  14:       original
                  15:      else
                  16:       (label ¼ ∅, features(original))
                  17:      end if
                  18:    end map
                  19:    filteredData   union( filteredData, markedData)
                  20: end for
                  21: return( filter(filteredData, label 6¼ ∅)
   352   353   354   355   356   357   358   359   360   361   362