
5.13 Modular Neural Networks


                                      In general, the idea behind modular neural networks is to profit from what each
                                    neural net can do best, so that they co-operate towards the goal of attaining a high
                                    classification performance.
                                      The hierarchical and ensemble approaches, although often achieving very good
                                    results, use the neural modules in a decoupled way: there is no mechanism
                                    for guiding the input feature vector to the most adequate module, and the modules
                                    are not trained co-operatively, i.e., with each module tuned to its specialized
                                    recognition task while taking into account what the other modules are doing. A
                                    comparative survey of modular networks, with a description of co-operative
                                    mechanisms, can be found in (Gasser A, Kamel M, 1998).
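
To make the idea of "guiding the input feature vector to the most adequate module" concrete, the following minimal sketch uses a softmax gate in the style of a mixture of experts. This is one possible co-operative mechanism, chosen here for illustration only; it is not taken from the text or from the cited survey. The two constant modules, the gate weights and the function names are all hypothetical.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gated_output(x, modules, gate_weights):
    """Combine module outputs using input-dependent gate weights.

    x            -- input feature vector (list of floats)
    modules      -- list of callables, each mapping x to a scalar output
    gate_weights -- one weight vector per module (hypothetical linear gate)
    """
    # One linear gating score per module: s_k = w_k . x
    scores = [sum(w * xi for w, xi in zip(w_k, x)) for w_k in gate_weights]
    gates = softmax(scores)
    outputs = [module(x) for module in modules]
    combined = sum(g * y for g, y in zip(gates, outputs))
    return combined, gates

# Two hypothetical modules: each one simply answers with a constant.
modules = [lambda x: 1.0, lambda x: -1.0]
# Hypothetical gate: module 0 is preferred when the first feature is large,
# module 1 when the second feature is large.
gate_weights = [[5.0, 0.0], [0.0, 5.0]]

y, gates = gated_output([1.0, 0.0], modules, gate_weights)
print(gates, y)
```

Because the gate weights depend on the input, training the gate and the modules together would let each module specialise on the region of feature space where the gate selects it, which is the kind of coupling the decoupled approaches lack.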


                                    5.14  Neural Networks in Data Mining


                                    The purpose of data mining and the application of statistical classification in data
                                    mining were presented in section 4.7. Neural networks also play an important role in
                                    data mining, namely through feature selection methods based on genetic algorithms,
                                    Kohonen's self-organising feature maps and multi-layer perceptrons. These are
                                    used for classification or regression tasks, both called predictive modelling in data
                                    mining jargon. The same requirements on algorithmic performance and evaluation
                                    of solutions, presented in section 4.7, apply to neural network approaches.
                                      Especially  of  interest  in  data  mining  applications  are  multi-layer  perceptrons
                                    solving  complex  regression/forecast  tasks.  In  order  to  give  a  taste  of  such  an
                                    application to a typical data mining problem, and to discuss some important issues,
                                    we  will  consider  the  problem  of  determining  a  useful  predictive  model  for  the
                                    revenue  of  invested  capital  using  the  Firms  dataset,  which  contains  a  table  of
                                    economic variables for 838 Portuguese firms (year 1995).
                                      In  order to build  a predictive model for the capital revenue (variable  CAPR),
                                    defined as the ratio of the net income (NI) over the invested capital (CAP), we may
                                    select as variables constituting the search space all those that bear no direct relation
                                    with CAPR, namely GI (gross income), CA (capital plus assets), NW (number of
                                    workers),  P  (apparent  productivity),  GIR  (gross  income  revenue),  A/C  (assets
                                    share) and DEPR (depreciations plus provisions), discarding the variables CAP and
                                    NI, which are obviously not of interest here.
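
As a minimal sketch of this setup, the target CAPR can be derived from NI and CAP and the search space restricted to the seven listed variables. The record below is hypothetical (its values are made up and do not come from the Firms dataset), and the dictionary layout and function names are assumptions made for illustration.

```python
# Candidate input variables, as listed in the text.
SEARCH_SPACE = ["GI", "CA", "NW", "P", "GIR", "A/C", "DEPR"]

def capital_revenue(firm):
    """CAPR = NI / CAP (net income over invested capital)."""
    return firm["NI"] / firm["CAP"]

def make_case(firm):
    """Return (features, target); NI and CAP only enter through the target."""
    features = [firm[name] for name in SEARCH_SPACE]
    return features, capital_revenue(firm)

# Hypothetical firm record; every number here is invented for the example.
firm = {
    "NI": 120.0, "CAP": 1000.0,          # used only to compute CAPR
    "GI": 2400.0, "CA": 3000.0, "NW": 45.0, "P": 53.3,
    "GIR": 120.0 / 2400.0,               # GIR = NI / GI
    "A/C": 0.4, "DEPR": 80.0,
}

features, capr = make_case(firm)
print(features, capr)
```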
                                      Performing feature selection with the genetic algorithm tool yielded variable
                                    GIR as the only useful variable. This is a somewhat expected result, given that
                                    GIR = NI/GI and NI is directly related to CAPR. A quick search for an MLP
                                    solution with the Statistica intelligent problem solver (IPS) found a reasonably
                                    performing MLP1:1:1, which uses variable GIR as input and achieves a
                                    0.765 correlation.
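
The structure of such a solution can be sketched in a few lines: a 1:1:1 multi-layer perceptron has one input, one hidden neuron and one output. The sketch below assumes a tanh hidden neuron, a linear output, batch gradient descent on the mean squared error, and that the reported figure is the Pearson correlation between network output and target; none of these details is stated in the text. The data are synthetic stand-ins for GIR and CAPR, not the Firms dataset, so the printed correlation is unrelated to the 0.765 quoted above.

```python
import math
import random

def mlp_111(x, w1, b1, w2, b2):
    """MLP1:1:1: one input, one tanh hidden neuron, one linear output."""
    return w2 * math.tanh(w1 * x + b1) + b2

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def train(xs, ts, epochs=500, lr=0.1):
    """Batch gradient descent on 0.5 * mean squared error."""
    w1, b1, w2, b2 = 1.0, 0.0, 1.0, 0.0  # fixed start for reproducibility
    n = len(xs)
    for _ in range(epochs):
        g_w1 = g_b1 = g_w2 = g_b2 = 0.0
        for x, t in zip(xs, ts):
            h = math.tanh(w1 * x + b1)
            err = (w2 * h + b2) - t
            dh = err * w2 * (1.0 - h * h)  # back-propagate through tanh
            g_w2 += err * h
            g_b2 += err
            g_w1 += dh * x
            g_b1 += dh
        w1 -= lr * g_w1 / n
        b1 -= lr * g_b1 / n
        w2 -= lr * g_w2 / n
        b2 -= lr * g_b2 / n
    return w1, b1, w2, b2

# Synthetic stand-ins for the single input (GIR) and the target (CAPR).
rng = random.Random(0)
gir = [rng.uniform(-1.0, 1.0) for _ in range(200)]
capr = [0.5 * math.tanh(2.0 * g) + rng.gauss(0.0, 0.1) for g in gir]

params = train(gir, capr)
predicted = [mlp_111(g, *params) for g in gir]
r = pearson(predicted, capr)
print(f"correlation between predicted and target: {r:.3f}")
```

Note that a 1:1:1 network can only produce a monotonic, sigmoid-shaped function of its single input, so the correlation it achieves is bounded by how well such a curve can follow the target.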
                                      The search time is only about seven seconds on a 733 MHz Pentium. However,
                                    with 838 cases we are still far from the typical bulk of a data warehouse! Also, the
                                    quick search failed to find any alternative solutions to using GIR, and even failed
                                    to see any contribution of the BRANCH variable, which we may rightly suspect of
                                    having  a definite influence  on the  results.  As  a matter of  fact, by  performing  a
                                    quick  search  only  for  the  industrial  firms  (BRANCH=3,  500  cases),  a  better