Page 145 - Building Big Data Applications

Chapter 7   Banking industry applications and usage


                 will use historical data on former churners and will find similarities with existing
                 customers. These similarities will be tagged in the current set of customers being
                 processed, and those customers will be classified as potential churners. This
                 classification can be repeated for every customer, and also for groups of customers
                 across geographies. All the data will be managed and computed in the infrastructure.
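
The similarity tagging described above can be sketched in a few lines. This is a minimal illustration only, not the book's implementation: it assumes a nearest-neighbor comparison on min-max scaled features, and the feature set, customer IDs, data values, and the `threshold` parameter are all hypothetical.

```python
import math

# Hypothetical features per customer: (age in years, balance in thousands,
# transactions per month). All values are invented for illustration.
former_churners = [
    (24, 1.2, 3),
    (27, 0.8, 2),
    (22, 1.5, 4),
]

current_customers = {
    "C001": (25, 1.0, 3),
    "C002": (58, 42.0, 30),
    "C003": (23, 1.4, 5),
    "C004": (45, 18.5, 22),
}


def scale_ranges(rows):
    """Return (min, max) per feature so every feature can be mapped to 0..1."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def normalize(row, ranges):
    """Min-max scale one row; a constant feature maps to 0.0."""
    return tuple(
        (value - lo) / (hi - lo) if hi > lo else 0.0
        for value, (lo, hi) in zip(row, ranges)
    )


def flag_potential_churners(churners, customers, threshold=0.25):
    """Tag customers whose nearest former churner lies within `threshold`."""
    ranges = scale_ranges(list(churners) + list(customers.values()))
    scaled_churners = [normalize(c, ranges) for c in churners]
    flagged = {}
    for customer_id, features in customers.items():
        point = normalize(features, ranges)
        # Euclidean distance to the closest former churner (assumed metric).
        nearest = min(math.dist(point, c) for c in scaled_churners)
        flagged[customer_id] = nearest <= threshold
    return flagged


flags = flag_potential_churners(former_churners, current_customers)
print(flags)
```

With this invented data, the two young, low-balance customers are tagged as potential churners and the other two are not. The same function could be rerun per customer or per geography whenever new data arrives.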
                   The methods and techniques used to build the churn model include, but are not
                 limited to, the following:

                   Data mining classification techniques
                     Neural networks
                      - Can perform very well once trained and deployed for execution
                      - Require familiarity with the model and its outcomes to understand the
                         patterns uncovered in the underlying data
                      - Are often thought of as a black box
                      - Tend to be relatively slow during training
                      - Training the neural network is time consuming, but the return on
                         investment is multifold
                     Logistic regression models
                      - Can give very strong insight into which variables are likely to predict the
                         event outcome
                      - Used to predict a categorical outcome variable from predictor variables
                         that are continuous and/or categorical
                      - Used because a categorical outcome variable violates the assumption
                         of linearity in ordinary linear regression
                      - The only “real” limitation of logistic regression is that the outcome variable
                         must be discrete. Logistic regression deals with this by applying a
                         logarithmic transformation to the odds of the outcome, which allows us to
                         model a nonlinear association in a linear way.
                      - It expresses the linear regression equation in logarithmic terms (called the
                         logit)
                     Decision trees
                      - Easy to use
                      - Show which fields are the most important
                      - Can be vulnerable to noise in the data
                      - Leaves in the decision tree can have similar class probabilities
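
The logit idea in the logistic regression notes above can be made concrete with a short sketch. It assumes a single predictor (customer age) and hypothetical coefficients (`intercept`, `age_slope`) chosen only for illustration. The point it shows is that the churn probability is a nonlinear function of age, while the log-odds (the logit) is linear in age.

```python
import math


def logit(p):
    """Log-odds of a probability p, valid for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of the logit: maps a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, not fitted to any real data.
intercept = 2.0
age_slope = -0.08


def churn_probability(age):
    """Churn probability under the assumed linear model on the log-odds scale."""
    return sigmoid(intercept + age_slope * age)


for age in (20, 30, 40, 50):
    p = churn_probability(age)
    print(f"age={age}  p={p:.3f}  log-odds={logit(p):+.2f}")
```

Each ten-year step changes the log-odds by the same amount, even though the probability itself does not change by a constant amount.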
                   The model can be designed and developed with either logistic regression or a decision
                 tree. Either technique will identify the most important variables.
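
As a rough illustration of how a tree-based approach surfaces the most important variables, the sketch below ranks fields by the best single-threshold reduction in Gini impurity. This is a one-level decision stump, not a full decision tree, and the field names (`age`, `products_held`) and records are hypothetical.

```python
# Hypothetical records: (age, products_held, churned). Values are invented.
records = [
    (22, 1, 1), (25, 2, 1), (27, 1, 1), (31, 3, 1),
    (35, 1, 0), (41, 2, 0), (48, 3, 0), (56, 1, 0),
]
feature_names = ["age", "products_held"]


def gini(labels):
    """Gini impurity of a list of 0/1 labels; 0.0 means a pure group."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split_gain(rows, feature_index):
    """Largest impurity reduction from one threshold split on one feature."""
    parent = gini([row[-1] for row in rows])
    best = 0.0
    for threshold in sorted({row[feature_index] for row in rows}):
        left = [row[-1] for row in rows if row[feature_index] <= threshold]
        right = [row[-1] for row in rows if row[feature_index] > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        best = max(best, parent - weighted)
    return best


gains = {
    name: best_split_gain(records, index)
    for index, name in enumerate(feature_names)
}
# Fields sorted from most to least important under this measure.
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking, gains)
```

In this invented data set a single age threshold separates churners from non-churners perfectly, so `age` ranks first, while `products_held` gives no impurity reduction at any threshold.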
                   The following are predictions from the application, which can be plotted and
                 displayed as outcomes:
                   Younger people are more likely to churn and this is true