Page 145 - Building Big Data Applications

Chapter 7 Banking industry applications and usage 143

will use historical data on former churners and will find similarities with existing
customers. Those similarities are tagged in the current stack of customers being
processed, and the matching customers are classified as potential churners. This
classification can be repeated at any time for individual customers and for groups of
customers across geographies. All the data is managed and computed in the infrastructure.
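The tagging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three scaled features, the customer IDs, the distance-based similarity measure, and the 0.75 threshold are all hypothetical and are not taken from the text.

```python
import math

def similarity(profile_a, profile_b):
    """Map Euclidean distance to a similarity in (0, 1]; 1.0 means identical."""
    return 1.0 / (1.0 + math.dist(profile_a, profile_b))

def tag_potential_churners(current_customers, former_churners, threshold=0.75):
    """Return the IDs of customers whose closest former churner is similar enough.

    The threshold is an assumed cut-off chosen for this toy example.
    """
    tagged = []
    for customer_id, profile in current_customers.items():
        best = max(similarity(profile, churner) for churner in former_churners)
        if best >= threshold:
            tagged.append(customer_id)
    return sorted(tagged)

# Hypothetical profiles: (login activity, balance trend, complaint rate), scaled to 0..1.
former_churners = [(0.1, 0.2, 0.9), (0.2, 0.1, 0.8)]
current_customers = {
    "C001": (0.15, 0.15, 0.85),
    "C002": (0.9, 0.8, 0.1),
    "C003": (0.3, 0.3, 0.7),
}

potential_churners = tag_potential_churners(current_customers, former_churners)
print(potential_churners)
```

Because the function takes the customer set as an argument, the same call can be repeated for a single customer or for a group of customers from one geography.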
The methods and techniques used to build the churn model include, but are not
limited to, the following:

Data mining classification techniques
Neural networks
- Can perform very well on the data once trained and put into execution
- Require familiarity with the model and its outcomes to understand the patterns
uncovered in the underlying data
- Are often thought of as a black box
- Tend to be relatively slow during learning periods
- Training the neural network is time consuming, but the return on investment can
be multifold
Logistic regression models
- Can give very strong insight into which variables are likely to predict the
event outcome
- Used to predict a categorical outcome variable from predictor variables
that are continuous and/or categorical
- Used because a categorical outcome variable violates the assumption
of linearity in ordinary linear regression
- The only “real” limitation of logistic regression is that the outcome variable
must be discrete. Logistic regression handles a discrete outcome by applying a
logarithmic transformation to the odds of the outcome, which allows us to model a
nonlinear association in a linear way.
- It expresses the linear regression equation in logarithmic terms (called the
logit), that is, as the log of the odds of the outcome
Decision trees
- Easy to use
- Show which fields are the most important
- Can be vulnerable to noise in the data
- A leaf in the decision tree can have similar class probabilities, which makes
its prediction ambiguous
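For reference, the logit mentioned in the logistic regression bullets can be written out as follows. The notation is introduced here and is not from the text: p stands for the probability of the outcome (for example, churn), x_1 ... x_k for the predictor variables, and beta_0 ... beta_k for the fitted coefficients.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p}
  = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\quad\Longleftrightarrow\quad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
```

The left-hand form is linear in the predictors, while the equivalent right-hand form shows the nonlinear (S-shaped) relationship between the predictors and the probability itself.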
The model can be designed and developed with either logistic regression or a decision
tree. The outcome of either approach will identify the most important variables.
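As a minimal sketch of how a logistic regression model can point to the most important variables, the example below fits one with plain gradient descent and ranks the predictors by the size of their coefficients. Everything in it is an assumption made for illustration: the eight toy customers, the `age` and `has_newsletter` fields, the learning rate, and the number of epochs. A production model would normally use a statistics or machine learning library instead.

```python
import math

def standardize(column):
    """Scale a column to mean 0 and standard deviation 1 so coefficients are comparable."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical toy data: one row per customer.
feature_names = ["age", "has_newsletter"]
ages = [22, 25, 28, 31, 45, 52, 58, 63]
has_newsletter = [1, 0, 1, 0, 0, 1, 0, 1]
churned = [1, 1, 1, 0, 1, 0, 0, 0]

rows = list(zip(standardize(ages), standardize(has_newsletter)))
weights, bias = fit_logistic(rows, churned)

# Rank variables by absolute coefficient size (largest first).
ranking = sorted(zip(feature_names, weights), key=lambda pair: abs(pair[1]), reverse=True)
for name, weight in ranking:
    print(f"{name}: {weight:+.3f}")
```

On this toy data, `age` ranks first with a negative coefficient, meaning that a higher age lowers the predicted churn probability. Standardizing the inputs first is a design choice that keeps coefficient sizes comparable across variables measured in different units.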
The following are predictions from the application that can be plotted and displayed as
outcomes:
Younger people are more likely to churn and this is true
