Credit Card Churn Prediction

What is Customer Churn? Customer churn occurs when a customer ends their relationship with a bank or company for any reason. Although some level of churn is inevitable, a high churn rate keeps the business from reaching its goals, so identifying customers who are likely to churn is very important for the business.

Table of Contents

Context

Thera Bank recently saw a steep decline in the number of users of its credit cards. Credit cards are a good source of income for banks because of the different kinds of fees they charge, such as annual fees, balance transfer fees, cash advance fees, late payment fees, and foreign transaction fees. Some fees are charged to every user irrespective of usage, while others are charged only under specified circumstances.

Customers leaving the credit card service would lead to a loss for the bank, so the bank wants to analyze its customer data, identify the customers who are likely to leave the credit card service, and understand the reasons why, so that the bank can improve in those areas.

Objective

Libraries

Read and Understand Data

Observations

Observations

Data Preprocessing

Observations

Age

Age can be a vital factor in churn, so the ages are converted into bins to explore whether there is any pattern.
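A minimal sketch of this binning step, assuming the data is loaded into a DataFrame `df` with an age column `Customer_Age` and a churn column `Attrition_Flag` (these names are placeholders for whatever this notebook actually uses):

```python
import pandas as pd

# Cut customer ages into fixed bins and label them for readability
bins = [20, 30, 40, 50, 60, 80]
labels = ["21-30", "31-40", "41-50", "51-60", "61-80"]
df["Age_Group"] = pd.cut(df["Customer_Age"], bins=bins, labels=labels)

# Churn share within each age group, to check for a pattern
print(df.groupby("Age_Group")["Attrition_Flag"].value_counts(normalize=True))
```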

Exploratory Data Analysis

Observations

Observations

Observations

Observation

Observation

Observations

Profile of the customers who attrited the most, based on their card type

Insights based on EDA

Outlier Detection

508 customers have a credit limit of 34,516; this appears to be a default value.

896 customers have a total transaction amount greater than 8,619.25. Given their transaction counts, this data seems to be correct.

Not treating the outliers here; we want the algorithms to learn from these outliers.

Missing value Detection and Treatment

There are Unknown values in the columns Education_Level, Marital_Status, and Income_Category, which can be treated as missing values. Replacing Unknown with NaN.
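A sketch of this replacement, again assuming the DataFrame is named `df`:

```python
import numpy as np

# Treat the literal string "Unknown" as a missing value in these three columns
cols_with_unknown = ["Education_Level", "Marital_Status", "Income_Category"]
df[cols_with_unknown] = df[cols_with_unknown].replace("Unknown", np.nan)

# Check how many missing values this introduces per column
print(df[cols_with_unknown].isna().sum())
```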

Missing-Value Treatment

Split the dataset

Encoding categorical variables

Model Building

Model evaluation criterion:

The model can make two kinds of wrong predictions:

  1. Predicting a customer will churn when they do not - loss of resources
  2. Predicting a customer will not churn when they do - loss of income

Which case is more important?

How do we reduce this loss, i.e., how do we reduce False Negatives?

Let's evaluate the model performance by using KFold and cross_val_score

K-Folds cross-validation provides dataset indices to split the data into train/validation sets. It splits the dataset into k consecutive folds (without shuffling by default). Each fold is then used once as the validation set while the k - 1 remaining folds form the training set.
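A minimal sketch of this evaluation, assuming `X_train` / `y_train` are the encoded training features and the binary churn target from the split above. Recall is used as the scoring metric because false negatives (missed churners) are the costliest mistake for the bank:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

model = LogisticRegression(max_iter=1000)
kfold = KFold(n_splits=5, shuffle=True, random_state=1)

# Recall per fold, and the mean recall across folds
scores = cross_val_score(model, X_train, y_train, cv=kfold, scoring="recall")
print("Recall per fold:", scores)
print("Mean recall:", scores.mean())
```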

Handling Imbalanced dataset

This is an imbalanced dataset. A problem with imbalanced classification is that there are too few examples of the minority class for a model to effectively learn the decision boundary.

One way to address an imbalanced dataset is to oversample the minority class. The simplest approach is to duplicate examples from the minority class in the training dataset prior to fitting a model; this balances the class distribution but does not provide any additional information to the model. Instead, new examples can be synthesized from the existing examples. This is a type of data augmentation for the minority class and is referred to as the Synthetic Minority Oversampling Technique, or SMOTE for short.

Over Sampling

Since the dataset is imbalanced, let's try oversampling with SMOTE and see if performance can be improved.
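A sketch of the oversampling step using imbalanced-learn, assuming `X_train` / `y_train` are a feature DataFrame and a target Series; only the training data is resampled so the test set keeps its original class distribution:

```python
from imblearn.over_sampling import SMOTE

# Synthesize new minority-class (churn) examples until the classes are balanced
smote = SMOTE(random_state=1)
X_train_over, y_train_over = smote.fit_resample(X_train, y_train)

print("Before SMOTE:", y_train.value_counts().to_dict())
print("After SMOTE: ", y_train_over.value_counts().to_dict())
```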

The recall on test data is only 0.48, and the model is overfitting: there is a large discrepancy between the train and test scores. Let's try regularization.

What is Regularization ?

A linear regression algorithm works by selecting coefficients for each independent variable that minimize a loss function. However, if the coefficients are large, they can lead to overfitting on the training dataset, and such a model will not generalize well to unseen test data. This is where regularization helps. Regularization is the process that shrinks the coefficients towards zero; in simple terms, it discourages learning an overly complex or flexible model, in order to prevent overfitting.

Main Regularization Techniques

Ridge Regression (L2 Regularization)

Ridge regression adds the squared magnitude of the coefficients as a penalty term to the loss function.

Lasso Regression (L1 Regularizaion)

Lasso adds the absolute values of the coefficients as a penalty term to the loss function.

Elastic Net Regression

Elastic net regression combines the properties of ridge and lasso regression. It works by penalizing the model using both the l2-norm and the l1-norm.

Elastic Net Formula: Ridge + Lasso
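As a sketch of how these penalties fit together (using lambda for the overall regularization strength and alpha for the L1/L2 mixing ratio; these symbols are notational assumptions, since the notebook does not define them):

```latex
% Ridge (L2) penalty added to the squared-error loss
L_{ridge} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2

% Lasso (L1) penalty
L_{lasso} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|

% Elastic Net: a weighted combination of the L1 and L2 penalties
L_{enet} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
         + \lambda \left( \alpha \sum_{j=1}^{p} |\beta_j| + (1 - \alpha) \sum_{j=1}^{p} \beta_j^2 \right)
```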

Regularization on Oversampled dataset
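A minimal sketch of what this step might look like with scikit-learn, assuming `X_train_over` / `y_train_over` are the SMOTE-oversampled training data from above. In LogisticRegression the regularization strength is controlled by C, the inverse of lambda, so a smaller C means stronger shrinkage:

```python
from sklearn.linear_model import LogisticRegression

# L2-regularized (ridge-style) logistic regression
l2_model = LogisticRegression(penalty="l2", C=0.1, max_iter=1000)
l2_model.fit(X_train_over, y_train_over)

# L1-regularized (lasso-style) logistic regression; liblinear supports the l1 penalty
l1_model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear", max_iter=1000)
l1_model.fit(X_train_over, y_train_over)
```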

The recall on test data has improved; let's see if undersampling can improve the recall further.

Undersampling

Let's try undersampling and see if the performance is different.
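A sketch of random undersampling with imbalanced-learn, again resampling only the training data (`X_train` / `y_train` are assumed from the earlier split):

```python
from imblearn.under_sampling import RandomUnderSampler

# Randomly drop majority-class (non-churn) rows until the two classes are balanced
rus = RandomUnderSampler(random_state=1)
X_train_under, y_train_under = rus.fit_resample(X_train, y_train)

print("After undersampling:", y_train_under.value_counts().to_dict())
```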

Logistic Regression on undersampled data

Observation

Model Performance Evaluation and Improvement-Logistic Regression

Logistic Regression with undersampling gives a generalized model and the best recall, at 0.857.

Model building: Decision Tree, Bagging, and Boosting

Here I am building different models using KFold and cross_val_score with pipelines, and will then tune the best 3 models using GridSearchCV and RandomizedSearchCV.

Stratified K-Folds cross-validation provides dataset indices to split the data into train/validation sets. It splits the dataset into k consecutive folds (without shuffling by default), keeping the class distribution in each fold the same as in the target variable. Each fold is then used once as the validation set while the k - 1 remaining folds form the training set.
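A sketch of how several candidate models could be compared this way, assuming `X_train` / `y_train` from the earlier split; the specific classifiers here are illustrative choices for the decision tree, bagging, and boosting families named above:

```python
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "Bagging": BaggingClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(random_state=1),
    "Gradient Boosting": GradientBoostingClassifier(random_state=1),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
for name, clf in models.items():
    # Scale features and fit the classifier inside one pipeline per fold
    pipe = Pipeline([("scaler", StandardScaler()), ("clf", clf)])
    scores = cross_val_score(pipe, X_train, y_train, cv=cv, scoring="recall")
    print(f"{name}: mean recall = {scores.mean():.3f}")
```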

Hyper Parameter Tuning

We will use pipelines with StandardScaler and the classifier models, and tune the models using GridSearchCV and RandomizedSearchCV. We will also compare the performance of, and time taken by, these two methods: grid search and randomized search.

Random Search: Define a search space as a bounded domain of hyperparameter values and randomly sample points in that domain.

Grid Search: Define a search space as a grid of hyperparameter values and evaluate every position in the grid.
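A sketch comparing the two search strategies on one of the tuned models; the random forest and the particular parameter ranges here are illustrative assumptions, and `X_train` / `y_train` are assumed from the earlier split:

```python
import time

from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Grid search: evaluate every combination in an explicit grid
grid_params = {"n_estimators": [100, 200, 300], "max_depth": [3, 5, 7]}
grid = GridSearchCV(RandomForestClassifier(random_state=1), grid_params,
                    scoring="recall", cv=5, n_jobs=-1)

# Randomized search: sample a fixed number of points from distributions
rand_params = {"n_estimators": randint(100, 400), "max_depth": randint(3, 10)}
rand = RandomizedSearchCV(RandomForestClassifier(random_state=1), rand_params,
                          n_iter=10, scoring="recall", cv=5, n_jobs=-1,
                          random_state=1)

for name, search in [("Grid search", grid), ("Randomized search", rand)]:
    start = time.time()
    search.fit(X_train, y_train)
    print(f"{name}: best recall = {search.best_score_:.3f}, "
          f"time = {time.time() - start:.1f}s")
```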

We can also use the make_pipeline function instead of Pipeline to create a pipeline.

make_pipeline: This is shorthand for the Pipeline constructor; it does not require, and does not permit, naming the estimators. Instead, their names are automatically set to the lowercase of their types.
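A small example of the two equivalent ways to build the same pipeline:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Explicit step names with the Pipeline constructor
pipe = Pipeline([("scaler", StandardScaler()), ("logreg", LogisticRegression())])

# Auto-generated lowercase step names with make_pipeline
pipe_short = make_pipeline(StandardScaler(), LogisticRegression())
print(pipe_short.named_steps)  # {'standardscaler': ..., 'logisticregression': ...}
```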

Comparing all models

Total transaction count is the most important feature, followed by total revolving balance and total transaction amount.

Conclusion

Business Recommendations & Insights