Travel Package Purchase Prediction

Context

The "Visit with us" company wants to establish a viable business model to expand its customer base. One way to do so is to introduce a new package offering. The company currently offers 5 types of packages: Basic, Standard, Deluxe, Super Deluxe, and King. Last year's data shows that 18% of customers purchased a package. However, the marketing cost was quite high because customers were contacted at random, without using the available information. The company is now planning to launch a new product, the Wellness Tourism Package. Wellness tourism is defined as travel that allows the traveler to maintain, enhance, or kick-start a healthy lifestyle, and to support or increase one's sense of well-being. This time, the company wants to harness the available data on existing and potential customers to make its marketing expenditure more efficient.

We need to analyze the customer data to provide recommendations to the Policy Maker and Marketing Team, and build a model that predicts which potential customers are likely to purchase the newly introduced travel package.

Data Dictionary

Customer details:

Customer interaction data:

Problem

Libraries

Read and Understand Data

View the first and last 5 rows of the dataset.

Observations

Observations

Data Preprocessing

Processing Gender status.

The dataset contains two categories for the same value, Female and Fe male; both are mapped to Female.
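A minimal sketch of this fix, assuming the data lives in a pandas DataFrame with a `Gender` column. The toy rows and the exact spellings of the stray label are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows; the column name and the misspelled variants are assumptions.
df = pd.DataFrame({"Gender": ["Male", "Female", "Fe male", "Fe Male", "Male"]})

# Drop inner spaces, then normalize case, so "Fe male" and "Fe Male" both become "Female".
df["Gender"] = (
    df["Gender"]
    .str.replace(" ", "", regex=False)
    .str.capitalize()
)

print(df["Gender"].value_counts())
```

Checking `value_counts()` before and after the fix is a quick way to confirm that only the intended categories remain.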

Age

Age can be a vital factor in tourism, so ages are converted to bins to explore whether any pattern emerges.
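One way to sketch the binning is with `pandas.cut`. The bin edges and labels below are assumptions chosen for illustration (they line up with age ranges such as 26-30 and 31-40 used later in the customer profiles), not necessarily the edges used in the notebook.

```python
import pandas as pd

# Hypothetical ages; the column name "Age" is assumed.
df = pd.DataFrame({"Age": [19, 28, 35, 47, 58, 61]})

# Assumed bin edges and labels, for illustration only.
age_edges = [17, 25, 30, 40, 50, 60, 100]
age_labels = ["18-25", "26-30", "31-40", "41-50", "51-60", "61+"]

# pd.cut uses right-closed intervals by default, so age 30 falls in "26-30".
df["AgeBin"] = pd.cut(df["Age"], bins=age_edges, labels=age_labels)

print(df)
```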

Income

To understand customer segments, we derive new columns that identify the income range each customer falls into.
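A sketch of deriving an income-range column, again with `pandas.cut`. The thresholds (25000, 30000, 35000) are assumptions that mirror the ranges quoted in the customer profiles; the column names are assumed as well.

```python
import numpy as np
import pandas as pd

# Hypothetical incomes; the column name "MonthlyIncome" is assumed.
df = pd.DataFrame({"MonthlyIncome": [18000, 24999, 25000, 32000, 41000]})

# Assumed thresholds, for illustration only.
income_edges = [0, 25000, 30000, 35000, np.inf]
income_labels = ["<25000", "25000-30000", "30000-35000", ">=35000"]

# right=False makes each interval left-closed, e.g. [25000, 30000).
df["IncomeBin"] = pd.cut(
    df["MonthlyIncome"], bins=income_edges, labels=income_labels, right=False
)

print(df)
```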

Exploratory Data Analysis

Univariate Analysis

Observations

Observations

Bivariate & Multivariate Analysis

Observations

Observation

Missing value Detection and Treatment

Missing value Treatment Type of contact

The most frequent value is Self Inquiry. We will impute the missing values of TypeofContact with the mode (the most frequently occurring value) of the feature.
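Mode imputation can be sketched as follows. The toy rows are hypothetical; only the column name `TypeofContact` and the value Self Inquiry come from the text above.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with two missing contact types.
df = pd.DataFrame(
    {"TypeofContact": ["Self Inquiry", "Company Invited", np.nan, "Self Inquiry", np.nan]}
)

# mode() returns a Series because ties are possible, so take the first entry.
most_common = df["TypeofContact"].mode()[0]
df["TypeofContact"] = df["TypeofContact"].fillna(most_common)

print(df["TypeofContact"].value_counts())
```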

Missing value Treatment number of followup.

Missing value Treatment PreferredPropertyStar

Let's see how we can impute PreferredPropertyStar using the customer's designation for more granularity.

Missing value Treatment Duration of pitch

Let's see how we can impute the duration of pitch. In my opinion, how long a salesperson takes to deliver a sales pitch depends on the product being proposed; the number of follow-ups will also influence the duration of the pitch. Let's verify this.
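A sketch of that check and the resulting group-wise imputation. The column names (`ProductPitched`, `NumberOfFollowups`, `DurationOfPitch`) and the use of the median as the fill statistic are assumptions; the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; column names are assumed.
df = pd.DataFrame({
    "ProductPitched": ["Basic", "Basic", "Basic", "Deluxe", "Deluxe", "Deluxe"],
    "NumberOfFollowups": [3, 3, 3, 4, 4, 4],
    "DurationOfPitch": [10.0, 14.0, np.nan, 20.0, np.nan, 30.0],
})

group_keys = ["ProductPitched", "NumberOfFollowups"]

# Step 1: verify whether the typical pitch duration differs across groups.
print(df.groupby(group_keys)["DurationOfPitch"].median())

# Step 2: fill each missing value with the median of its own group.
group_median = df.groupby(group_keys)["DurationOfPitch"].transform("median")
df["DurationOfPitch"] = df["DurationOfPitch"].fillna(group_median)

print(df)
```

If the group medians in step 1 barely differ, a single overall median would do just as well and the grouping adds little.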

Missing value Treatment for NumberOfTrips

For more granularity, the number of trips is imputed using marital status.

Missing value Treatment NumberOfChildrenVisiting

We assume the number of children visiting is missing because no children accompanied these customers, so we fill the missing values with 0.
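Under that assumption the fill is a one-liner. The sketch below uses hypothetical rows and the column name from the heading above.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with two missing counts.
df = pd.DataFrame({"NumberOfChildrenVisiting": [1.0, np.nan, 2.0, np.nan]})

# Treat a missing count as "no children accompanied the customer".
df["NumberOfChildrenVisiting"] = df["NumberOfChildrenVisiting"].fillna(0).astype(int)

print(df)
```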

Missing value Treatment Age

Imputing age using designation, gender, and marital status gives more granularity.

Missing value Treatment MonthlyIncome

For more granularity, monthly income is imputed using occupation, designation, and gender.
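A sketch of multi-key group imputation. The helper `impute_by_group` is hypothetical (not from the notebook), the median is an assumed fill statistic, and the fallback to the overall median covers groups in which every value is missing.

```python
import numpy as np
import pandas as pd

def impute_by_group(frame, target, keys):
    """Fill NaNs in `target` with the median of rows sharing the same `keys`.

    Falls back to the overall median when a group has no observed values.
    """
    group_median = frame.groupby(keys)[target].transform("median")
    filled = frame[target].fillna(group_median)
    return filled.fillna(frame[target].median())

# Hypothetical rows; column names are assumed.
df = pd.DataFrame({
    "Occupation": ["Salaried", "Salaried", "Salaried", "Salaried", "Small Business", "Small Business"],
    "Designation": ["Executive", "Executive", "Executive", "Manager", "VP", "VP"],
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "MonthlyIncome": [20000.0, 22000.0, np.nan, 30000.0, np.nan, np.nan],
})

df["MonthlyIncome"] = impute_by_group(
    df, "MonthlyIncome", ["Occupation", "Designation", "Gender"]
)

print(df)
```

The same hypothetical helper could be reused for the other grouped imputations in this section, for example age keyed on designation, gender, and marital status.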

Finally all missing values have been treated.

Age

Age can be a vital factor in tourism, so ages are converted to bins to explore whether any pattern emerges.

Income

To understand customer segments, we derive new columns that identify the income range each customer falls into.

Customer Profile by Product Type

Customer profile according to product pitched and product purchased

Basic package: Most customers have a monthly income < 25000, are aged 26-30, hold the designation Executive, belong to city tier 1, and are salaried, single males. The customers contacted the company themselves. Married customers also prefer the Basic package.

Deluxe package: Most customers have a monthly income < 25000, are aged 31-40, hold the designation Manager, belong to city tier 3, run a small business, and are married. The customers contacted the company themselves. City tier 1 and divorced customers also preferred this package.

King package: Most customers have a monthly income in the range 30000-35000, are aged 51-60, hold the designation VP, belong to city tier 1, and are single females whose occupation is small business. Females buy this package more than males.

Super Deluxe package: Most customers have a monthly income < 35000, are aged 41-50, hold the designation AVP, belong to city tier 3, and are single, salaried males. The majority were company invited.

Standard package: Most customers have a monthly income < 30000, are aged 31-40, hold the designation Senior Manager, are married, belong to city tier 3, and run a small business. The majority had self-inquired.

Insights based on EDA

Outlier Detection

Split the dataset

Based on the information provided, I assume that customer interaction data will not be available for new and potential customers, so the columns related to customer interaction are dropped.
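A sketch of dropping the interaction columns and splitting the data. The column names in `interaction_cols`, the target name `ProdTaken`, the 70/30 split, and the toy rows are all assumptions for illustration; the actual list depends on the data dictionary.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical rows; every column name here is assumed.
df = pd.DataFrame({
    "Age": [25, 32, 41, 53, 29] * 4,
    "Passport": [1, 0, 0, 1, 0] * 4,
    "ProductPitched": ["Basic", "Deluxe", "Standard", "King", "Basic"] * 4,
    "PitchSatisfactionScore": [3, 4, 2, 5, 1] * 4,
    "NumberOfFollowups": [3, 4, 4, 5, 2] * 4,
    "DurationOfPitch": [10, 15, 20, 30, 8] * 4,
    "ProdTaken": [1, 0, 0, 0, 0] * 4,
})

# Assumed customer interaction columns, unavailable for new customers.
interaction_cols = [
    "ProductPitched", "PitchSatisfactionScore", "NumberOfFollowups", "DurationOfPitch",
]

X = df.drop(columns=interaction_cols + ["ProdTaken"])
y = df["ProdTaken"]

# stratify=y keeps the purchase rate similar in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

print(X_train.shape, X_test.shape)
```

Stratifying matters here because purchasers are the minority class, so an unstratified split could leave the test set with very few positives.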

Model Building

Decision Tree

Observation

The decision tree is overfitting the training data, as there is a large disparity between train and test performance. The recall score is also not very high.
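The overfitting check amounts to comparing the same metric on the train and test sets. The sketch below uses synthetic stand-in data from `make_classification` (roughly 18% positives, mirroring the purchase rate quoted in the context), not the notebook's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data; not the travel dataset.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.82, 0.18], flip_y=0.05, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# An unconstrained tree grows until it fits the training data perfectly.
tree = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)

train_recall = recall_score(y_train, tree.predict(X_train))
test_recall = recall_score(y_test, tree.predict(X_test))

print(f"train recall: {train_recall:.2f}, test recall: {test_recall:.2f}")
```

A large gap between the two numbers is the disparity described above; limiting `max_depth` or `min_samples_leaf` usually narrows it.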

Bagging classifier

Observation

Bagging is still overfitting the training data, and the recall score has decreased on the test data.

Random Forest

Random forest is also overfitting the training data.

Model Performance Evaluation and Improvement-Bagging

Tuning Decision Tree

The most important features are Passport, Designation as Executive, and City tier 3.
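A sketch of how a tree might be tuned for recall and its feature importances read off afterwards. The data is synthetic, the feature names are placeholders, and the parameter grid is an assumption rather than the grid used in the notebook.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with placeholder feature names.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, weights=[0.82, 0.18], random_state=1
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Assumed grid, for illustration only; recall is the metric used in this section.
param_grid = {"max_depth": [2, 3, 4], "min_samples_leaf": [5, 10]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=1), param_grid, scoring="recall", cv=3
)
search.fit(X, y)

# Importances of the best tree, sorted from most to least important.
importances = pd.Series(
    search.best_estimator_.feature_importances_, index=feature_names
).sort_values(ascending=False)

print(search.best_params_)
print(importances)
```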

Tuning Random Forest

The important features are Passport, Monthly Income, Age, and Designation as Executive.

Tuning Bagging Classifier

Model Building Boosting

AdaBoost

Gradient Boost

XGBoost

Model Performance Evaluation and Improvement-Boosting

Tuned AdaBoost Classifier

Tuned Gradient Boosting Classifier

The most important features are Monthly Income, Age, Passport, Designation as Executive, Number of Trips, and City tier 3.

Tuned XGBoost Classifier

Stacking Classifier

Now, let's build a stacking model with the tuned models (decision tree, random forest, AdaBoost, and gradient boosting) as base estimators, then use XGBoost to get the final prediction.
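A sketch of the stacking setup using scikit-learn's `StackingClassifier`. To keep it runnable with scikit-learn alone, a `GradientBoostingClassifier` stands in for XGBoost as the final estimator; this is a substitution, and an `XGBClassifier` could be passed in its place. The data is synthetic and the hyperparameters are placeholders, not the tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; not the travel dataset.
X, y = make_classification(n_samples=400, n_features=8, weights=[0.82, 0.18], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Base estimators; the hyperparameters are placeholders for the tuned ones.
base_estimators = [
    ("decision_tree", DecisionTreeClassifier(max_depth=4, random_state=1)),
    ("random_forest", RandomForestClassifier(n_estimators=50, max_depth=5, random_state=1)),
    ("adaboost", AdaBoostClassifier(n_estimators=50, random_state=1)),
    ("gradient_boosting", GradientBoostingClassifier(n_estimators=50, random_state=1)),
]

# Stand-in for XGBoost: swap in xgboost.XGBClassifier here if it is installed.
final_estimator = GradientBoostingClassifier(random_state=1)

# The final estimator is trained on cross-validated predictions of the base models.
stack = StackingClassifier(estimators=base_estimators, final_estimator=final_estimator, cv=3)
stack.fit(X_train, y_train)

test_recall = recall_score(y_test, stack.predict(X_test))
print(f"stacking test recall: {test_recall:.2f}")
```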

Comparing all models

Observations

Conclusion

Business Recommendations & Insights

We have been able to build a predictive model:

a) that the company can deploy to identify customers who will be interested in purchasing the Travel package.

b) that the company can use to find the key factors that have an impact on whether a customer takes a product or not.