Pratikchetry/Customer-Churn-LTV-Prediction-Engine

🎯 Customer Churn & LTV Prediction Engine

End-to-End Machine Learning System for Customer Intelligence

Python LightGBM RandomForest FastAPI Streamlit Docker


🔗 Technical Report

https://htmlpreview.github.io/?https://github.com/Pratikchetry/Customer_Churn_LTV_Engine/blob/main/reports/project_report.html

The report contains the full analysis and visualisations used to design the production system.


📋 Overview

The Customer Churn & LTV Prediction Engine is a production-grade machine learning system that combines churn risk prediction and lifetime value forecasting to deliver actionable customer intelligence.

The system enables businesses to:

  • Identify customers at risk of churning
  • Predict the revenue impact of customer loss
  • Segment customers into behavioural groups
  • Simulate retention campaign ROI
  • Serve predictions through a REST API
  • Visualise insights through an interactive dashboard

The project demonstrates the complete ML engineering lifecycle, from raw data and feature engineering through model training, explainability, segmentation, API deployment, and containerisation.


🏆 Key Results

| Model | Algorithm | Metric | Score |
|-------|-----------|--------|-------|
| Churn | LightGBM | F1 Score | 0.8770 |
| Churn | LightGBM | ROC-AUC | 0.9836 |
| Churn | LightGBM | Recall | 1.0000 |
| LTV | RandomForest | R² Score | 0.9226 |
| LTV | RandomForest | RMSE | $2,806 |
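
The table reports F1 and recall for the churn model but not precision. As a quick consistency check, precision can be recovered by rearranging the F1 definition, F1 = 2PR / (P + R). The sketch below is a derived illustration rather than a figure from the project report. It assumes both metrics were measured on the same evaluation split and threshold, and the function name `implied_precision` is hypothetical.

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for precision P, given F1 and recall R."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("F1 and recall values are mutually inconsistent")
    return f1 * recall / denominator


# Values taken from the Key Results table above.
precision = implied_precision(f1=0.8770, recall=1.0000)
print(f"Implied precision: {precision:.2f}")
```

Under those assumptions the implied precision is roughly 0.78: the model flags every churner in the evaluation data, at the cost of some false positives.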

Segmentation Results

| Segment | Customers | Avg LTV | Total LTV |
|---------|-----------|---------|-----------|
| 🟢 Champion | 2,472 | $21,231 | $52.5M |
| 🟡 At-Risk VIP | 828 | $11,106 | $9.2M |
| 🔵 Promising | 2,550 | $3,725 | $9.5M |
| 🟠 Vulnerable | 850 | $2,093 | $1.8M |
| ⚪ Hibernating | 2,474 | $547 | $1.4M |
| 🔴 Losing Customer | 826 | $75 | $0.1M |

Revenue Recovery Simulation

| Segment | Investment | Net ROI | ROI % |
|---------|------------|---------|-------|
| At-Risk VIP | $41,400 | $4.10M | 9,896% |
| Vulnerable | $21,250 | $0.60M | 2,830% |
| Losing Customer | $4,130 | $0.01M | 126% |
| Total | $66,780 | $4.70M | 7,043% |
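
The README does not spell out how the ROI figures are computed. The sketch below assumes that "Net ROI" is a dollar net return and that ROI % = net return / investment × 100. Under that assumption it reproduces the Total row from the three segment rows. The `Campaign` class and `net_return` helper are hypothetical names; the customer counts come from the Segmentation Results table.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    segment: str
    customers: int     # from the Segmentation Results table
    investment: float  # campaign spend in dollars
    roi_pct: float     # ROI % as reported in the table above


campaigns = [
    Campaign("At-Risk VIP", 828, 41_400, 9_896),
    Campaign("Vulnerable", 850, 21_250, 2_830),
    Campaign("Losing Customer", 826, 4_130, 126),
]


def net_return(c: Campaign) -> float:
    """Dollar net return, assuming ROI % = net return / investment * 100."""
    return c.investment * c.roi_pct / 100


total_investment = sum(c.investment for c in campaigns)
total_net = sum(net_return(c) for c in campaigns)
total_roi_pct = 100 * total_net / total_investment

for c in campaigns:
    spend_per_customer = c.investment / c.customers
    print(f"{c.segment:<16} ${spend_per_customer:.0f}/customer  "
          f"net ${net_return(c) / 1e6:.2f}M")
print(f"Total: ${total_investment:,.0f} invested, "
      f"net ${total_net / 1e6:.2f}M, ROI {total_roi_pct:,.0f}%")
```

Dividing each investment by its segment size suggests a per-customer spend of $50, $25, and $5 respectively. That is an inference from the tables, not a documented campaign parameter.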

🏗️ Project Architecture

Raw Data
    ↓
Feature Engineering
    ↓
┌───────────────────┬──────────────────┐
│  Churn Model      │   LTV Model      │
│  LightGBM         │   RandomForest   │
│  F1=0.8770        │   R²=0.9226      │
│  AUC=0.9836       │   RMSE=$2,806    │
└───────────────────┴──────────────────┘
    ↓
SHAP Explainability
    ↓
Customer Segmentation
(Composite Risk Score +
 Within-Tier Percentile Ranking)
    ↓
┌───────────────────┬──────────────────┐
│  FastAPI          │  Streamlit       │
│  REST API         │  Dashboard       │
│  Port 8000        │  Port 8501       │
└───────────────────┴──────────────────┘
    ↓
Docker Compose Deployment

🛠️ Tech Stack

| Layer | Technology |
|-------|------------|
| Language | Python 3.12 |
| ML (Churn) | LightGBM |
| ML (LTV) | Scikit-learn RandomForest |
| Explainability | SHAP |
| API | FastAPI + Uvicorn |
| Dashboard | Streamlit + Plotly |
| Containerisation | Docker + Docker Compose |
| Data | Pandas, NumPy |
| Visualisation | Matplotlib, Seaborn, Plotly |

📁 Project Structure

Customer_Churn_LTV_Engine/
│
├── data/
│   ├── ecommerce_user_segmentation.csv  ← raw data
│   └── customer_segments.csv            ← output
│
├── models/
│   ├── churn_model.pkl                  ← LightGBM
│   └── ltv_model.pkl                    ← RandomForest
│
├── notebooks/
│   ├── 01_model_comparison.ipynb        ← exploration
│   ├── 02_synthetic_data_experiment.ipynb
│   ├── 03_shap_analysis.ipynb
│   └── 04_customer_segmentation.ipynb
│
├── src/
│   ├── feature_engineering.py           ← features
│   ├── train_churn_model.py             ← churn model
│   ├── train_ltv_model.py               ← LTV model
│   ├── model_comparision.py             ← benchmarks
│   ├── customer_segmentation.py         ← segments
│   ├── predict.py                       ← prediction
│   └── pipeline.py                      ← full pipeline
│
├── api/
│   └── main.py                          ← FastAPI
│
├── app/
│   └── streamlit_app.py                 ← dashboard
│
├── reports/
│   ├── shap_churn_summary.png
│   ├── shap_churn_bar.png
│   ├── shap_churn_dependence.png
│   ├── shap_ltv_summary.png
│   ├── shap_ltv_bar.png
│   ├── shap_ltv_dependence.png
│   ├── shap_cross_model.png
│   ├── customer_quadrant.png
│   ├── segment_analysis.png
│   └── revenue_recovery.png
│
├── docker/
│   ├── Dockerfile.api
│   └── Dockerfile.app
│
├── .streamlit/
│   └── config.toml
│
├── docker-compose.yml
├── requirements.txt
└── README.md

⚡ Quick Start

Option 1: Local Development

# Clone repository
git clone https://github.com/YOUR_USERNAME/Customer_Churn_LTV_Engine.git
cd Customer_Churn_LTV_Engine

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Mac/Linux

# Install dependencies
pip install -r requirements.txt

# Run full pipeline
cd src && python pipeline.py

# Start API
cd ../api && uvicorn main:app --reload

# Start dashboard (new terminal)
cd ../app && streamlit run streamlit_app.py

Option 2: Docker Compose

# Clone repository
git clone https://github.com/YOUR_USERNAME/Customer_Churn_LTV_Engine.git
cd Customer_Churn_LTV_Engine

# Build and run
docker compose up --build

# Access:
# API       → http://localhost:8000
# API Docs  → http://localhost:8000/docs
# Dashboard → http://localhost:8501

🚀 Running Each Component

Full Pipeline

cd src && python pipeline.py

Runs all 5 steps in sequence:

[1/5] Feature Engineering    0.5s  ✅
[2/5] Train Churn Model       5.1s  ✅
[3/5] Train LTV Model         7.6s  ✅
[4/5] Model Comparison        4.5s  ✅
[5/5] Customer Segmentation   1.4s  ✅
─────────────────────────────────────
Total Runtime                19.1s  ✅
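
The contents of `pipeline.py` are not shown in this README. As a rough illustration of how a sequential, timed runner of this shape could be written, the sketch below executes named steps in order and records wall-clock time per step. The `run_pipeline` function and the placeholder steps are hypothetical stand-ins, not the project's actual code.

```python
import time
from typing import Callable

Step = tuple[str, Callable[[], None]]


def run_pipeline(steps: list[Step]) -> dict[str, float]:
    """Run each step in order and return its elapsed time in seconds."""
    timings: dict[str, float] = {}
    total = len(steps)
    for index, (name, step) in enumerate(steps, start=1):
        start = time.perf_counter()
        step()  # an exception here stops the pipeline at the failing step
        timings[name] = time.perf_counter() - start
        print(f"[{index}/{total}] {name:<24}{timings[name]:5.1f}s")
    print(f"Total Runtime {sum(timings.values()):.1f}s")
    return timings


def _placeholder() -> None:
    """Hypothetical stand-in for a real step such as model training."""


STEPS: list[Step] = [
    ("Feature Engineering", _placeholder),
    ("Train Churn Model", _placeholder),
    ("Train LTV Model", _placeholder),
    ("Model Comparison", _placeholder),
    ("Customer Segmentation", _placeholder),
]

timings = run_pipeline(STEPS)
```

Keeping the steps as an ordered list of callables makes it straightforward to rerun a single stage or to add a new one without touching the runner.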

Single Customer Prediction

# By customer ID
cd src && python predict.py --customer_id CUST00001

# Random customer
cd src && python predict.py --random

Example output:

═══════════════════════════════════════════
   CUSTOMER PREDICTION REPORT
═══════════════════════════════════════════
  Customer ID        : CUST00001
  ─────────────────────────────────────────
  Churn Probability  : 0.0000
  Churn Risk         : 🟢 LOW
  ─────────────────────────────────────────
  Predicted LTV      : $3,340.30
  LTV Tier           : Mid LTV
  ─────────────────────────────────────────
  Composite Risk     : 0.1900
  Segment            : 🔵 Promising
  ─────────────────────────────────────────
  RECOMMENDED ACTIONS:
    → Upsell to higher order value products
    → Frequency incentives - buy 3 get 1
    → Wishlist-based recommendations
    → Convert to Champion tier focus
═══════════════════════════════════════════

🌐 API Documentation

The REST API is built with FastAPI and provides four endpoints:

Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /health | API health check |
| GET | /predict/{customer_id} | Predict by ID |
| POST | /predict | Predict by features |
| GET | /segment/summary | Segment summary |

Example: Predict by Customer ID

curl http://localhost:8000/predict/CUST00001

Response:

{
  "customer_id": "CUST00001",
  "churn_probability": 0.0,
  "churn_risk": "🟢 LOW",
  "predicted_ltv": 3340.30,
  "ltv_tier": "Mid LTV",
  "risk_score": 0.19,
  "segment": "Promising",
  "actions": [
    "Upsell to higher order value products",
    "Frequency incentives - buy 3 get 1",
    "Wishlist-based recommendations",
    "Convert to Champion tier focus"
  ]
}

Example: Predict by Features

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "Customer_ID": "TEST_001",
    "Recency": 5,
    "Frequency": 80,
    "Monetary": 25000.0,
    "Avg_Order_Value": 312.5,
    "Session_Count": 150,
    "Avg_Session_Duration": 35.0,
    "Pages_Viewed": 20,
    "Clicks": 60,
    "Campaign_Response": 1,
    "Wishlist_Adds": 25,
    "Cart_Abandon_Rate": 0.10,
    "Returns": 1
  }'

Interactive API docs available at: http://localhost:8000/docs
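
For callers who prefer Python to curl, the same POST request can be sketched with the standard library alone. This is a minimal client sketch. It assumes the API is running locally on port 8000 as described above and accepts the JSON body from the curl example. The helper names `build_predict_request` and `predict` are hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # assumed local address from the Quick Start


def build_predict_request(features: dict, base_url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /predict request with a JSON body."""
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: dict, base_url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response; needs a running API."""
    request = build_predict_request(features, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Same feature values as the curl example above.
features = {
    "Customer_ID": "TEST_001",
    "Recency": 5,
    "Frequency": 80,
    "Monetary": 25000.0,
    "Avg_Order_Value": 312.5,
    "Session_Count": 150,
    "Avg_Session_Duration": 35.0,
    "Pages_Viewed": 20,
    "Clicks": 60,
    "Campaign_Response": 1,
    "Wishlist_Adds": 25,
    "Cart_Abandon_Rate": 0.10,
    "Returns": 1,
}

request = build_predict_request(features)
print(request.get_method(), request.full_url)
# result = predict(features)  # uncomment once the API is running
```

Separating request construction from sending keeps the payload easy to inspect or unit-test without a live server.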


📊 Dashboard

The Streamlit dashboard provides 6 interactive pages:

| Page | Description |
|------|-------------|
| 🏠 Overview | KPI cards, revenue distribution, LTV violin plots |
| 🔍 Customer Lookup | Live prediction, risk gauge, radar chart |
| 📊 Segment Analysis | Quadrant plot, heatmap, breakdowns |
| 💰 Revenue Recovery | ROI simulation with live sliders |
| 🤖 Model Performance | ROC curve, SHAP charts, metrics |
| 🎯 Action Center | Campaign lists, export tools |

Access at: http://localhost:8501


📓 Notebooks

| Notebook | Description |
|----------|-------------|
| 01_model_comparison | Exploratory model benchmarking across 6 algorithms |
| 02_synthetic_data_experiment | Feature enrichment experiment + stress testing |
| 03_shap_analysis | Complete SHAP explainability analysis |
| 04_customer_segmentation | Segmentation development + revenue simulation |

🔍 SHAP Analysis: Key Findings

Churn Model (LightGBM)

| Rank | Feature | SHAP Value | Insight |
|------|---------|------------|---------|
| 1 | Avg_Order_Value | 1.0000 | Hard threshold ~$50 |
| 2 | Cart_Abandon_Rate | 0.1016 | >25% → churn signal |
| 3 | Wishlist_Adds | 0.0862 | <5 adds → churn risk |

LTV Model (RandomForest)

| Rank | Feature | SHAP Value | Insight |
|------|---------|------------|---------|
| 1 | Frequency | 1.0000 | >40 purchases → high LTV |
| 2 | Session_Count | 0.4275 | >75 sessions → value signal |
| 3 | Wishlist_Adds | 0.2449 | Breadth of interest |

Cross-Model Insight

| Role | Features |
|------|----------|
| Both Models | Avg_Order_Value, Wishlist_Adds |
| Churn Only | Cart_Abandon_Rate, Avg_Session_Duration |
| LTV Only | Frequency, Session_Count, Clicks |
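
The README does not state how the "SHAP Value" column was computed. Because the top feature in each table scores exactly 1.0000, one plausible reading is that the values are mean absolute SHAP values scaled by the largest one. The sketch below illustrates that normalisation on synthetic numbers. The array `fake_shap` is randomly generated stand-in data, not output from the project's models, and `relative_importance` is a hypothetical helper.

```python
import numpy as np


def relative_importance(shap_values: np.ndarray, feature_names: list[str]) -> list[tuple[str, float]]:
    """Rank features by mean |SHAP|, scaled so the top feature equals 1.0."""
    mean_abs = np.abs(shap_values).mean(axis=0)  # one value per feature
    scaled = mean_abs / mean_abs.max()
    order = np.argsort(scaled)[::-1]             # descending importance
    return [(feature_names[i], float(scaled[i])) for i in order]


# Synthetic stand-in for an explainer's output, shape (n_samples, n_features).
rng = np.random.default_rng(42)
features = ["Avg_Order_Value", "Cart_Abandon_Rate", "Wishlist_Adds"]
fake_shap = rng.normal(scale=[2.0, 0.2, 0.15], size=(500, 3))

ranking = relative_importance(fake_shap, features)
for rank, (name, score) in enumerate(ranking, start=1):
    print(f"{rank}  {name:<20}{score:.4f}")
```

Scaling by the maximum makes importances comparable across the churn and LTV models, at the cost of hiding the absolute magnitude of each model's SHAP values.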

🎯 Customer Segmentation

Methodology

Applying a standard churn-probability threshold (0.30) created a blind spot: high-LTV customers received near-zero churn probabilities because Avg_Order_Value dominates the churn model, as the SHAP analysis confirms.

Solution: Composite Risk Score + Within-Tier Percentile Ranking

Risk Score = 0.40 × Churn_Probability
           + 0.30 × Recency_percentile
           + 0.30 × Cart_Abandon_percentile

Segments are assigned by flagging the riskiest 25% of
customers within each LTV tier.

This approach mirrors production systems used at companies including Amazon, Netflix and Spotify.
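
The composite score and within-tier ranking described above can be sketched in pandas as follows. This is an illustration under stated assumptions, not the contents of `customer_segmentation.py`. It assumes percentiles are computed across all customers, LTV tiers are equal-sized terciles of predicted LTV, and the six segments pair up by tier as shown in `TIER_SEGMENTS`. Only "Mid LTV" appears as a tier name in the README; "Low LTV" and "High LTV" are assumed labels. The demo data is random.

```python
import numpy as np
import pandas as pd

# (stable segment, at-risk segment) per tier, ordered from lowest to highest LTV.
TIER_SEGMENTS = {
    "Low LTV": ("Hibernating", "Losing Customer"),
    "Mid LTV": ("Promising", "Vulnerable"),
    "High LTV": ("Champion", "At-Risk VIP"),
}


def assign_segments(df: pd.DataFrame, top_share: float = 0.25) -> pd.DataFrame:
    """Add Risk_Score, LTV_Tier and Segment columns to a copy of df."""
    out = df.copy()
    recency_pct = out["Recency"].rank(pct=True)
    abandon_pct = out["Cart_Abandon_Rate"].rank(pct=True)
    out["Risk_Score"] = (
        0.40 * out["Churn_Probability"] + 0.30 * recency_pct + 0.30 * abandon_pct
    )
    # Assumed tiering: terciles of predicted LTV (ranked first to break ties).
    out["LTV_Tier"] = pd.qcut(
        out["Predicted_LTV"].rank(method="first"), q=3, labels=list(TIER_SEGMENTS)
    )
    # Percentile rank of the risk score inside each tier.
    tier_rank = out.groupby("LTV_Tier", observed=True)["Risk_Score"].rank(
        pct=True, method="first"
    )
    at_risk = tier_rank > 1 - top_share
    out["Segment"] = [
        TIER_SEGMENTS[tier][int(flag)] for tier, flag in zip(out["LTV_Tier"], at_risk)
    ]
    return out


# Random demo customers; column names follow the API example in this README.
rng = np.random.default_rng(0)
n = 12
demo = pd.DataFrame({
    "Customer_ID": [f"DEMO{i:03d}" for i in range(n)],
    "Churn_Probability": rng.uniform(0, 1, n),
    "Recency": rng.integers(1, 365, n),
    "Cart_Abandon_Rate": rng.uniform(0, 1, n),
    "Predicted_LTV": rng.uniform(50, 25_000, n),
})
segmented = assign_segments(demo)
print(segmented[["Customer_ID", "LTV_Tier", "Risk_Score", "Segment"]])
```

Ranking within each tier means every tier contributes its own riskiest quarter, so high-LTV customers are flagged even when their raw churn probability is near zero. The segment counts reported earlier are consistent with this: roughly 25% of each tier falls in its at-risk segment.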


🐳 Docker Deployment

# Build images
docker compose build

# Start all services
docker compose up -d

# View logs
docker compose logs -f

# Stop
docker compose down

Services:

| Service | Container | Port |
|---------|-----------|------|
| FastAPI | churn_ltv_api | 8000 |
| Streamlit | churn_ltv_app | 8501 |

🔮 Future Improvements

  1. Temporal Behaviour Modelling: Replace static aggregates with time-series features to capture engagement trends.

  2. Real-Time Churn Monitoring: A streaming pipeline that triggers alerts when behavioural signals cross SHAP thresholds.

  3. Revenue Impact Simulation: A retention ROI simulator integrated with historical campaign success rates.

  4. Deep Learning Sequence Models: LSTM or Transformer models trained on transaction-level time series.


👤 Author

Pratik Chetry

Built as a complete end-to-end ML engineering portfolio project demonstrating production-grade system design, model explainability, and business impact framing.


⭐ If you found this project useful, please star it!
