Technical Report
The report contains the full analysis and visualisations used to design the production system.
The Customer Churn & LTV Prediction Engine is a production-grade machine learning system that combines churn risk prediction and lifetime value forecasting to deliver actionable customer intelligence.
The system enables businesses to:
- Identify customers at risk of churning
- Predict the revenue impact of customer loss
- Segment customers into behavioural groups
- Simulate retention campaign ROI
- Serve predictions through a REST API
- Visualise insights through an interactive dashboard
The project demonstrates the complete ML engineering lifecycle, from raw data and feature engineering through model training, explainability, segmentation, API deployment and containerisation.
| Model | Algorithm | Metric | Score |
|---|---|---|---|
| Churn | LightGBM | F1 Score | 0.8770 |
| Churn | LightGBM | ROC-AUC | 0.9836 |
| Churn | LightGBM | Recall | 1.0000 |
| LTV | RandomForest | R² Score | 0.9226 |
| LTV | RandomForest | RMSE | $2,806 |
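
The churn metrics above list F1 and recall but not precision. Assuming both were measured on the same evaluation set, for the same positive class and at the same decision threshold, precision can be recovered by solving F1 = 2PR / (P + R) for P. The helper name below is illustrative only and is not part of the project code.

```python
def precision_from_f1_recall(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for precision P."""
    return f1 * recall / (2 * recall - f1)


# Values taken from the metrics table (Churn, LightGBM).
precision = precision_from_f1_recall(f1=0.8770, recall=1.0000)
print(f"Implied precision: {precision:.4f}")  # prints 0.7809
```

Under those assumptions, a recall of 1.0000 with an F1 of 0.8770 implies that roughly 78% of customers flagged as churners actually churned.
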
| Segment | Customers | Avg LTV | Total LTV |
|---|---|---|---|
| 🟢 Champion | 2,472 | $21,231 | $52.5M |
| 🟡 At-Risk VIP | 828 | $11,106 | $9.2M |
| 🔵 Promising | 2,550 | $3,725 | $9.5M |
| 🟠 Vulnerable | 850 | $2,093 | $1.8M |
| ⚪ Hibernating | 2,474 | $547 | $1.4M |
| 🔴 Losing Customer | 826 | $75 | $0.1M |
| Segment | Investment | Net ROI | ROI % |
|---|---|---|---|
| At-Risk VIP | $41,400 | $4.10M | 9,896% |
| Vulnerable | $21,250 | $0.60M | 2,830% |
| Losing Customer | $4,130 | $0.01M | 126% |
| Total | $66,780 | $4.70M | 7,043% |
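
The ROI table can be cross-checked against the segment table with a little arithmetic. The sketch below assumes that ROI % is defined as Net ROI divided by Investment; the README does not state the formula, but the published figures are consistent with it once rounding to $0.01M is taken into account. The dictionary layout and variable names are illustrative.

```python
# segment: (customers, investment in USD, ROI % as published)
rows = {
    "At-Risk VIP": (828, 41_400, 9_896),
    "Vulnerable": (850, 21_250, 2_830),
    "Losing Customer": (826, 4_130, 126),
}

implied = {}
for name, (customers, investment, roi_pct) in rows.items():
    net_roi = investment * roi_pct / 100  # assumed: ROI % = net / investment
    implied[name] = {
        "spend_per_customer": investment / customers,
        "net_roi_musd": round(net_roi / 1e6, 2),
        "net_roi_usd": net_roi,
    }
    print(f"{name:<16} ${implied[name]['spend_per_customer']:.0f}/customer, "
          f"net ROI ${implied[name]['net_roi_musd']:.2f}M")

total_investment = sum(inv for _, inv, _ in rows.values())
total_net = sum(v["net_roi_usd"] for v in implied.values())
total_roi_pct = total_net / total_investment * 100
print(f"Total: ${total_investment:,} invested, ROI {total_roi_pct:.0f}%")
```

Read this way, the simulated spend works out to $50, $25 and $5 per customer for the three at-risk segments, and the per-segment figures reproduce the Total row.
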
Raw Data
↓
Feature Engineering
↓
┌───────────────────┬──────────────────┐
│ Churn Model       │ LTV Model        │
│ LightGBM          │ RandomForest     │
│ F1=0.8770         │ R²=0.9226        │
│ AUC=0.9836        │ RMSE=$2,806      │
└───────────────────┴──────────────────┘
↓
SHAP Explainability
↓
Customer Segmentation
(Composite Risk Score +
Within-Tier Percentile Ranking)
↓
┌───────────────────┬──────────────────┐
│ FastAPI           │ Streamlit        │
│ REST API          │ Dashboard        │
│ Port 8000         │ Port 8501        │
└───────────────────┴──────────────────┘
↓
Docker Compose Deployment
| Layer | Technology |
|---|---|
| Language | Python 3.12 |
| ML (Churn) | LightGBM |
| ML (LTV) | Scikit-learn RandomForest |
| Explainability | SHAP |
| API | FastAPI + Uvicorn |
| Dashboard | Streamlit + Plotly |
| Containerisation | Docker + Docker Compose |
| Data | Pandas, NumPy |
| Visualisation | Matplotlib, Seaborn, Plotly |
Customer_Churn_LTV_Engine/
│
├── data/
│   ├── ecommerce_user_segmentation.csv   ← raw data
│   └── customer_segments.csv             ← output
│
├── models/
│   ├── churn_model.pkl                   ← LightGBM
│   └── ltv_model.pkl                     ← RandomForest
│
├── notebooks/
│   ├── 01_model_comparison.ipynb         ← exploration
│   ├── 02_synthetic_data_experiment.ipynb
│   ├── 03_shap_analysis.ipynb
│   └── 04_customer_segmentation.ipynb
│
├── src/
│   ├── feature_engineering.py            ← features
│   ├── train_churn_model.py              ← churn model
│   ├── train_ltv_model.py                ← LTV model
│   ├── model_comparision.py              ← benchmarks
│   ├── customer_segmentation.py          ← segments
│   ├── predict.py                        ← prediction
│   └── pipeline.py                       ← full pipeline
│
├── api/
│   └── main.py                           ← FastAPI
│
├── app/
│   └── streamlit_app.py                  ← dashboard
│
├── reports/
│   ├── shap_churn_summary.png
│   ├── shap_churn_bar.png
│   ├── shap_churn_dependence.png
│   ├── shap_ltv_summary.png
│   ├── shap_ltv_bar.png
│   ├── shap_ltv_dependence.png
│   ├── shap_cross_model.png
│   ├── customer_quadrant.png
│   ├── segment_analysis.png
│   └── revenue_recovery.png
│
├── docker/
│   ├── Dockerfile.api
│   └── Dockerfile.app
│
├── .streamlit/
│   └── config.toml
│
├── docker-compose.yml
├── requirements.txt
└── README.md
# Clone repository
git clone https://github.com/YOUR_USERNAME/Customer_Churn_LTV_Engine.git
cd Customer_Churn_LTV_Engine
# Create virtual environment
python -m venv venv
source venv/bin/activate # Mac/Linux
# Install dependencies
pip install -r requirements.txt
# Run full pipeline
cd src && python pipeline.py
# Start API
cd ../api && uvicorn main:app --reload
# Start dashboard (new terminal)
cd ../app && streamlit run streamlit_app.py

# Clone repository
git clone https://github.com/YOUR_USERNAME/Customer_Churn_LTV_Engine.git
cd Customer_Churn_LTV_Engine
# Build and run
docker compose up --build
# Access:
# API → http://localhost:8000
# API Docs → http://localhost:8000/docs
# Dashboard → http://localhost:8501

cd src && python pipeline.py

Runs all 5 steps in sequence:
[1/5] Feature Engineering     0.5s ✓
[2/5] Train Churn Model       5.1s ✓
[3/5] Train LTV Model         7.6s ✓
[4/5] Model Comparison        4.5s ✓
[5/5] Customer Segmentation   1.4s ✓
─────────────────────────────────────
Total Runtime                19.1s ✓
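
A runner that executes steps in order and reports per-step timings, as in the output above, could be sketched as follows. This is not the project's `pipeline.py`: the `run_pipeline` function and the placeholder step are hypothetical stand-ins, and only the step names come from the output shown.

```python
import time


def run_pipeline(steps):
    """Run (name, callable) steps in order and print elapsed time for each."""
    timings = []
    total = len(steps)
    for index, (name, func) in enumerate(steps, start=1):
        start = time.perf_counter()
        func()  # an exception here stops the run, so later steps never see bad inputs
        elapsed = time.perf_counter() - start
        timings.append((name, elapsed))
        print(f"[{index}/{total}] {name:<24}{elapsed:>6.1f}s")
    print(f"{'Total Runtime':<30}{sum(t for _, t in timings):>6.1f}s")
    return timings


def _placeholder_step():
    """Stand-in for a real step such as feature engineering or training."""


STEPS = [
    ("Feature Engineering", _placeholder_step),
    ("Train Churn Model", _placeholder_step),
    ("Train LTV Model", _placeholder_step),
    ("Model Comparison", _placeholder_step),
    ("Customer Segmentation", _placeholder_step),
]

timings = run_pipeline(STEPS)
```

Running the steps strictly in sequence keeps the dependency order explicit: segmentation needs both trained models, and both models need the engineered features.
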
# By customer ID
cd src && python predict.py --customer_id CUST00001
# Random customer
cd src && python predict.py --random

Example output:
═══════════════════════════════════════════
CUSTOMER PREDICTION REPORT
═══════════════════════════════════════════
Customer ID       : CUST00001
─────────────────────────────────────────
Churn Probability : 0.0000
Churn Risk        : 🟢 LOW
─────────────────────────────────────────
Predicted LTV     : $3,340.30
LTV Tier          : Mid LTV
─────────────────────────────────────────
Composite Risk    : 0.1900
Segment           : 🔵 Promising
─────────────────────────────────────────
RECOMMENDED ACTIONS:
→ Upsell to higher order value products
→ Frequency incentives - buy 3 get 1
→ Wishlist-based recommendations
→ Convert to Champion tier focus
═══════════════════════════════════════════
The REST API is built with FastAPI and provides four endpoints:
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | API health check |
| GET | /predict/{customer_id} | Predict by ID |
| POST | /predict | Predict by features |
| GET | /segment/summary | Segment summary |
curl http://localhost:8000/predict/CUST00001

Response:

{
  "customer_id": "CUST00001",
  "churn_probability": 0.0,
  "churn_risk": "🟢 LOW",
  "predicted_ltv": 3340.30,
  "ltv_tier": "Mid LTV",
  "risk_score": 0.19,
  "segment": "Promising",
  "actions": [
    "Upsell to higher order value products",
    "Frequency incentives - buy 3 get 1",
    "Wishlist-based recommendations",
    "Convert to Champion tier focus"
  ]
}

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "Customer_ID": "TEST_001",
    "Recency": 5,
    "Frequency": 80,
    "Monetary": 25000.0,
    "Avg_Order_Value": 312.5,
    "Session_Count": 150,
    "Avg_Session_Duration": 35.0,
    "Pages_Viewed": 20,
    "Clicks": 60,
    "Campaign_Response": 1,
    "Wishlist_Adds": 25,
    "Cart_Abandon_Rate": 0.10,
    "Returns": 1
  }'

Interactive API docs available at:
http://localhost:8000/docs
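
The same two prediction calls can be made from Python. The following is a minimal client sketch using only the standard library; the helper names (`predict_by_id_url`, `build_predict_request`, `fetch_json`) are illustrative and not part of the project, and the base URL assumes the API is running locally on port 8000 as described above.

```python
import json
from urllib import request
from urllib.parse import quote

BASE_URL = "http://localhost:8000"  # assumed local deployment


def predict_by_id_url(customer_id: str, base_url: str = BASE_URL) -> str:
    """Build the URL for GET /predict/{customer_id}."""
    return f"{base_url}/predict/{quote(customer_id, safe='')}"


def build_predict_request(features: dict, base_url: str = BASE_URL) -> request.Request:
    """Build a POST /predict request carrying the feature payload as JSON."""
    body = json.dumps(features).encode("utf-8")
    return request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(url_or_request, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response body."""
    with request.urlopen(url_or_request, timeout=timeout) as response:
        return json.load(response)


# Same feature payload as the curl example.
example_features = {
    "Customer_ID": "TEST_001", "Recency": 5, "Frequency": 80,
    "Monetary": 25000.0, "Avg_Order_Value": 312.5, "Session_Count": 150,
    "Avg_Session_Duration": 35.0, "Pages_Viewed": 20, "Clicks": 60,
    "Campaign_Response": 1, "Wishlist_Adds": 25,
    "Cart_Abandon_Rate": 0.10, "Returns": 1,
}
post_request = build_predict_request(example_features)

# With the API running:
#   fetch_json(predict_by_id_url("CUST00001"))
#   fetch_json(post_request)
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.
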
The Streamlit dashboard provides 6 interactive pages:
| Page | Description |
|---|---|
| Overview | KPI cards, revenue distribution, LTV violin plots |
| Customer Lookup | Live prediction, risk gauge, radar chart |
| Segment Analysis | Quadrant plot, heatmap, breakdowns |
| 💰 Revenue Recovery | ROI simulation with live sliders |
| 🤖 Model Performance | ROC curve, SHAP charts, metrics |
| 🎯 Action Center | Campaign lists, export tools |
Access at: http://localhost:8501
| Notebook | Description |
|---|---|
| 01_model_comparison | Exploratory model benchmarking across 6 algorithms |
| 02_synthetic_data_experiment | Feature enrichment experiment + stress testing |
| 03_shap_analysis | Complete SHAP explainability analysis |
| 04_customer_segmentation | Segmentation development + revenue simulation |
| Rank | Feature | SHAP Value | Insight |
|---|---|---|---|
| 1 | Avg_Order_Value | 1.0000 | Hard threshold ~$50 |
| 2 | Cart_Abandon_Rate | 0.1016 | >25% → churn signal |
| 3 | Wishlist_Adds | 0.0862 | <5 adds → churn risk |
| Rank | Feature | SHAP Value | Insight |
|---|---|---|---|
| 1 | Frequency | 1.0000 | >40 purchases → high LTV |
| 2 | Session_Count | 0.4275 | >75 sessions → value signal |
| 3 | Wishlist_Adds | 0.2449 | Breadth of interest |
| Role | Features |
|---|---|
| Both Models | Avg_Order_Value, Wishlist_Adds |
| Churn Only | Cart_Abandon_Rate, Avg_Session_Duration |
| LTV Only | Frequency, Session_Count, Clicks |
Applying a standard churn threshold (0.30) to the model's output created a blind spot: high-LTV customers received near-zero churn probabilities because Avg_Order_Value dominates the churn model, as the SHAP analysis confirmed.
Solution: Composite Risk Score + Within-Tier Percentile Ranking
Risk Score = 0.40 × Churn_Probability
+ 0.30 × Recency_percentile
+ 0.30 × Cart_Abandon_percentile
Segments are assigned by flagging the top 25% riskiest customers within each LTV tier.
This approach mirrors production systems used at companies including Amazon, Netflix and Spotify.
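
The composite score and within-tier ranking described above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's `customer_segmentation.py`: the function names and sample data are hypothetical, the percentiles are computed over the whole customer base, and higher Recency and Cart_Abandon_Rate are assumed to mean higher risk.

```python
from bisect import bisect_right


def percentile_rank(values):
    """Share of values that are <= each value, in (0, 1]."""
    ordered = sorted(values)
    return [bisect_right(ordered, v) / len(ordered) for v in values]


def composite_risk(churn_prob, recency, cart_abandon):
    """0.40 x churn probability + 0.30 x recency pct + 0.30 x cart-abandon pct."""
    rec_pct = percentile_rank(recency)
    cart_pct = percentile_rank(cart_abandon)
    return [0.40 * p + 0.30 * r + 0.30 * c
            for p, r, c in zip(churn_prob, rec_pct, cart_pct)]


def flag_riskiest(risk, tiers, top_share=0.25):
    """Flag customers in the top `top_share` of risk within their own LTV tier."""
    by_tier = {}
    for i, tier in enumerate(tiers):
        by_tier.setdefault(tier, []).append(i)
    flags = [False] * len(risk)
    for members in by_tier.values():
        tier_pct = percentile_rank([risk[i] for i in members])
        for i, pct in zip(members, tier_pct):
            flags[i] = pct > 1.0 - top_share
    return flags


# Hypothetical sample: four high-LTV and four low-LTV customers.
churn_prob = [0.0, 0.0, 0.1, 0.0, 0.2, 0.9, 0.5, 0.1]
recency = [5, 40, 10, 90, 3, 60, 30, 15]
cart_abandon = [0.10, 0.30, 0.05, 0.60, 0.20, 0.70, 0.40, 0.15]
tiers = ["High"] * 4 + ["Low"] * 4

risk = composite_risk(churn_prob, recency, cart_abandon)
flags = flag_riskiest(risk, tiers)
print([round(r, 4) for r in risk])
print(flags)
```

In this sample the fourth customer has a churn probability of 0.0 yet is still flagged within the high-LTV tier, because recency and cart abandonment contribute 60% of the score. That is the blind spot the composite score is meant to close. With pandas, `Series.rank(pct=True)` and a `groupby` on the tier column would express the same idea more compactly.
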
# Build images
docker compose build
# Start all services
docker compose up -d
# View logs
docker compose logs -f
# Stop
docker compose down

Services:
| Service | Container | Port |
|---|---|---|
| FastAPI | churn_ltv_api | 8000 |
| Streamlit | churn_ltv_app | 8501 |
- Temporal Behaviour Modelling: Replace static aggregates with time-series features to capture engagement trends.
- Real-Time Churn Monitoring: Streaming pipeline triggering alerts when behavioural signals cross SHAP thresholds.
- Revenue Impact Simulation: Retention ROI simulator integrated with historical campaign success rates.
- Deep Learning Sequence Models: LSTM or Transformer models trained on transaction-level time series.
Pratik Chetry
Built as a complete end-to-end ML engineering portfolio project demonstrating production-grade system design, model explainability, and business impact framing.
⭐ If you found this project useful, please star it!