Telecom Churn Prediction
End-to-end churn modeling pipeline using feature engineering, dimensionality reduction, and classifier benchmarking.
Why I Built This
Customer churn is one of the highest-impact problems in subscription businesses, where early intervention can save significant revenue. I built this project to model churn with enough lead time for targeted retention action, not just retrospective reporting. It was also a chance to work through realistic class imbalance and high-dimensional feature engineering at scale.
Dataset
- Around 100,000 customer records.
- 226 behavioral and usage features.
- Target: churn in month 9, predicted from signals in months 6, 7, and 8.
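
A minimal sketch of how the month-9 target and the months 6 to 8 feature window might be separated. The churn rule (no call minutes and no data usage in month 9) and the column names (`total_mou_9`, `data_mb_9`, and so on) are assumptions for illustration; the notebook's actual definition and schema may differ.

```python
# Hypothetical column naming: "<metric>_<month>", e.g. "total_mou_8".
FEATURE_MONTHS = ("6", "7", "8")
TARGET_MONTH = "9"


def label_churn(record):
    """Return 1 if the customer shows no usage in the target month.

    Assumed rule: churn means zero call minutes and zero data usage
    in month 9. The project may define churn differently.
    """
    calls = record.get(f"total_mou_{TARGET_MONTH}", 0.0)
    data = record.get(f"data_mb_{TARGET_MONTH}", 0.0)
    return int(calls == 0 and data == 0)


def split_features_and_target(record):
    """Keep only months 6-8 columns as features so month 9 cannot leak in."""
    features = {
        name: value
        for name, value in record.items()
        if name.rsplit("_", 1)[-1] in FEATURE_MONTHS
    }
    return features, label_churn(record)


# Made-up records, not rows from the dataset.
customers = [
    {"total_mou_6": 310.0, "total_mou_7": 120.0, "total_mou_8": 15.0,
     "total_mou_9": 0.0, "data_mb_9": 0.0},
    {"total_mou_6": 200.0, "total_mou_7": 210.0, "total_mou_8": 190.0,
     "total_mou_9": 180.0, "data_mb_9": 52.0},
]

dataset = [split_features_and_target(c) for c in customers]
```

Dropping every month-9 column before training matters here: any of them would reveal the label directly and inflate the measured performance.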
Modeling Workflow
- Data cleaning, followed by feature engineering focused on the high-value customer segment.
- PCA for dimensionality reduction.
- Model comparison across Logistic Regression, Random Forest, and Gradient Boosting.
- Class imbalance handling via SMOTE and class weighting.
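
Class weighting, one of the two imbalance strategies listed above, can be illustrated with the inverse-frequency heuristic `n_samples / (n_classes * class_count)`, which is the formula scikit-learn applies when an estimator is given `class_weight="balanced"`. The sketch below is plain Python on made-up labels; the 10% churn rate is an assumption, not a figure from this dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Illustrative labels only: 90 non-churners (0) and 10 churners (1).
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)

# Per-sample weights that a weighted loss or a `sample_weight` argument could use.
sample_weights = [weights[y] for y in labels]
```

With these weights each class contributes the same total weight to the loss, which is the effect class weighting aims for. SMOTE takes the other route and synthesizes additional minority-class samples instead.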
Results
- Best model: Gradient Boosting.
- Reported AUC-ROC: 0.85.
- Strong churn indicators included reduced recharge activity and declining call and data usage.
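
For readers less familiar with the metric, AUC-ROC equals the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as half. The sketch below computes it by direct pairwise comparison on toy scores; it illustrates the metric and is not the evaluation code behind the 0.85 reported above.

```python
def auc_roc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This O(n_pos * n_neg) version is
    meant for clarity, not for 100,000-row datasets.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy example: 1 = churned, scores are predicted churn probabilities.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = auc_roc(y_true, y_score)  # 3 of 4 pairs ordered correctly -> 0.75
```

Because the metric depends only on how churners rank against non-churners, it stays informative under class imbalance, where plain accuracy can look high for a model that never predicts churn.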
Links
- Code/notebook: TelecomChurnPredictor
- Colab entry point: Open in Colab