Telecom Churn Prediction

End-to-end churn modeling pipeline using feature engineering, dimensionality reduction, and classifier benchmarking.

Why I Built This

Customer churn is one of the highest-impact problems in subscription businesses, where early intervention can save significant revenue. I built this project to model churn with enough lead time for targeted retention action, not just retrospective reporting. It was also a chance to work through realistic class imbalance and high-dimensional feature engineering at scale.

Dataset

  • Around 100,000 customer records.
  • 226 behavioral and usage features.
  • Target: churn in month 9, predicted from signals in months 6, 7, and 8.
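
The month split above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the field names (usage_m6 through usage_m9) and the rule that zero usage in month 9 counts as churn are assumptions made for the example.

```python
# Hypothetical per-customer records; the field names are illustrative only.
FEATURE_MONTHS = (6, 7, 8)
LABEL_MONTH = 9


def split_features_and_label(record):
    """Return (features, label), taking features from months 6-8 only."""
    features = {f"usage_m{m}": record[f"usage_m{m}"] for m in FEATURE_MONTHS}
    # Assumption: a customer with no usage in month 9 is treated as churned.
    label = 1 if record[f"usage_m{LABEL_MONTH}"] == 0 else 0
    return features, label


customers = [
    {"usage_m6": 120.0, "usage_m7": 80.0, "usage_m8": 15.0, "usage_m9": 0.0},
    {"usage_m6": 95.0, "usage_m7": 100.0, "usage_m8": 110.0, "usage_m9": 105.0},
]
dataset = [split_features_and_label(c) for c in customers]
```

Keeping month 9 out of the feature dictionary means the model only sees information that would be available before the churn decision, which avoids label leakage.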

Modeling Workflow

  • Data cleaning, followed by feature engineering for the high-value customer segment.
  • PCA for dimensionality reduction.
  • Model comparison across Logistic Regression, Random Forest, and Gradient Boosting.
  • Class imbalance handling via SMOTE and class weighting.
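
To make the class-weighting step concrete, here is a small sketch of inverse-frequency ("balanced") weights. The formula is the same heuristic scikit-learn applies for class_weight="balanced"; whether this project used that exact setting is not stated, and the 10% churn rate below is made up for illustration. SMOTE is not shown here.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Illustrative labels: 90 retained customers (0) and 10 churners (1).
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
```

With these labels the minority (churn) class gets a weight nine times larger than the majority class, so each misclassified churner costs the model proportionally more during training.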

Results

  • Best model: Gradient Boosting.
  • Reported AUC-ROC: 0.85.
  • Strong churn indicators included reduced recharge activity and declining call and data usage.
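
For readers unfamiliar with the metric, AUC-ROC is the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below computes it by direct pairwise comparison on toy data; the labels and scores are invented for the example and are unrelated to the reported 0.85.

```python
def auc_roc(y_true, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 2 non-churners, 2 churners.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_roc(y_true, scores)
```

The pairwise loop is quadratic in the number of examples, so it is meant for understanding the metric, not for scoring a dataset of this size. Because AUC-ROC depends only on ranking, it is not distorted by the class imbalance the way plain accuracy is.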