Machine Learning · 2024 · github.com/kmexa

Customer Churn Prediction: Production-Ready ML Pipeline

Production-grade churn prediction pipeline demonstrating a zero-data-leakage architecture, reproducible preprocessing, and evaluation tied directly to business outcomes. A Gradient Boosting classifier tuned with GridSearchCV achieves ROC-AUC 0.934 and PR-AUC 0.812.


Methodology

01 Feature engineering: AvgMonthlyCharges = TotalCharges / tenure summarises each customer's average spend per month over their tenure (customers with tenure = 0 need a guard against division by zero).
02 ColumnTransformer: StandardScaler for numerical features and OneHotEncoder for categorical features, all inside a scikit-learn Pipeline so that every preprocessing step is fit on the training folds only, preventing data leakage across CV folds.
03 GridSearchCV over an n_estimators × max_depth × learning_rate grid with stratified 5-fold cross-validation.
04 Threshold optimisation: the default 0.5 threshold is rarely optimal when classes are imbalanced, so the F1-maximising threshold is selected on a validation set to balance precision and recall for the retention use case.
05 Feature importances mapped to concrete retention strategies, each annotated with its business rationale.
06 Business impact analysis: the cost of false negatives (missed churners) is quantified against the cost of false positives (unnecessary retention spend).
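Step 01 can be sketched in pandas as follows, assuming the raw columns are named TotalCharges and tenure and that TotalCharges may arrive as text with blank entries. Rows with tenure = 0 receive NaN instead of raising a division error; leaving them for a downstream imputer is an assumption here, not the author's documented choice. Because the ratio uses only values from the same row, it can be computed before the train/test split without leaking information.

```python
import pandas as pd


def add_avg_monthly_charges(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with AvgMonthlyCharges = TotalCharges / tenure."""
    out = df.copy()
    # TotalCharges may be stored as text with blank entries; blanks become NaN.
    total = pd.to_numeric(out["TotalCharges"], errors="coerce")
    tenure = out["tenure"].astype(float)
    # Guard against division by zero: tenure == 0 is masked to NaN, so those rows
    # get NaN (assumed to be imputed later inside the Pipeline).
    out["AvgMonthlyCharges"] = total / tenure.where(tenure > 0)
    return out


# Toy rows with hypothetical values, including a brand-new customer (tenure 0).
df = pd.DataFrame({"tenure": [12, 0, 24], "TotalCharges": ["840.0", " ", "1200"]})
print(add_avg_monthly_charges(df)["AvgMonthlyCharges"].tolist())  # [70.0, nan, 50.0]
```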
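A minimal sketch of step 02, assuming hypothetical column names (tenure, MonthlyCharges, Contract, PaymentMethod) and synthetic data in place of the real dataset. It illustrates why the ColumnTransformer belongs inside the Pipeline: cross_val_score clones and refits the whole pipeline on each training fold, so scaler means and encoder categories never see the held-out fold.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column lists; the real schema is not given in the write-up.
NUMERIC = ["tenure", "MonthlyCharges"]
CATEGORICAL = ["Contract", "PaymentMethod"]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), NUMERIC),
    # handle_unknown="ignore" stops a fold failing on a category unseen in training.
    ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
])

pipe = Pipeline([
    ("preprocess", preprocess),
    ("model", GradientBoostingClassifier(random_state=0)),
])

# Synthetic stand-in data so the sketch runs end to end.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "tenure": rng.integers(0, 72, n),
    "MonthlyCharges": rng.uniform(20, 120, n),
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], n),
    "PaymentMethod": rng.choice(["Electronic check", "Mailed check", "Credit card"], n),
})
# Churn is made more likely for short-tenure, month-to-month customers.
logit = -1.0 - 0.05 * X["tenure"] + 1.5 * (X["Contract"] == "Month-to-month")
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

# Each fold refits preprocess + model on its training split only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print(scores.round(3), round(float(scores.mean()), 3))
```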
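For step 03, a sketch of the grid search over the three named hyperparameters. The grid values, the synthetic data, and the StandardScaler stand-in for the full ColumnTransformer are assumptions; the write-up does not list the values actually searched. The `model__` prefix is how scikit-learn routes a parameter to the Pipeline step named `model`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (roughly 25% positives).
X, y = make_classification(n_samples=500, n_features=8, weights=[0.75], random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),  # stands in for the ColumnTransformer step
    ("model", GradientBoostingClassifier(random_state=0)),
])

# Illustrative grid values (2 x 2 x 2 = 8 candidates, 40 fits with 5 folds).
param_grid = {
    "model__n_estimators": [100, 200],
    "model__max_depth": [2, 3],
    "model__learning_rate": [0.05, 0.1],
}

search = GridSearchCV(
    pipe,
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

With the default `refit=True`, `search.best_estimator_` is refit on all of `X` using the winning parameters and can be used directly for prediction.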
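Step 04 as a sketch: scan the thresholds returned by `sklearn.metrics.precision_recall_curve` on validation-set scores and keep the one with the highest F1. The function name and the toy labels and scores are illustrative. The threshold should be chosen on validation data and then applied unchanged to the test set, since tuning it on the test set would leak.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def f1_optimal_threshold(y_true, y_score):
    """Return (threshold, F1) for the threshold that maximises F1 on the given data."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # precision and recall carry one extra final point (recall 0) with no matching
    # threshold, so drop it to align all three arrays.
    precision, recall = precision[:-1], recall[:-1]
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(denom), where=denom > 0)
    best = int(np.argmax(f1))
    return float(thresholds[best]), float(f1[best])


# Toy validation labels and predicted churn probabilities (illustrative only).
y_val = np.array([0, 0, 1, 1, 0, 1])
val_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.6])
t, f1 = f1_optimal_threshold(y_val, val_scores)
print(f"threshold={t:.2f}, F1={f1:.3f}")  # threshold=0.35, F1=0.857

# Apply the validation-chosen threshold unchanged to new scores:
# predictions = (test_scores >= t).astype(int)
```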
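Step 06 can be made concrete by assigning a unit cost to each error type and comparing thresholds by total cost. The two costs below (500 per missed churner, 50 per unnecessary offer) are invented for illustration, as are the toy labels and scores; the write-up does not state the figures it used.

```python
import numpy as np

# Hypothetical unit costs, NOT from the write-up: value lost per missed churner
# and spend wasted per retention offer sent to a customer who would have stayed.
COST_FN = 500.0
COST_FP = 50.0


def total_error_cost(y_true, y_score, threshold, cost_fn=COST_FN, cost_fp=COST_FP):
    """Total cost of misclassifications when churn is predicted for score >= threshold."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_score) >= threshold
    fn = int(np.sum((y_true == 1) & ~y_pred))  # missed churners
    fp = int(np.sum((y_true == 0) & y_pred))   # unnecessary retention offers
    return fn * cost_fn + fp * cost_fp


def cost_minimising_threshold(y_true, y_score, **costs):
    """Try each observed score as a threshold and return (threshold, cost) of the cheapest."""
    candidates = np.unique(y_score)
    costs_at = [total_error_cost(y_true, y_score, t, **costs) for t in candidates]
    best = int(np.argmin(costs_at))
    return float(candidates[best]), costs_at[best]


# Toy validation labels and scores (illustrative only).
y_val = np.array([0, 0, 1, 1, 0, 1])
val_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.6])
print(cost_minimising_threshold(y_val, val_scores))  # (0.35, 50.0)
print(total_error_cost(y_val, val_scores, 0.5))      # 500.0
```

On these toy inputs the default 0.5 threshold misses one churner and costs 500, while 0.35 catches every churner at the price of one wasted offer (50). Unlike F1, a cost criterion lets an asymmetric cost ratio pull the threshold down when missed churners are far more expensive than wasted offers.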

Results

ROC-AUC: 0.934
PR-AUC: 0.812
Records: 7,043
| Model | ROC-AUC | PR-AUC |
|---|---|---|
| Logistic Regression | 0.851 | 0.669 |
| Random Forest | 0.912 | 0.773 |
| Gradient Boosting (tuned) | 0.934 | 0.812 |
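Both metrics in the comparison are threshold-free and computed from predicted probabilities rather than hard labels. A sketch of how such a comparison can be produced, on synthetic data so the numbers will not match the reported ones; treating PR-AUC as scikit-learn's `average_precision_score` is an assumption, since the write-up does not say which PR-AUC estimator it used. A random classifier scores about 0.5 ROC-AUC but only the positive-class prevalence in PR-AUC, which is why PR-AUC values sit lower on imbalanced data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data with a stratified hold-out split.
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.75], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]  # probability of the positive (churn) class
    results[name] = (roc_auc_score(y_test, scores), average_precision_score(y_test, scores))

for name, (roc, pr) in results.items():
    print(f"{name:20s} ROC-AUC {roc:.3f}  PR-AUC {pr:.3f}")
```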
Gradient Boosting · ColumnTransformer · Pipeline · GridSearchCV · scikit-learn · pandas · Python
