MLSD Case Study: Churn Prediction with Survival and Uplift Modeling
Design a Spotify/Netflix-style churn prevention system using survival models, causal uplift targeting, and intervention policy optimization. Covers label definition, intervention economics, and production monitoring pitfalls.
Problem Framing: Predicting Churn Is Not Enough
In retention systems, predicting who will churn is only half the problem. The real objective is identifying who can be saved by intervention at positive ROI — and those two populations are not the same.
Consider the economics. Among users flagged for retention, some are determined churners (no intervention will retain them), some are inevitable stayers (they will renew regardless), and only the persuadable segment responds positively to an intervention. Spending retention budget on the first two groups wastes money. Worse, aggressive interventions on users who were going to stay anyway can feel intrusive and increase churn probability, a backfire effect documented in marketing research.
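To make the economics concrete, here is a minimal sketch of the per-user calculation: expected net value is the change in retention probability caused by the intervention, times the value of a retained user, minus the intervention cost. Every number below (segment probabilities, a retained-user value of 60, a cost of 5) is a hypothetical chosen for illustration, and the names `Segment` and `expected_net_value` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    p_retain_untreated: float  # retention probability with no intervention
    p_retain_treated: float    # retention probability if we intervene


def expected_net_value(seg: Segment, retained_value: float, cost: float) -> float:
    """Incremental retention (uplift) times value of a retained user, minus cost."""
    uplift = seg.p_retain_treated - seg.p_retain_untreated
    return uplift * retained_value - cost


# Hypothetical segments; probabilities are assumptions, not measured data.
SEGMENTS = [
    Segment("determined churner", 0.05, 0.06),
    Segment("inevitable stayer", 0.95, 0.95),
    Segment("persuadable", 0.40, 0.60),
    Segment("stayer who backfires", 0.90, 0.85),
]

RETAINED_VALUE = 60.0  # assumed value of keeping one user over the horizon
COST = 5.0             # assumed cost of one intervention (e.g., a discount)

net_values = {
    s.name: expected_net_value(s, RETAINED_VALUE, COST) for s in SEGMENTS
}

for name, value in net_values.items():
    print(f"{name:22s} expected net value per user: {value:+.2f}")
```

Under these assumed numbers only the persuadable segment has positive expected net value. The determined churner has the highest churn risk of the four and still loses money, which is why risk alone is a poor targeting signal.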
Strong interview answers separate three distinct model responsibilities:
- Risk model: estimates the probability of churn or the time to churn over a defined horizon (e.g., 7, 30, or 90 days). Drives prioritization and urgency.
- Uplift model: estimates the incremental change in retention probability caused by a specific intervention (email, discount, content nudge). This is a causal question, not a predictive one.
- Policy layer: decides who to target, with which intervention, given budget constraints, channel saturation limits, and regulatory rules (e.g., discount opt-out requirements).
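The separation above can be sketched as a policy function that consumes the outputs of the other two models. This is a simplified illustration, not a production design: the `Candidate` fields, the cost table, the contact cap, and the greedy selection rule are all assumptions. Greedy selection by expected net value is a simple heuristic for the budget constraint, not an optimal solver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    user_id: str
    churn_risk: float         # output of the risk model
    uplift: dict[str, float]  # uplift model: intervention -> change in retention prob
    opted_out: bool = False   # regulatory / preference constraint
    recent_contacts: int = 0  # channel saturation signal


# Hypothetical constants for illustration only.
COSTS = {"email": 0.1, "discount": 5.0}
RETAINED_VALUE = 60.0
MAX_RECENT_CONTACTS = 2


def plan(candidates: list[Candidate], budget: float) -> dict[str, str]:
    """Pick at most one intervention per user, greedily, within the budget."""
    options = []
    for c in candidates:
        # Hard constraints come first: opt-outs and saturated users are never targeted.
        if c.opted_out or c.recent_contacts >= MAX_RECENT_CONTACTS:
            continue
        # Best intervention for this user by expected net value.
        action, net = max(
            ((a, c.uplift.get(a, 0.0) * RETAINED_VALUE - cost)
             for a, cost in COSTS.items()),
            key=lambda pair: pair[1],
        )
        if net > 0:
            options.append((net, c.churn_risk, c.user_id, action))

    # Highest net value first; churn risk breaks ties (urgency).
    options.sort(key=lambda o: (o[0], o[1]), reverse=True)

    chosen: dict[str, str] = {}
    remaining = budget
    for _net, _risk, user_id, action in options:
        cost = COSTS[action]
        if cost <= remaining:
            chosen[user_id] = action
            remaining -= cost
    return chosen


candidates = [
    Candidate("u1", 0.9, {"email": 0.00, "discount": 0.01}),    # high risk, not savable
    Candidate("u2", 0.5, {"email": 0.05, "discount": 0.20}),    # persuadable by discount
    Candidate("u3", 0.4, {"email": 0.04, "discount": 0.05}),    # email is enough
    Candidate("u4", 0.6, {"discount": 0.30}, opted_out=True),   # excluded by rule
    Candidate("u5", 0.1, {"email": -0.02, "discount": -0.05}),  # intervention backfires
]

print(plan(candidates, budget=6.0))
```

Note that the highest-risk user (`u1`) is never selected: the risk score only breaks ties, while the uplift estimate and the constraints decide who is targeted.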
A system that targets purely on risk score can produce headline metrics that look good offline yet deliver poor incremental ROI in properly designed holdout experiments, because a high risk score says nothing about whether the user will respond to an intervention. Staff-level answers articulate the difference between association-based risk models and causal uplift models before being asked.
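A holdout readout makes this gap measurable. The sketch below assumes each policy's targeted population was randomly split into a treated group and an untreated holdout; the counts, the retained-user value of 60, and the cost of 5 per treatment are invented for illustration, and `holdout_readout` is a hypothetical helper name.

```python
def holdout_readout(
    treated_retained: int,
    treated_n: int,
    holdout_retained: int,
    holdout_n: int,
    retained_value: float,
    cost_per_treatment: float,
) -> dict[str, float]:
    """Compare treated users with a random holdout from the same targeted population."""
    treated_rate = treated_retained / treated_n
    holdout_rate = holdout_retained / holdout_n
    lift = treated_rate - holdout_rate  # incremental retention attributable to treatment
    spend = cost_per_treatment * treated_n
    incremental_value = lift * treated_n * retained_value
    return {
        "treated_rate": treated_rate,
        "holdout_rate": holdout_rate,
        "lift": lift,
        "roi": (incremental_value - spend) / spend,
    }


# Hypothetical experiment results; none of these counts come from real data.
# Policy A targets the highest-risk users, most of whom churn regardless.
risk_targeted = holdout_readout(60, 1000, 50, 1000,
                                retained_value=60.0, cost_per_treatment=5.0)
# Policy B targets the users with the highest estimated uplift.
uplift_targeted = holdout_readout(600, 1000, 400, 1000,
                                  retained_value=60.0, cost_per_treatment=5.0)

for label, result in [("risk-targeted", risk_targeted),
                      ("uplift-targeted", uplift_targeted)]:
    print(f"{label:16s} lift={result['lift']:.3f} roi={result['roi']:+.2f}")
```

In this made-up example the risk-targeted policy reaches users who really are at risk, but treatment barely moves their retention, so ROI is negative. Only the comparison against the holdout reveals that; the treated group's retention rate alone would not.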