ML System Design: Customer LTV, Survival Modeling, and Uplift for Treatment Targeting
Design a production customer lifetime value (CLV) system for subscription and marketplace businesses — from probabilistic churn (BG/NBD, Gamma-Gamma spend), survival curves with censoring, to uplift modeling for CRM campaigns. Covers why naive regression on historical LTV leaks future information, how Shopify-style merchants use CLV for acquisition bids, and evaluation with calibrated dollar errors plus policy simulation.
Why CLV Is a Systems Problem, Not a Regression Exercise
CLV is the discounted expected future cash contribution of a customer. Interviewers fail candidates who regress historical realized revenue as the label — that leaks the future into features unless you snapshot at cohort time.
Production CLV powers bid caps in paid acquisition, CRM send limits, and credit lines — each needs calibrated dollars, uncertainty, and freshness under right-censoring (young customers have unknown horizons).
Depth Ladder
Mid-level: Churn classifier × ARPU heuristic.
Senior: Buy Till You Die family (BG/NBD repeat purchases, Gamma-Gamma monetary), Cox baseline for subscription tenures, calibration curves.
Staff: Policy evaluation — uplift for treatment; multi-touch acquisition credit interaction; fair lending constraints when CLV affects pricing.
Clarifying Questions
Business model
Subscription vs transactional vs hybrid — model choice differs.
Definition of churn
Contract cancel vs 90d inactivity — changes survival labels.
Horizon
12-month vs 5-year CLV — discount rate and uncertainty explode.
Action use-case
Bidding, email frequency, credit — each needs different calibration.
Cold start
New users with one session — hierarchical partial pooling.
Probabilistic Models — BG/NBD and Gamma–Gamma Intuition
BG/NBD models repeat purchase counts while alive using latent dropout process — handles non-contractual churn where "churn" is unobserved directly.
Gamma–Gamma models spend per transaction heterogeneity — combine with BG/NBD for expected monetary value per period.
These beat raw RFM regression on sparse histories because they encode generative assumptions — interviewers at retail DS teams still expect these names.
CLV Platform — Offline Fit to Online Scoring
Model Families — When to Use Which
| Family | Data needs | Strength | Risk |
|---|---|---|---|
| Heuristic ARPU × E[life] | Low | Explainable | Biased with censoring |
| BG/NBD + Gamma | Repeat purchases | Probabilistic, sparse friendly | Assumption mismatch |
| Cox / survival NN | Contract churn | Handles censoring | PH assumption or black box |
| Global DNN on sequences | Rich event logs | Flexible | Leakage if labels wrong |
| Production path | Probabilistic baseline DNN residual on top | Best of both | Two-stage complexity |
Label Leakage
Using lifetime revenue to date as label for all users includes post-snapshot purchases in features if you are not careful — point-in-time joins are mandatory.
Interview Closer
"I'd start from churn plus spend generative models for interpretability, add hierarchical pooling for cold start, calibrate dollars on a holdout cohort, and only then layer uplift for CRM — CLV without calibration is worse than no score for bidding."