MLSD Case Study: End-to-End Recommender System
Design a production recommender stack from candidate generation to ranking, re-ranking, experimentation, and monitoring. Covers retrieval-ranking tradeoffs, feature freshness, exploration, and feedback-loop mitigation.
Problem Framing: Relevance, Revenue, and Retention
Recommenders optimize a multi-objective problem where the objectives are often in tension: short-term engagement (clicks, watch-time) can diverge from long-term retention (weekly active days, subscription renewal) and ecosystem health (creator diversity, content quality). A system that maximizes raw CTR tends to favor clickbait, homogenize content, and concentrate engagement on a small fraction of items and creators.
Strong interview answers translate business goals into an explicit objective function before touching architecture:
- Primary metric: define the north-star metric (watch-time per session, gross merchandise value, daily active retention rate) with the team that owns business outcomes.
- Secondary objectives: satisfaction signals (likes, saves, explicit ratings), diversity metrics, creator ecosystem health.
- Guardrails: explicit constraints that must not be violated, such as a cap on the Gini coefficient of item exposure (to limit filter bubbles), creator exposure fairness, content freshness floors, and policy compliance.
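One way to make the primary/secondary/guardrail distinction concrete is to treat secondary objectives as weighted terms in the score and guardrails as hard filters. The sketch below is illustrative only: the signal names, weights, and the 5% exposure cap are assumptions, not values from this guide.

```python
# Hypothetical sketch: combining a primary objective, secondary objectives,
# and hard guardrails into one ranking score. All names, weights, and
# thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ItemSignals:
    p_long_watch: float      # predicted P(long watch) — proxy for the primary metric
    p_like: float            # predicted P(like/save) — satisfaction signal
    creator_exposure: float  # creator's recent share of impressions, in [0, 1]
    is_policy_compliant: bool

def combined_score(s: ItemSignals,
                   w_watch: float = 1.0,
                   w_like: float = 0.3,
                   exposure_cap: float = 0.05) -> float:
    """Weighted multi-objective score with guardrails as constraints.

    Guardrails are not terms in the weighted sum: a violating item is
    excluded outright (score -inf), never merely down-weighted.
    """
    if not s.is_policy_compliant or s.creator_exposure > exposure_cap:
        return float("-inf")  # hard filter: item is never shown
    return w_watch * s.p_long_watch + w_like * s.p_like
```

The key design point is that guardrails live outside the weighted sum; otherwise a high enough engagement prediction could "buy its way past" a policy or fairness constraint.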
The framing should establish surface, user population, and scale before jumping to architecture. A home-feed recommender for 500M daily active users has different constraints than a product-search ranker for a B2B marketplace. Clarify: What is the serving surface? What is the p99 latency budget? Is freshness critical (news, live events) or can recommendations be cached?
The objective function also determines how you design training labels. If you optimize watch-time, long-watch events are positives and short-dwell events are soft negatives. If you optimize CTR, any click counts as a positive. These produce radically different models, and neither alone produces optimal long-term outcomes. Production systems use multi-task rankers with a weighted combination of several signals, and the weighting is a product decision, not a modeling decision.
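To see how the same impression log yields different training data under different objectives, consider this sketch of two labeling functions. The thresholds (30s for a "long watch", 5s for a "short dwell") and the event-dict field names are hypothetical assumptions for illustration.

```python
# Hypothetical sketch: the chosen objective determines label generation.
# Thresholds and field names ("watch_seconds", "clicked") are assumptions.

def label_for_watch_time(event: dict,
                         long_watch_s: float = 30.0,
                         short_dwell_s: float = 5.0):
    """Watch-time objective: long watches are positives,
    short dwells are soft negatives (often down-weighted in the loss)."""
    watch = event.get("watch_seconds", 0.0)
    if watch >= long_watch_s:
        return 1.0   # positive
    if watch <= short_dwell_s:
        return 0.0   # soft negative
    return None      # ambiguous middle band: commonly dropped or weighted low

def label_for_ctr(event: dict) -> float:
    """CTR objective: any click is a positive, regardless of dwell."""
    return 1.0 if event.get("clicked", False) else 0.0

# The same impression can receive opposite labels under the two objectives.
clickbait = {"clicked": True, "watch_seconds": 2.0}
```

Under the CTR objective the clickbait event is a positive; under the watch-time objective it is a negative. This is exactly why a pure-CTR ranker drifts toward clickbait, and why multi-task rankers train separate heads per signal and combine them with product-chosen weights at serving time.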