MLSD Case Study: End-to-End Recommender System

Design a production recommender stack from candidate generation to ranking, re-ranking, experimentation, and monitoring. Covers retrieval-ranking tradeoffs, feature freshness, exploration, and feedback-loop mitigation.

Topics: ML System Design, Recommender System, Retrieval Ranking, Two-Tower, Feature Store, Explore/Exploit, Bandits, Ranking Metrics, A/B Testing

Problem Framing: Relevance, Revenue, and Retention

Recommenders optimize a multi-objective problem where the objectives are often in tension: short-term engagement (clicks, watch-time) can diverge from long-term retention (weekly active days, subscription renewal) and ecosystem health (creator diversity, content quality). A system that maximizes raw CTR tends to favor clickbait, homogenize content, and concentrate engagement on a small fraction of items and creators.
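
Concentration of this kind is usually quantified with a Gini coefficient over item or creator exposure counts, which the guardrails below cap. A minimal sketch, assuming exposure counts arrive as a NumPy array (the function name and example numbers are illustrative):

```python
import numpy as np

def exposure_gini(exposures: np.ndarray) -> float:
    """Gini coefficient of exposure counts: 0 = perfectly even, ~1 = concentrated."""
    x = np.sort(np.asarray(exposures, dtype=float))  # ascending order
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    i = np.arange(1, n + 1)
    # Closed form for sorted data: G = 2*sum(i*x_i) / (n*sum(x)) - (n+1)/n
    return float(2.0 * np.sum(i * x) / (n * x.sum()) - (n + 1) / n)

# 90% of impressions on one of five items -> Gini = 0.7
print(exposure_gini(np.array([900, 25, 25, 25, 25])))
```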

Strong interview answers translate business goals into an explicit objective function before touching architecture (a scoring sketch follows the list):

  • Primary metric: define the north-star metric (watch-time per session, gross merchandise value, daily active retention rate) with the team that owns business outcomes.
  • Secondary objectives: satisfaction signals (likes, saves, explicit ratings), diversity metrics, creator ecosystem health.
  • Guardrails: hard constraints that must not be violated, e.g. a cap on exposure Gini (filter-bubble risk), creator exposure fairness, content freshness, and policy compliance.
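
A minimal sketch of how these pieces might compose at serving time, assuming per-item model predictions already exist; the weights, thresholds, and field names are hypothetical placeholders (as noted below, the weighting itself is a product decision):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ItemPrediction:
    item_id: str
    p_click: float      # predicted click probability
    e_watch: float      # expected watch-time, normalized to [0, 1]
    p_save: float       # predicted save/like probability
    policy_ok: bool     # passes content-policy checks
    age_hours: float    # content age, for the freshness guardrail

# Hypothetical objective weights: set by product, tuned via experiments.
W_CLICK, W_WATCH, W_SAVE = 0.2, 0.6, 0.2
MAX_AGE_HOURS = 72.0  # hypothetical freshness guardrail

def final_score(item: ItemPrediction) -> Optional[float]:
    """Weighted multi-objective score; guardrails are hard filters, not weighted terms."""
    if not item.policy_ok or item.age_hours > MAX_AGE_HOURS:
        return None  # guardrail violation: never serve
    return W_CLICK * item.p_click + W_WATCH * item.e_watch + W_SAVE * item.p_save
```

Note that slate-level guardrails (the Gini cap, creator exposure fairness) cannot be enforced per item; they are typically applied during re-ranking over the candidate slate.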

The framing should establish surface, user population, and scale before jumping to architecture. A home-feed recommender for 500M daily active users has different constraints than a product-search ranker for a B2B marketplace. Clarify: What is the serving surface? What is the p99 latency budget? Is freshness critical (news, live events) or can recommendations be cached?

The objective function also determines how you design training labels. If you optimize watch-time, long-watch events are positives and short-dwell events are soft negatives. If you optimize CTR, any click counts. These choices produce radically different models, and neither alone produces optimal long-term outcomes. Production systems therefore use multi-task rankers that combine several signals with a weighted sum, and the weighting is a product decision, not a modeling decision.
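
For instance, label construction under the two objectives might look like the sketch below; the completion thresholds (50% for positives, 10% for soft negatives) are hypothetical assumptions, not industry constants:

```python
from typing import Optional

def watch_time_label(watch_s: float, video_s: float) -> Optional[float]:
    """Watch-time objective: long watches are positives, short dwells soft negatives."""
    completion = watch_s / max(video_s, 1.0)
    if completion >= 0.5:   # hypothetical threshold
        return 1.0          # positive
    if completion <= 0.1:   # hypothetical threshold
        return 0.0          # soft negative: usually down-weighted in the loss
    return None             # ambiguous middle: often dropped from training

def ctr_label(clicked: bool) -> float:
    """CTR objective: any click is a positive, regardless of downstream dwell."""
    return 1.0 if clicked else 0.0
```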
