Cold Start: Full Architecture for New Users and New Items
Cold start appears in every recommendation interview but is rarely answered correctly. This guide covers the complete production ML architecture: content embeddings for new items, population priors, bandit bootstrap, and how Pinterest, Airbnb, and TikTok solve cold start at scale.
Cold Start Is Not One Problem — It Is Three
"Handle cold start with a popularity fallback" is the L4 answer. It's not wrong — but it's the beginning of the answer, not the end. Production recommendation systems at Pinterest, Airbnb, TikTok, and YouTube have dedicated cold-start engineering teams, because cold start is a compound problem with three distinct sub-problems, each requiring a different solution.
Problem 1 — New user cold start: a user has just registered. The collaborative filtering model has no interaction history to embed them. What do you recommend?
Failure mode of the naive solution: recommending globally popular items means every new user sees the same generic list, with no personalization. First-session retention is disproportionately important for long-term engagement, so generic recommendations waste the critical first-session window.
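To make the failure mode concrete, here is a minimal sketch of a popularity fallback. The function names (`popularity_fallback`, `recommend`), the log format, and the toy data are all hypothetical and chosen for illustration; this is not any particular platform's implementation.

```python
from collections import Counter

def popularity_fallback(interaction_log, k=2):
    """Return the k most-interacted-with item IDs across all users."""
    counts = Counter(item_id for _user_id, item_id in interaction_log)
    return [item_id for item_id, _count in counts.most_common(k)]

def recommend(user_id, histories, interaction_log, k=2):
    """Hypothetical entry point: users with no history get the global fallback."""
    if not histories.get(user_id):
        return popularity_fallback(interaction_log, k)
    # A real system would call the personalized model here (not shown).
    raise NotImplementedError("personalized path is out of scope for this sketch")

# Toy interaction log of (user_id, item_id) pairs: "a" x3, "b" x2, "c" x1.
log = [("u1", "a"), ("u2", "a"), ("u3", "a"),
       ("u1", "b"), ("u2", "b"), ("u3", "c")]
histories = {"u1": ["a", "b"], "u2": ["a", "b"], "u3": ["a", "c"]}

# Two brand-new users with no history at all.
recs_first = recommend("new_user_1", histories, log)
recs_second = recommend("new_user_2", histories, log)
print(recs_first, recs_second)
```

Both new users receive the identical list, which is exactly the lack of personalization described above: the fallback has no input that could distinguish one new user from another.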
Problem 2 — New item cold start: a new item (video, product, article, listing) was just added to the catalog. The two-tower retrieval model has no embedding for it — it wasn't in the training data. The item cannot be retrieved by ANN search. It gets zero impressions → zero clicks → zero training labels → never learns an embedding → permanently cold (the cold-start loop).
Failure mode of the naive solution: inject new items via a rules-based fallback (for example, random insertion into feeds) → items are often shown to users who are unlikely to engage → poor early CTR signals → the ranker de-prioritizes the item → items with good content but poor cold-start handling never get discovered.
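The cold-start loop from Problem 2 can be illustrated with a toy simulation. This is a sketch under stated assumptions: the brute-force dot-product search stands in for ANN retrieval, the "retraining" rule (an item only gets an embedding once it has logged impressions) is a deliberate simplification, and all names and vectors are hypothetical.

```python
from collections import Counter

def retrieve(user_vec, index, k=1):
    """Brute-force stand-in for ANN search: top-k indexed items by dot product."""
    scored = sorted(
        index.items(),
        key=lambda kv: sum(u * v for u, v in zip(user_vec, kv[1])),
        reverse=True,
    )
    return [item_id for item_id, _vec in scored[:k]]

catalog = ["old_1", "old_2", "new_item"]
# Only items seen during training have embeddings; new_item is absent.
index = {"old_1": [1.0, 0.0], "old_2": [0.0, 1.0]}
users = [[1.0, 0.2], [0.1, 1.0]]
impressions = Counter()

for _round in range(5):
    # Serving: only items present in the index can ever be retrieved.
    for user_vec in users:
        for item_id in retrieve(user_vec, index):
            impressions[item_id] += 1
    # Simplified retraining (assumption): an item without an embedding
    # gets one only after it has accumulated at least one impression.
    for item_id in catalog:
        if item_id not in index and impressions[item_id] > 0:
            index[item_id] = [0.5, 0.5]  # placeholder for a learned vector

print(dict(impressions), "new_item" in index)
```

After five rounds `new_item` still has zero impressions and no embedding. Nothing inside the loop can break the cycle, so any fix has to come from outside it, such as an embedding derived from the item's content rather than from interactions.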
Problem 3 — New context cold start: a user with a rich history on Platform A joins Platform B (e.g., a Spotify user opening Apple Music; a Twitter user joining Bluesky). The user's preferences are known in other contexts but are not present in the new platform's data.
This is often called the cross-platform or cross-surface cold-start problem. It is the hardest of the three forms, because it requires knowledge-transfer mechanisms.