Cold Start: Full Architecture for New Users and New Items

Cold start appears in every recommendation interview but is rarely answered correctly. This guide covers the complete production ML architecture: content embeddings for new items, population priors, bandit bootstrap, and how Pinterest, Airbnb, and TikTok solve cold start at scale.

Cold Start, Recommendation Systems, Side-Channel Embeddings, Content-Based Filtering, Thompson Sampling, User Cold Start, Item Cold Start, Cross-Surface Transfer, Meta-Learning, Population Prior, Collaborative Filtering, Two-Tower, Production ML

Cold Start Is Not One Problem — It Is Three

"Handle cold start with a popularity fallback" is the L4 answer. It's not wrong, but it's the beginning of the answer, not the end. Production recommendation systems at Pinterest, Airbnb, TikTok, and YouTube have dedicated cold-start engineering teams, because cold start is a compound problem made up of three distinct sub-problems, each requiring a different solution.

Problem 1 — New user cold start: a user has just registered. The collaborative filtering model has no interaction history to embed them. What do you recommend?
Failure mode of naive solution: recommend the globally popular items. Every new user then sees the same generic list, with no personalization. Because first-session experience is disproportionately important for long-term retention, generic recommendations waste the critical first-session engagement window.
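
One step up from a global popularity list is a population prior: initialize the new user from the average embedding of existing users who share the same signup-time attributes. The sketch below is illustrative only. The segment key (country, device), the function names, and the toy two-dimensional embeddings are all assumptions made for the example, not something the guide specifies.

```python
from collections import defaultdict

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def build_population_priors(users):
    """users: iterable of (segment_key, embedding).

    Returns (per-segment mean embeddings, global mean embedding).
    The segment key is a hypothetical signup-time attribute tuple.
    """
    by_segment = defaultdict(list)
    all_vectors = []
    for segment, embedding in users:
        by_segment[segment].append(embedding)
        all_vectors.append(embedding)
    segment_means = {seg: mean_vector(vs) for seg, vs in by_segment.items()}
    return segment_means, mean_vector(all_vectors)

def cold_start_user_embedding(segment, segment_means, global_mean):
    """Use the segment prior when one exists, else fall back to the global mean."""
    return segment_means.get(segment, global_mean)

# Toy data: (country, device) segments with 2-d embeddings.
existing_users = [
    (("US", "ios"), [1.0, 0.0]),
    (("US", "ios"), [0.0, 1.0]),
    (("DE", "android"), [2.0, 2.0]),
]
segment_means, global_mean = build_population_priors(existing_users)
new_user_vec = cold_start_user_embedding(("US", "ios"), segment_means, global_mean)
unseen_segment_vec = cold_start_user_embedding(("FR", "web"), segment_means, global_mean)
```

The resulting vector can be used as the query for retrieval until the user has enough interactions of their own. A user from an unseen segment falls back to the global mean, which behaves much like the popularity baseline.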

Problem 2 — New item cold start: a new item (video, product, article, listing) was just added to the catalog. The two-tower retrieval model has no embedding for it — it wasn't in the training data. The item cannot be retrieved by ANN search. It gets zero impressions → zero clicks → zero training labels → never learns an embedding → permanently cold (the cold-start loop).
Failure mode of naive solution: inject new items via a rules-based fallback (random insertion into feeds). Random placement shows the items to users regardless of interest, many of whom are unlikely to engage. The resulting poor early CTR leads the ranker to de-prioritize the item, so items with good content but poor cold-start handling never get discovered.
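
One common way to break the loop is to give the new item a provisional embedding derived from its content, so that ANN search can retrieve it before any interaction data exists. The sketch below borrows the learned embeddings of the most content-similar warm items and averages them, weighted by similarity. The function names, the choice of k, and the toy vectors are assumptions for illustration, and it assumes at least one warm item is available. A production system might instead train a content tower end to end.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def bootstrap_item_embedding(new_content_vec, warm_items, k=2):
    """Estimate an embedding for a new item from its k nearest warm items.

    warm_items: non-empty list of (content_vec, learned_embedding) pairs.
    Neighbours are found in content space; the result lives in the
    learned (collaborative) embedding space.
    """
    scored = sorted(
        ((cosine(new_content_vec, content), learned) for content, learned in warm_items),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Clip negative similarities so dissimilar neighbours get no weight.
    weights = [max(similarity, 0.0) for similarity, _ in scored]
    total = sum(weights)
    if total == 0.0:
        # Nothing is similar: fall back to an unweighted mean of the top k.
        weights = [1.0] * len(scored)
        total = float(len(scored))
    dim = len(scored[0][1])
    return [
        sum(w * learned[i] for w, (_, learned) in zip(weights, scored)) / total
        for i in range(dim)
    ]

# Toy data: two warm items share the new item's content direction, one does not.
warm_items = [
    ([1.0, 0.0], [2.0, 0.0]),
    ([1.0, 0.0], [4.0, 0.0]),
    ([0.0, 1.0], [0.0, 9.0]),
]
new_item_embedding = bootstrap_item_embedding([1.0, 0.0], warm_items, k=2)
```

Because the provisional embedding sits near items with similar content, the new item is surfaced to users who already engage with that kind of content rather than to a random sample, which should give it a fairer set of early CTR signals.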

Problem 3 — New context cold start: a user with a rich history on Platform A joins Platform B (e.g., a Spotify user opening Apple Music; a Twitter user joining BlueSky). The user's preferences are available from other contexts but not from the new platform's data.
This is often called the cross-platform or cross-surface cold-start problem. It is the hardest of the three forms because it requires knowledge-transfer mechanisms that carry preferences learned in one context over to another.
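
As a deliberately simple illustration of knowledge transfer, the sketch below maps a user's category affinities from a source context onto items in the target context through a hand-written category mapping. Everything here is hypothetical: the category names, the mapping table, and the function names are invented for the example. Real systems would more likely learn the mapping, for instance from users who are active in both contexts.

```python
from collections import Counter

def source_affinity(source_history):
    """Turn a list of source-context category labels into a preference distribution."""
    counts = Counter(source_history)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

def rank_target_items(affinity, target_items, category_map):
    """Rank target-context items by the user's transferred affinity.

    target_items: list of (item_id, target_category).
    category_map: target category -> source category (hypothetical, hand-written).
    Items whose category has no mapping score 0.0 and sink to the bottom.
    """
    def score(item):
        _, target_category = item
        return affinity.get(category_map.get(target_category), 0.0)

    return [item_id for item_id, _ in sorted(target_items, key=score, reverse=True)]

# Toy data: listening history from the source context.
history = ["jazz", "jazz", "rock", "jazz"]
affinity = source_affinity(history)

target_items = [("t1", "Rock & Metal"), ("t2", "Jazz"), ("t3", "Podcasts")]
category_map = {"Jazz": "jazz", "Rock & Metal": "rock"}
ranked = rank_target_items(affinity, target_items, category_map)
```

The mapping table is the fragile part of this design. Categories rarely align one to one across contexts, which is one reason this form of cold start is harder than the other two.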
