ML System Design: Social Feed Ranking System
Design a production-grade social feed ranking system from scratch — the architecture powering Twitter/X, LinkedIn, Reddit, and Threads. Covers multi-source candidate retrieval (in-network + out-of-network), multi-task value model predicting 10+ user actions, recency engineering, echo chamber feedback loops, counterfactual logging, and the exact latency budget for a <200ms feed load. Includes the open-sourced X (Twitter) algorithm analysis, SimCluster-based out-of-network discovery, and what each level (mid/senior/staff) must cover.
What Interviewers Are Evaluating
Mid-level: Can you articulate the two-stage retrieval-ranking architecture? Do you understand in-network vs. out-of-network candidates? Can you explain why the chronological feed was replaced by a ranked feed and what was lost?
Senior-level: Can you design a multi-task value model that combines action predictions into a single rank score? Do you understand recency decay and why it must be explicit in the scoring function? Can you identify echo chamber formation as a feedback loop pathology and propose mitigations? Do you name SimClusters or similar graph clustering for out-of-network discovery?
Staff-level: Do you reason about the risk of violating the social contract when the algorithm deprioritizes followed content? Can you design the counterfactual logging infrastructure needed to debias training data? Do you treat the value model weights as a product surface (PM-owned) rather than as ML hyperparameters? Do you reason about virality dynamics and how a trending post changes the candidate distribution in real time?
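To make the senior and staff points about the value model concrete, here is a minimal sketch of collapsing per-action predictions into a single rank score. The action names and weight values are hypothetical, chosen only for illustration, and a plain weighted sum is an assumption about the combining function. The point is that the weights sit in a config that product owners can change without retraining the model.

```python
# Hypothetical weights: illustrative values only, not taken from any production system.
ACTION_WEIGHTS = {
    "like": 1.0,
    "reply": 10.0,
    "repost": 2.0,
    "report": -50.0,  # negative actions pull the score down
}


def value_score(action_probs, weights=ACTION_WEIGHTS):
    """Weighted sum of predicted action probabilities.

    action_probs maps action name -> predicted probability from the
    multi-task model. Actions without a configured weight contribute 0.
    """
    return sum(weights.get(action, 0.0) * p for action, p in action_probs.items())


post_a = {"like": 0.30, "reply": 0.01}
post_b = {"like": 0.10, "reply": 0.05}

# With replies weighted heavily, post_b outranks post_a.
ranked = sorted([("a", post_a), ("b", post_b)],
                key=lambda item: value_score(item[1]), reverse=True)
```

Swapping in a different weight dictionary reorders the same predictions, which is why the weights behave like a product decision and not a training detail.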
Clarifying Questions — Ask These First
Which feed surface are we designing?
Home timeline / For You (algorithmic, mix of followed + discovered content) vs. Following tab (chronological, only followed accounts) vs. Search results vs. Topic/hashtag feeds. For this problem: the main algorithmic feed (For You equivalent), which is the hardest because it must balance social obligations with algorithmic optimization.
What scale?
For Twitter/X scale: ~250M DAU and ~500M posts/day, which averages to roughly 350K posts/minute (about 5.8K posts/second) before peaks. A user opens the app ~10 times/day, so expect roughly 2.5B feed loads/day. Feed must load in <200ms (p95). Candidate pool: ~1M recent posts per user (from followed accounts + out-of-network graph expansion), narrowed to a slate of 30-50 shown posts.
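A quick back-of-envelope check helps when stating these numbers in an interview. The sketch below derives average rates from the daily figures quoted above. It assumes uniform traffic across the day, which real systems do not have, so treat the results as a floor and size for peaks separately.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(events_per_day):
    """Average events per second, assuming traffic is spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


dau = 250_000_000
opens_per_user_per_day = 10
posts_per_day = 500_000_000

feed_loads_per_day = dau * opens_per_user_per_day   # 2.5 billion
feed_qps = per_second(feed_loads_per_day)           # roughly 29K requests/second
post_write_qps = per_second(posts_per_day)          # roughly 5.8K posts/second
posts_per_minute = post_write_qps * 60              # roughly 347K posts/minute

print(f"feed loads/s: {feed_qps:,.0f}")
print(f"posts/s:      {post_write_qps:,.0f}")
print(f"posts/min:    {posts_per_minute:,.0f}")
```

The read path (feed loads) is about five times heavier than the write path (new posts) on average, which is the usual justification for spending effort on precomputation and caching on the read side.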
What's the ratio of in-network vs. out-of-network content?
This is a critical product decision that shapes the whole retrieval architecture. Twitter/X targets ~50% in-network (posts from accounts you follow) + ~50% out-of-network (discovered via algorithm). This ratio is a product lever, not an ML hyperparameter. Ask the interviewer if they have a target, or propose a starting ratio and explain the tradeoff.
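One way to show that the ratio is a lever and not something the model learns is to apply it at slate assembly time. The sketch below is an illustrative assumption, not how any particular platform does it: it interleaves two already-ranked candidate lists so the running in-network share tracks a configurable target. The function name `blend` and its parameters are hypothetical.

```python
def blend(in_network, out_of_network, slate_size, in_network_share=0.5):
    """Interleave two ranked lists so the slate approaches the target share.

    Both inputs are assumed to be sorted best-first. If one source runs
    out, the remaining slots are filled from the other.
    """
    slate = []
    i = j = 0  # next unread position in each source
    while len(slate) < slate_size and (i < len(in_network) or j < len(out_of_network)):
        # i also equals the number of in-network posts placed so far.
        behind_quota = i < in_network_share * (len(slate) + 1)
        if (behind_quota and i < len(in_network)) or j >= len(out_of_network):
            slate.append(in_network[i])
            i += 1
        else:
            slate.append(out_of_network[j])
            j += 1
    return slate


half = blend(["a1", "a2", "a3"], ["b1", "b2", "b3"], slate_size=4)
mostly_followed = blend(["a1", "a2", "a3"], ["b1", "b2", "b3"],
                        slate_size=4, in_network_share=0.75)
```

With the default share the slate alternates sources; raising `in_network_share` to 0.75 yields three followed posts for every discovered one, with no change to retrieval or ranking.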
What's the content lifetime / recency window?
Tweets: 24-48 hours max relevance. LinkedIn posts: 5-7 days. Reddit posts: hours to days depending on subreddit. This defines how aggressively we must decay content scores over time and how often we must refresh the candidate index.
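The lifetime figures above translate directly into a decay parameter. A minimal sketch, assuming exponential decay with a per-surface half-life (a common choice, but an assumption here; the half-life values below are hypothetical and picked only to be roughly consistent with the lifetimes listed):

```python
# Hypothetical half-lives in hours, loosely matched to the content lifetimes above.
HALF_LIFE_HOURS = {
    "tweet": 12.0,          # mostly irrelevant after 24-48 hours
    "linkedin_post": 48.0,  # stays relevant for several days
}


def recency_multiplier(age_hours, half_life_hours):
    """Exponential decay: the multiplier halves every half_life_hours."""
    return 0.5 ** (age_hours / half_life_hours)


def decayed_score(base_score, age_hours, half_life_hours):
    """Apply recency decay explicitly on top of the model's base score."""
    return base_score * recency_multiplier(age_hours, half_life_hours)


# The same base score at the same age survives much better on the slower surface.
tweet_score = decayed_score(1.0, age_hours=24, half_life_hours=HALF_LIFE_HOURS["tweet"])
linkedin_score = decayed_score(1.0, age_hours=24,
                               half_life_hours=HALF_LIFE_HOURS["linkedin_post"])
```

A shorter half-life also implies the candidate index must be refreshed more often, since stale entries lose most of their score between refreshes.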
Are there special content types?
Pure text posts, image posts, video posts, links, polls, threads — each has different engagement patterns and different optimal features. Ask if we need to handle all content types or focus on one. Note that mixing content types requires calibrating scores across types (a video's engagement rate is not directly comparable to a text tweet's).
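The cross-type calibration point can be illustrated with a simple normalization. The sketch below assumes one possible approach, scoring each post by its lift over a per-type baseline engagement rate; the baseline numbers are made up for illustration, and a production system might use a properly calibrated model per type.

```python
# Hypothetical per-type baseline engagement rates (illustrative numbers only).
BASELINE_RATE = {"text": 0.02, "image": 0.04, "video": 0.10}


def lift_over_baseline(predicted_rate, content_type, baselines=BASELINE_RATE):
    """Express a predicted engagement rate relative to what is typical for its type."""
    return predicted_rate / baselines[content_type]


# Raw rates favor the video (0.10 > 0.04), but the text post is twice as
# engaging as a typical text post while the video is merely average.
text_lift = lift_over_baseline(0.04, "text")
video_lift = lift_over_baseline(0.10, "video")
```

Ranking on lift instead of raw rate keeps a high-baseline format such as video from crowding out every other type in the slate.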
Integrity / safety in scope?
Spam, misinformation, coordinated inauthentic behavior, bot amplification, NSFW content. Integrity is a re-ranking filter layer, not an afterthought. Confirm scope and note it explicitly.