ML System Design: Social Feed Ranking System

Why Social Feed Ranking Is a Distinct MLSD Problem

Social feed ranking looks like video recommendation with a Twitter-shaped catalog. It isn't. The constraints are fundamentally different, and understanding those differences is the first signal an interviewer looks for.

Ephemeral content changes everything. A YouTube video from 2019 remains relevant. A tweet from 72 hours ago is stale. The model must heavily weight recency, which creates a constantly shifting candidate set — you can't pre-index everything because new posts arrive at millions per hour. ANN indices must be updated continuously, not just hourly.

The engagement space is richer and noisier. Instagram has ~10 action types. Twitter/X tracks likes, replies, reposts, quote-tweets, bookmarks, follows, link clicks, profile visits, mutes, blocks, and reports. Each signal carries a different semantic weight. A reply indicates much stronger intent than a like. A block is the strongest negative signal the system receives. Any model that treats these equivalently is wrong.
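
One way to make the "different semantic weight" point concrete is a weighted sum over per-action probabilities. The sketch below assumes an upstream model has already produced those probabilities; the `ACTION_WEIGHTS` values and the `value_score` helper are hypothetical and chosen only to show the ordering described above (reply above like, block as the most negative).

```python
# Illustrative weights only (assumptions, not published values).
# Positive actions add value; negative actions subtract it, with block the most negative.
ACTION_WEIGHTS = {
    "like": 1.0,
    "repost": 2.0,
    "bookmark": 3.0,
    "reply": 5.0,
    "mute": -20.0,
    "report": -40.0,
    "block": -50.0,
}


def value_score(predicted_probs: dict) -> float:
    """Collapse per-action probabilities into one score via a weighted sum.

    Actions without a configured weight are ignored.
    """
    return sum(
        ACTION_WEIGHTS[action] * prob
        for action, prob in predicted_probs.items()
        if action in ACTION_WEIGHTS
    )


# A post likely to be liked but with a small chance of a block
# can score below a post with fewer likes and no block risk.
risky = value_score({"like": 0.10, "reply": 0.02, "block": 0.004})
safe = value_score({"like": 0.08, "reply": 0.02})
```

Because a block carries a large negative weight, even a small block probability can outweigh a noticeably higher like probability.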

The social graph creates conflicting obligations. Users follow specific accounts. They have a social contract with that content — "I chose to follow this person, show me their posts." Pure algorithmic ranking can violate this contract (showing content from accounts the user doesn't follow instead), which creates user backlash even when engagement metrics improve. This is the friends vs. algorithm tension that every social platform struggles with.

Influence amplification at scale. Ranking 10M posts per second at p99 < 200ms while a viral post is propagating through the social graph — and the graph changes as users follow/unfollow in response to that viral post — is one of the hardest distributed ML serving challenges that exists.

This problem is asked at Twitter/X, LinkedIn, Meta (Threads, Facebook News Feed), Reddit, Pinterest, Snapchat, Bluesky, and most other social platform companies. The multi-task value model architecture applies directly to all of them.

What Interviewers Are Evaluating

Mid-level: Can you articulate the two-stage retrieval-ranking architecture? Do you understand in-network vs. out-of-network candidates? Can you explain why the chronological feed was replaced by a ranked feed and what was lost?

Senior-level: Can you design a multi-task value model that combines action predictions into a single rank score? Do you understand recency decay and why it must be explicit in the scoring function? Can you identify echo chamber formation as a feedback loop pathology and propose mitigations? Do you name SimClusters or similar graph clustering for out-of-network discovery?

Staff-level: Do you reason about the social contract violation risk when the algorithm deprioritizes followed content? Can you design the counterfactual logging infrastructure needed to debias training data? Do you treat the value model weights as a product surface (PM-owned) rather than as ML hyperparameters? Do you reason about virality dynamics and how a trending post changes the candidate distribution in real time?
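
The debiasing question is easier to discuss with a concrete technique in hand. The sketch below uses inverse propensity weighting for position bias, which is one common choice and not something the guide itself prescribes. It assumes the serving layer logs the position at which each post was shown; the propensity values and the `ips_weight` helper are hypothetical.

```python
# Hypothetical probability that a user actually examines each feed position
# (index 0 is the top slot). In practice these would be estimated from
# logged randomized or perturbed rankings.
POSITION_PROPENSITY = [1.0, 0.8, 0.6, 0.4, 0.2]


def ips_weight(position: int, max_weight: float = 10.0) -> float:
    """Training-example weight = 1 / examination propensity, clipped.

    Positions past the end of the table reuse the last propensity.
    Clipping bounds the variance that very small propensities introduce.
    """
    idx = min(position, len(POSITION_PROPENSITY) - 1)
    return min(1.0 / POSITION_PROPENSITY[idx], max_weight)


# An engagement logged deep in the feed counts for more than one at the top,
# because fewer users ever saw that slot.
top_weight = ips_weight(0)
deep_weight = ips_weight(4)
```

The weights only help if the shown position (and ideally the full candidate list) is logged at serving time, which is the infrastructure requirement the Staff-level question points at.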

Clarifying Questions — Ask These First

01. Which feed surface are we designing?

Home timeline / For You (algorithmic, mix of followed + discovered content) vs. Following tab (chronological, only followed accounts) vs. Search results vs. Topic/hashtag feeds. For this problem: the main algorithmic feed (For You equivalent), which is the hardest because it must balance social obligations with algorithmic optimization.

02. What scale?

For Twitter/X-scale: ~250M DAU and ~500M posts/day, which averages roughly 350K posts/minute (about 5.8K posts/second). A user opens the app ~10 times/day. Feed must load in <200ms (p95). Candidate pool: ~1M recent posts per user (from followed accounts + out-of-network graph expansion), narrowed to a slate of 30-50 shown posts.
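
A quick back-of-envelope calculation turns the daily figures above into per-second rates. This sketch uses only the DAU, opens-per-day, and posts-per-day numbers from the text; the variable names are illustrative, and the results are daily averages, so peak traffic would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

dau = 250_000_000                # daily active users (from the text)
opens_per_user_per_day = 10      # app opens per user per day (from the text)
posts_per_day = 500_000_000      # new posts per day (from the text)

# Read side: every app open triggers one feed request.
feed_requests_per_day = dau * opens_per_user_per_day
feed_requests_per_sec = feed_requests_per_day / SECONDS_PER_DAY

# Write side: new posts that must reach the candidate index.
posts_created_per_sec = posts_per_day / SECONDS_PER_DAY
posts_created_per_min = posts_per_day / MINUTES_PER_DAY

print(f"feed requests/sec (avg): {feed_requests_per_sec:,.0f}")   # 28,935
print(f"posts created/sec (avg): {posts_created_per_sec:,.0f}")   # 5,787
print(f"posts created/min (avg): {posts_created_per_min:,.0f}")   # 347,222
```

The read path (about 29K feed requests/second on average) dominates the write path by roughly 5x, which is why most of the latency budget discussion centers on retrieval and ranking rather than ingestion.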

03. What's the ratio of in-network vs. out-of-network content?

Critical product decision that shapes the whole retrieval architecture. Twitter/X targets ~50% in-network (posts from accounts you follow) + ~50% out-of-network (discovered via algorithm). This ratio is a product lever, not an ML hyperparameter. Ask the interviewer if they have a target, or propose a starting ratio and explain the tradeoff.
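
To show what "a product lever" might look like in code, here is a minimal sketch of blending two already-ranked candidate lists under a configurable in-network share. The `blend_slate` function, its parameters, and the `(post_id, score)` tuple shape are assumptions made for illustration, not a description of any platform's mixer.

```python
def blend_slate(in_network, out_of_network, slate_size, in_network_ratio=0.5):
    """Pick a slate from two score-sorted lists of (post_id, score) tuples.

    in_network_ratio is the target share of in-network posts. If either
    source runs short, the other backfills so the slate stays as full
    as the combined supply allows.
    """
    n_in = min(len(in_network), int(round(slate_size * in_network_ratio)))
    n_out = min(len(out_of_network), slate_size - n_in)
    # Backfill with in-network posts when out-of-network supply is short.
    n_in = min(len(in_network), slate_size - n_out)
    chosen = in_network[:n_in] + out_of_network[:n_out]
    # Final ordering is by score, regardless of source.
    return sorted(chosen, key=lambda post: post[1], reverse=True)


followed = [("a", 0.9), ("b", 0.8), ("c", 0.7)]
discovered = [("x", 0.95), ("y", 0.5), ("z", 0.4)]
slate = blend_slate(followed, discovered, slate_size=4, in_network_ratio=0.5)
```

Keeping the ratio as an explicit argument means the product team can change it without retraining anything, which matches the framing of the ratio as a product decision.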

04. What's the content lifetime / recency window?

Tweets: 24-48 hours max relevance. LinkedIn posts: 5-7 days. Reddit posts: hours to days depending on subreddit. This defines how aggressively we must decay content scores over time and how often we must refresh the candidate index.
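
The link between content lifetime and decay aggressiveness can be sketched with an exponential half-life, which is one common form of explicit recency decay. The half-life values below are hypothetical placeholders loosely tied to the lifetimes above, and `decayed_score` is an illustrative helper.

```python
# Hypothetical half-lives in hours; shorter-lived content decays faster.
HALF_LIFE_HOURS = {
    "tweet": 12.0,
    "linkedin_post": 48.0,
}


def decayed_score(base_score: float, age_hours: float, half_life_hours: float) -> float:
    """Halve the score once per elapsed half-life."""
    return base_score * 0.5 ** (age_hours / half_life_hours)


# The same 48-hour-old post keeps far more of its score on the slower surface.
tweet_48h = decayed_score(1.0, 48.0, HALF_LIFE_HOURS["tweet"])             # 0.0625
linkedin_48h = decayed_score(1.0, 48.0, HALF_LIFE_HOURS["linkedin_post"])  # 0.5
```

With a 12-hour half-life, a post at the 48-hour edge of the tweet window retains about 6% of its original score, so it effectively drops out of the feed unless its base score is exceptional.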

05. Are there special content types?

Pure text posts, image posts, video posts, links, polls, threads — each has different engagement patterns and different optimal features. Ask if we need to handle all content types or focus on one. Note that mixing content types requires calibrating scores across types (a video's engagement rate is not directly comparable to a text tweet's).
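
One simple way to make scores comparable across content types is to express each prediction as a lift over that type's baseline engagement rate. This is a sketch of a single option among several (per-type calibration models are another); the baseline rates and the `engagement_lift` helper are hypothetical.

```python
# Hypothetical average engagement rate per content type.
BASELINE_ENGAGEMENT_RATE = {
    "text": 0.02,
    "image": 0.04,
    "video": 0.08,
}


def engagement_lift(predicted_rate: float, content_type: str) -> float:
    """Predicted engagement relative to what is typical for this content type.

    A lift of 1.0 means "average for its type"; values above 1.0 mean
    the post is expected to outperform its peers.
    """
    return predicted_rate / BASELINE_ENGAGEMENT_RATE[content_type]


# A text post at 4% is twice its type's norm; a video at 8% is merely average.
text_lift = engagement_lift(0.04, "text")
video_lift = engagement_lift(0.08, "video")
```

Ranking on raw predicted rate would place the video first; ranking on lift places the text post first, which is the comparison problem the paragraph above describes.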

06. Integrity / safety in scope?

Spam, misinformation, coordinated inauthentic behavior, bot amplification, NSFW content. Integrity is a re-ranking filter layer, not an afterthought. Confirm scope and note it explicitly.
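
As a rough illustration of integrity as a re-ranking layer, the sketch below removes high-confidence violations and demotes borderline ones after the main ranker has scored candidates. The `Candidate` fields, the thresholds, and the demotion factor are all assumptions made for the example; a real system would use separate classifiers and policies per violation type.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    post_id: str
    rank_score: float
    violation_prob: float  # output of a hypothetical integrity classifier


def apply_integrity_layer(candidates, remove_above=0.9, demote_above=0.5,
                          demote_factor=0.1):
    """Drop likely violations, demote borderline posts, then re-sort by score."""
    kept = []
    for c in candidates:
        if c.violation_prob >= remove_above:
            continue  # hard filter: never shown
        score = c.rank_score
        if c.violation_prob >= demote_above:
            score *= demote_factor  # soft demotion for borderline content
        kept.append(Candidate(c.post_id, score, c.violation_prob))
    return sorted(kept, key=lambda c: c.rank_score, reverse=True)


ranked = [
    Candidate("clean", 1.0, 0.01),
    Candidate("spam", 2.0, 0.95),
    Candidate("borderline", 3.0, 0.60),
]
filtered = apply_integrity_layer(ranked)
```

Here the highest-scoring post is demoted below the clean one and the likely spam post is removed entirely, even though both outranked the clean post before the layer ran.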
