
HLD Interview Framework: CIRCLE Method

A battle-tested 6-step framework for HLD interviews with timing guidance, back-of-envelope estimation references, and the exact questions to ask at each step.


What Interviewers Actually Evaluate

HLD interviews at FAANG evaluate 5 signals:

  1. Structured thinking under ambiguity — do you ask about requirements before designing?
  2. Trade-off reasoning — can you explain "I chose X over Y because..."?
  3. Back-of-envelope estimation — can you derive real numbers that drive design choices?
  4. Failure mode awareness — what happens when each component fails?
  5. Communication — do you drive the conversation clearly and concisely?

Understanding what interviewers are testing changes how you answer. The question "Design Twitter" is not a test of whether you know Twitter's architecture. It's a test of whether you can navigate ambiguity, make decisions under incomplete information, and articulate trade-offs coherently — all signals for how you'll perform in real engineering discussions.

What fails at the senior level: jumping straight to a component diagram without asking clarifying questions. Interviewers at L5+ are explicitly watching for this. A candidate who starts drawing boxes within the first 2 minutes of the question has already signaled a lack of structured process. The first 5 minutes should be requirements gathering and scale estimation — even if you "know" the answer already.

The asymmetry of mistakes: proposing the wrong technology can be corrected mid-interview if you explain your reasoning. Proposing a solution without explaining your reasoning signals that you can't think out loud under pressure — a much harder signal to recover from. Every design decision must be accompanied by the explicit trade-off: "I chose Cassandra here over MySQL because the write volume (100K/sec) and the query pattern (key-value lookup with no joins) favor wide-column storage — if the query patterns were more relational, I'd reconsider."

Scale estimates drive design, not the other way around: 100 QPS is a very different system from 100K QPS, even for the same problem. A system that needs to store 1MB/user for 1M users (1TB total) fits on a single server. The same system at 1B users (1PB) requires distributed object storage. Your estimation must happen before you draw any architecture — it determines every technology choice downstream.

CIRCLE — The 6-Step Framework

01

C — Clarify Requirements (5 min)

Never assume. Ask: What are the core features? Who are the users and where are they located? What scale are we designing for (users, requests/day)? Read-heavy or write-heavy? Consistency vs availability preference? Strong/eventual consistency? SLAs?

02

I — Identify Scale with Estimation (5 min)

Calculate: QPS (requests per second), storage needs (GB/TB/PB), bandwidth. Use these numbers to drive decisions: 'At 10,000 QPS a single DB won't cut it, so we need read replicas.' Numbers make abstract problems concrete.
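A minimal sketch of the arithmetic behind this step, encoding the rules of thumb from the cheat sheet further down (86,400 seconds per day, a 3× peak factor); the function names and the decimal-TB convention are illustrative, not from any standard library.

# Rules-of-thumb helpers for step I (Identify Scale). Illustrative only.
SECONDS_PER_DAY = 86_400

def qps(requests_per_day: float, peak_factor: float = 3.0) -> tuple[float, float]:
    """Return (average QPS, peak QPS) for a given daily request count."""
    avg = requests_per_day / SECONDS_PER_DAY
    return avg, avg * peak_factor

def storage_tb(record_count: float, bytes_per_record: float) -> float:
    """Total storage in decimal terabytes (1 TB = 1e12 bytes)."""
    return record_count * bytes_per_record / 1e12

avg, peak = qps(100e6)                                  # 100M requests/day
print(f"{avg:,.0f} QPS average, {peak:,.0f} QPS peak")  # ~1,157 avg, ~3,472 peak
print(f"{storage_tb(1e9, 1_000):.1f} TB")               # 1B records x 1 KB = 1.0 TB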

03

R — Rough Design: API + Data Model + Flow (10 min)

Define: Key APIs (REST endpoints or event interfaces). Core data model (what entities, what fields, rough schema). High-level request flow — draw boxes and arrows. This is the skeleton; details come later.
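As a concrete illustration of how thin this skeleton can be, here is a hypothetical photo-service sketch; the endpoints, entity, and field names are assumptions made for illustration, not part of the CIRCLE framework itself.

from dataclasses import dataclass

# Key APIs (REST), hypothetical photo service:
#   POST /v1/photos                          -> upload, returns photo_id
#   GET  /v1/photos/{photo_id}               -> metadata + CDN URL for the image bytes
#   GET  /v1/users/{user_id}/feed?cursor=... -> paginated feed of photo_ids

@dataclass
class Photo:              # core entity, rough schema only
    photo_id: str         # globally unique ID (e.g. Snowflake-style)
    user_id: str          # owner; natural shard key
    caption: str
    storage_key: str      # object-storage key (S3/GCS); the bytes never live in the DB
    created_at_ms: int    # epoch milliseconds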

04

C — Core Components Deep Dive (15 min)

Pick 2-3 most interesting/challenging components and deep-dive. Be guided by the interviewer. Common deep dives: Caching strategy, Database choice and sharding, Message queue design, CDN usage. Always explain WHY you chose each technology.
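For the database-sharding deep dive, a sketch of what shard-by-user_id routing can look like — plain hash-mod for brevity; consistent hashing or a directory service is the usual answer once resharding matters.

import hashlib

NUM_SHARDS = 10  # comes from the capacity estimate in step I, not a magic number

def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    # Stable hash (not Python's per-process hash()) so routing is identical on every host.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % num_shards

# All of one user's rows land on one shard, so "photos by user" is a single-shard query.
# Trade-off: cross-user queries (e.g. global trending) must fan out to every shard,
# and hash-mod forces a large data migration whenever num_shards changes.
print(shard_for("user_42"))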

05

L — Latency & Bottleneck Analysis (10 min)

Where are the hot paths? What are the slowest operations? Apply solutions: CDN for static content, Redis for hot reads, connection pooling for DB, async queues for slow writes. Estimate the improvement each optimization makes.
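"Redis for hot reads" in practice usually means the cache-aside pattern; a minimal sketch using redis-py, where the host, key naming, and TTL are assumptions.

import json
import redis

r = redis.Redis(host="localhost", port=6379)   # assumed cache endpoint
TTL_SECONDS = 300                              # short TTL bounds staleness after writes

def get_photo_meta(photo_id: str) -> dict:
    key = f"photo:{photo_id}:meta"
    cached = r.get(key)
    if cached is not None:                       # cache hit: ~0.5-1 ms
        return json.loads(cached)
    row = fetch_from_db(photo_id)                # cache miss: ~1-5 ms indexed query
    r.set(key, json.dumps(row), ex=TTL_SECONDS)  # populate for subsequent readers
    return row

def fetch_from_db(photo_id: str) -> dict:
    ...  # placeholder for the real indexed SELECT against the metadata table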

06

E — Edge Cases & Failure Modes (5 min)

What happens when: cache goes down? DB primary fails? A service crashes? Network partition occurs? Discuss: retry with backoff, circuit breakers, fallback strategies, data consistency during failures.
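"Retry with backoff" is worth being able to sketch on demand; a minimal version with exponential backoff and full jitter (a circuit breaker would additionally stop calling after N consecutive failures and probe periodically).

import random
import time

def call_with_retry(fn, max_attempts: int = 5, base_delay: float = 0.1):
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise                        # retries exhausted: surface the failure
            # Exponential backoff (0.1s, 0.2s, 0.4s, ...) with full jitter so many
            # clients don't hammer a recovering dependency in lockstep.
            delay = base_delay * (2 ** attempt)
            time.sleep(random.uniform(0, delay))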

Back-of-Envelope Estimation Cheat Sheet

Metric | Rule of Thumb
1M requests/day | ≈ 12 QPS
100M requests/day | ≈ 1,200 QPS
1B requests/day | ≈ 12,000 QPS
Peak factor | 3-5× average QPS
1 KB per record, 1B records | = 1 TB storage
Average tweet | 280 bytes text + metadata ≈ 1 KB
Read from memory | 0.1 ms
Read from SSD | 1 ms
Network round trip (same region) | 1-5 ms
Network round trip (cross-region) | 30-150 ms
Single DB server | handles ~1,000-5,000 QPS
Cache hit rate target | > 90%

Database Selection Guide

Use Case | Best Choice | Why
User profiles, relationships | PostgreSQL / MySQL | Strong consistency, rich queries, ACID
Session cache, leaderboards | Redis | In-memory, sub-ms latency, sorted sets
Time-series (metrics, logs) | Cassandra / InfluxDB | Write-optimized, excellent time-range queries
Document store (flexible schema) | MongoDB | Schemaless, horizontal scale
Graph relationships | Neo4j / Amazon Neptune | Native graph traversal, relationship-heavy queries
Full-text search | Elasticsearch | Inverted index, relevance scoring
Real-time location/geo queries | Redis with GEO | GEORADIUS commands, in-memory speed
Blob storage (images, video) | S3 / GCS | Cheap, durable, CDN-ready

Generic Production Architecture — The 5 Layers

[Diagram: the five layers of a generic production architecture]
TIP

The Most Common Mistake

Jumping to solutions before clarifying requirements. Interviewers deliberately leave requirements ambiguous to see if you ask. The first 5 minutes of clarification often change the entire design. For example: 'is this read-heavy?' might reveal that 90% of traffic is reads → you need aggressive caching, not write optimization.

CIRCLE Framework — HLD Interview Flow

[Diagram: CIRCLE framework — HLD interview flow]

Full Back-of-Envelope: Design Instagram (Worked Example)

01

Scale assumptions

1B total users, 100M DAU. 50M photos uploaded/day. 500M photo views/day. Average photo = 3MB original, 200KB compressed. Average request = 400 bytes.

02

Write QPS

50M uploads/day ÷ 86400 seconds = 580 uploads/sec. Peak = 3× average = 1,740 uploads/sec. At 200KB each: 1,740 × 200KB = 348MB/sec upload bandwidth.

03

Read QPS

500M photo views/day ÷ 86400 = 5,800 reads/sec. Peak = 3× = 17,400 reads/sec. At 200KB each: 17,400 × 200KB = 3.48GB/sec. A CDN must serve this — a single server can't deliver 3.5GB/sec.

04

Storage

New photos: 50M/day × 200KB = 10TB/day. 3 years × 365 days × 10TB = 10.95PB. Add 3 sizes (original, medium, thumbnail) ≈ 33PB in 3 years. S3 cost: 33PB × $0.023/GB/month ≈ $760K/month storage alone.

05

Database

Metadata (photo_id, user_id, caption, timestamp): 50M rows/day × 365 × 3 = 55B rows at 500 bytes each ≈ 27TB. PostgreSQL handles up to ~10TB with good tuning → need sharding after year 1. Shard by user_id into 10 shards (~0.9TB each after year 1, ~2.7TB each by year 3).

06

CDN sizing

≈1.2GB/sec average egress, 3.5GB/sec peak (from step 03). CloudFront at ~$0.085/GB (US): 1.2GB/sec × 86,400 × 30 days ≈ 3PB/month → ~$265K/month CDN cost. Cache hit rate target: 95% (popular photos served from CDN, not origin). With 95% hit rate, origin servers see 5% × 17,400 reads/sec = 870 reads/sec — a single server handles this easily.
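The same arithmetic as a small runnable sanity check (decimal units, 1 GB = 1e9 bytes; all inputs are the step 01 assumptions).

# Sanity-check the Instagram estimates.
SECONDS_PER_DAY, PEAK = 86_400, 3

uploads_per_s = 50e6 / SECONDS_PER_DAY      # ~580 uploads/sec
views_per_s = 500e6 / SECONDS_PER_DAY       # ~5,800 views/sec
photo_bytes = 200e3                         # 200 KB compressed size

print(f"peak upload bw: {uploads_per_s * PEAK * photo_bytes / 1e6:,.0f} MB/s")  # ~347 MB/s
print(f"peak read bw:   {views_per_s * PEAK * photo_bytes / 1e9:.2f} GB/s")     # ~3.47 GB/s
print(f"avg read bw:    {views_per_s * photo_bytes / 1e9:.2f} GB/s")            # ~1.16 GB/s
print(f"storage/day:    {50e6 * photo_bytes / 1e12:.0f} TB")                    # 10 TB
print(f"3 yrs, 3 sizes: {3 * 365 * 3 * 10 / 1000:.1f} PB")                      # ~33 PB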

Latency Numbers Every Engineer Should Know (2025)

Operation | Latency | Notes
L1 cache access | ~0.5 ns | CPU registers + L1 cache
L2 cache access | ~7 ns
L3 cache access | ~40 ns
Main memory (RAM) read | ~100 ns = 0.1 μs | ~200× slower than L1
SSD random read (4KB) | ~150 μs = 0.15 ms
HDD read (seek + rotation) | ~5-10 ms | ~50,000-100,000× slower than RAM
Network: same datacenter | ~0.5 ms RTT
Network: US cross-country | ~40 ms RTT
Network: US to Europe | ~80 ms RTT
Network: US to Asia-Pacific | ~150 ms RTT
Redis GET | ~0.5–1 ms | Network + hash lookup
PostgreSQL query (indexed) | ~1–5 ms | SSD read + query planning
PostgreSQL query (seq scan, large table) | ~100 ms–seconds | Avoid in hot paths
Object storage (S3 GET) | ~50–200 ms | Varies by region
HTTP request to external API | ~50–500 ms | Includes DNS + TLS
EXAMPLE

Interview Scenario: Design a Global Notification System

Problem: send push notifications to 500M mobile users. 10M notifications/day normally, spikes to 100M during major events (sports scores, breaking news).

Scale: 100M/day peak = 1,160/sec average, 10,000/sec peak. Each notification = 500 bytes → 5MB/sec peak.

Architecture: (1) API Gateway + notification service: accepts notification requests, validates, enriches with user preferences (do not disturb, language). Writes to Kafka topic 'notifications'. Does NOT send directly (decoupling).

(2) Kafka cluster: buffer for notification bursts. If a push provider is slow (Apple APNS/Google FCM), the Kafka topic accumulates and is drained at a steady rate. The topic is partitioned by user_id for per-user ordering guarantees.
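Partitioning by user_id comes down to setting the message key; a sketch with kafka-python, where the broker address and topic name are assumptions.

import json
from kafka import KafkaProducer   # kafka-python

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",                       # assumed broker address
    key_serializer=lambda k: k.encode(),
    value_serializer=lambda v: json.dumps(v).encode(),
)

def enqueue_notification(user_id: str, payload: dict) -> None:
    # Same key -> same partition -> per-user ordering; different users spread across partitions.
    producer.send("notifications", key=user_id, value=payload)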

(3) Fan-out service: reads from Kafka, looks up user's device tokens from Redis/DynamoDB (< 5ms lookup). Routes to correct push provider. Handles retry with exponential backoff.

(4) Device token store: Redis Hash per user. user:{user_id}:devices → hash of {device_id: push_token}. DynamoDB for persistence. Redis for hot cache.
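A sketch of that per-user device hash with redis-py, following the user:{user_id}:devices key pattern above; the DynamoDB fallback is omitted and the endpoint is assumed.

import redis

r = redis.Redis(host="localhost", port=6379)   # assumed hot-cache endpoint

def register_device(user_id: str, device_id: str, push_token: str) -> None:
    r.hset(f"user:{user_id}:devices", device_id, push_token)

def device_tokens(user_id: str) -> dict:
    # Returns {device_id: push_token}; an empty result falls back to DynamoDB (not shown).
    return {k.decode(): v.decode()
            for k, v in r.hgetall(f"user:{user_id}:devices").items()}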

Scaling for 100M/event spike: Kafka handles the burst (consumer lag is acceptable). Fan-out service scales horizontally to 50 workers. Rate limit push provider calls (APNS: 1M/sec limit). Pre-warm connections to APNS/FCM.

Deduplication: notification_id in Redis SET with 24h TTL. If notification_id already in set, skip. Prevents duplicate sends during retry.
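A sketch of the dedup guard — implemented here as one SET NX key per notification_id with its own 24h TTL, a slight variation on a single Redis set so that each ID expires independently.

import redis

r = redis.Redis(host="localhost", port=6379)

def should_send(notification_id: str) -> bool:
    # SET ... NX EX 86400: only the first writer for this ID succeeds; retried
    # deliveries within 24 hours see the key already set and are skipped.
    return bool(r.set(f"notif:sent:{notification_id}", 1, nx=True, ex=86_400))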

Key trade-offs to mention: delivery guarantees (at-most-once vs at-least-once), message expiry (don't send 'game just started' 6 hours later), respect user time zones.
