High-Level Design

Distributed systems the way FAANG asks them: latency budgets, capacity math, sharding strategies, and failure modes — grounded in real system blueprints.

42 guides

High-Level Design · 40 min

Design a URL Shortener (TinyURL)

End-to-end design of a URL shortening service handling 10 billion stored URLs, 100K redirects/sec, and sub-5ms p99 redirect latency. Covers ID generation, read-heavy caching, multi-region replication, and async analytics.

Caching, Hashing, Base62, NoSQL, +4 more
Intermediate
6 questions

High-Level Design · 35 min

HLD Interview Framework: CIRCLE Method

A battle-tested 6-step framework for HLD interviews with timing guidance, back-of-envelope estimation references, and the exact questions to ask at each step.

HLD Framework, System Design Interview, Requirements Analysis, Capacity Estimation, +7 more
Intermediate
5 questions

High-Level Design · 30 min

How to Approach an HLD System Design Interview

The pre-game mindset, signal management, and communication strategy for HLD interviews. Covers what interviewers evaluate, time budgeting, recovery patterns, and the failure modes that cause strong engineers to underperform on system design interviews.

HLD Framework, System Design Interview, Interview Strategy, Requirements Gathering, +7 more
Intermediate
6 questions

High-Level Design · 40 min

How to Design at HLD: From Blank Whiteboard to Defensible Architecture

The mechanical playbook for HLD interview execution. Covers the API-first design rule, capacity-driven technology selection, the five-layer reference architecture, sharding decision tree, caching strategy, and the patterns FAANG candidates use to deliver consistently strong system designs.

HLD Design, System Design, API Design, Sharding Strategy, +8 more
Intermediate
6 questions

High-Level Design · 45 min

Caching: Strategy, Redis Internals & Distributed Patterns

Caching is the most tested topic in HLD interviews. Master cache write strategies, Redis data structures and their time complexities, distributed cache topology, the cache invalidation problem, and how to design multi-level caching architectures for real systems.

Redis, CDN, Cache Invalidation, Eviction, +4 more
Intermediate
6 questions

High-Level Design · 45 min

CDN: Edge Caching, Push vs Pull, and Invalidation at Global Scale

CDNs cut global latency from hundreds of milliseconds to tens, offload 80-99% of origin bandwidth, and shield origins from bursty traffic. Master push vs pull, cache hierarchy, invalidation, and the failure modes that break CDN-backed systems.

CDN, Cloudflare, Fastly, CloudFront, +8 more
Intermediate
7 questions

High-Level Design · 55 min

Cloud Services Architecture for System Design Interviews

How to choose AWS, GCP, and Azure managed services in system design interviews with clear tradeoffs. Covers compute, storage, messaging, networking, identity, and serverless-vs-container decisions tied to latency, reliability, and cost.

AWS, GCP, Azure, Cloud Architecture, +8 more
Intermediate
1 question

High-Level Design · 45 min

Distributed Queue Design: Ordering, Retries, and Throughput

Design production-grade distributed message queue systems covering Kafka vs SQS vs RabbitMQ tradeoffs, delivery semantics, consumer group patterns, backpressure, and dead-letter queues. The system design interview's most-tested async primitive — master it to handle any event-driven architecture question.

Distributed Queue, Kafka, SQS, RabbitMQ, +10 more
Intermediate
6 questions

High-Level Design · 40 min

Load Balancer Design: L4/L7 Routing, Health Checks, and Failover

Design scalable load balancing for modern distributed systems. Covers L4 vs L7 tradeoffs, routing algorithms (round-robin, least-connections, P2C, consistent hashing), health check design, connection draining, sticky sessions, and global load balancing with GeoDNS and Anycast. Builds the mental model interviewers use to assess system design maturity.

Load Balancer, L4, L7, Round Robin, +11 more
Intermediate
6 questions

High-Level Design · 42 min

Rate Limiter Design: Token Bucket, Sliding Window, and Distributed Enforcement

Design distributed rate limiters for APIs and gateways. Covers all five algorithm tradeoffs (token bucket, leaky bucket, fixed window, sliding window log, sliding window counter), Redis data structure choices, hot-key mitigation, race conditions, and multi-region consistency. One of the most frequently asked HLD fundamentals at FAANG.

Rate Limiter, Token Bucket, Leaky Bucket, Sliding Window, +10 more
Intermediate
6 questions

High-Level Design · 45 min

Design a Chat System (WhatsApp)

End-to-end design of a real-time encrypted messaging platform serving 2 billion users and 100 billion messages per day. Covers WebSocket connection management, the Signal Protocol for E2EE, store-and-forward offline delivery, and group message fan-out.

WebSockets, E2EE, Erlang, Cassandra, +4 more
Advanced
6 questions

High-Level Design · 55 min

Design Google Docs (Real-Time Collaborative Editing)

System design deep-dive for real-time collaborative editing at 1B+ scale. Covers OT vs CRDTs, why Google Docs uses OT with a centralized server, WebSocket sync, offline editing with Yjs, and the tombstone problem that limits CRDT scalability.

Collaborative Editing, Operational Transformation, CRDTs, Google Docs Architecture, +8 more
Advanced
1 question

High-Level Design · 60 min

Design a File Storage and Sync System (Dropbox / Google Drive)

End-to-end system design for a Dropbox-scale file storage and sync platform serving 500M users and 2.5EB of data. Covers content-addressable chunking, metadata-blob separation, the sync protocol, shared-folder consistency, version history, and the garbage-collection edge cases that interviewers target.

Dropbox Design, Google Drive, Content Addressable Storage, Chunking, +11 more
Advanced
1 question

High-Level Design · 55 min

Design Google Maps (Routing, ETA & Map Tile Serving)

Deep-dive system design for Google Maps covering map tile serving at CDN scale, Contraction Hierarchies-based routing on a 36M-node road graph, and real-time ETA prediction from 100M GPS probe devices — with production numbers, algorithm tradeoffs, and what senior engineers get wrong.

Google Maps System Design, Contraction Hierarchies, Map Tile XYZ Scheme, Routing Graph Algorithm, +9 more
Advanced
1 question

High-Level Design · 35 min

News Feed Architecture: Fan-Out on Write vs Read, Hybrid Strategies & Feed Ranking

The fan-out problem is the central challenge behind every social feed — Twitter, Instagram, Facebook, LinkedIn. Master fan-out on write (push), fan-out on read (pull), the hybrid approach for celebrity accounts, feed storage in Redis, Cassandra schema design, and how feed ranking layers on top of chronological ordering.

News Feed, Fan-Out, Fan-Out on Write, Fan-Out on Read, +11 more
Advanced
1 question

High-Level Design · 55 min

Design a Multi-Channel Notification Service at Scale

End-to-end design of a notification platform delivering 1B notifications/day across push, email, and SMS. Covers the fan-out problem for broadcast sends, per-provider rate limiting, idempotency, retries, and compliance (GDPR/CAN-SPAM/TCPA) — the system design topics interviewers actually grill on.

Kafka, Redis, FCM, APNs, +10 more
Advanced
1 question

High-Level Design · 55 min

Design a Payment System (Stripe-style Processor)

System design deep-dive for a merchant payment processor handling 10K TPS at peak with sub-500ms p99 latency. Covers PCI DSS compliance, idempotent charges, double-entry ledger accounting, fraud scoring, and exactly-once semantics under partial failure.

Payment System Design, Stripe Architecture, Idempotency-Key, Double-Entry Ledger, +8 more
Advanced
1 question

High-Level Design · 45 min

Design Proximity Services (Yelp / Google Places)

System design deep-dive for a geo-search platform handling 200M+ monthly users with sub-100ms nearby business search across 50M+ geo-indexed points. Covers geohash/S2/H3 indexing tradeoffs, a 100:1 read-to-write ratio, sharding by geohash prefix, and ranking by distance, rating, and personalization signals.

Proximity Services Design, Geohash Index, Google S2 Library, Uber H3 Hexagonal Grid, +8 more
Advanced
1 question

High-Level Design · 45 min

Design Uber (Ride-Sharing Platform)

End-to-end design of a real-time ride-matching platform handling millions of simultaneous GPS location updates, sub-10-second driver matching, dynamic surge pricing, and global trip management across 70 countries.

Geospatial, WebSockets, H3 Hexagonal Grid, Kafka, +4 more
Advanced
1 question

High-Level Design · 50 min

Design Search Autocomplete / Typeahead (Google-scale)

System design deep-dive for a search typeahead service delivering sub-100ms top-K suggestions at 10B queries/day using sharded tries, precomputed top-K at each node, multi-tier caching, real-time trending pipelines, distributed systems scalability, and personalization.

Google Search Typeahead, Autocomplete Design, Radix Trie, Elasticsearch Prefix Query, +8 more
Advanced
1 question

High-Level Design · 55 min

Design a Stock Exchange (Order Book & Matching Engine)

System design for a stock exchange at NYSE/NASDAQ scale: 10B+ messages/day, sub-microsecond matching via LMAX Disruptor and FPGA, price-time priority order book, UDP multicast for market data, and why the matching engine is intentionally single-threaded for determinism and audit compliance.

Stock Exchange Design, Order Book Architecture, Matching Engine, LMAX Disruptor, +9 more
Advanced
1 question

High-Level Design · 50 min

Design an Event Ticketing System (BookMyShow / Ticketmaster)

System design deep-dive for a flash-sale ticketing platform at Ticketmaster scale — 600K+ concurrent users competing for 50K seats in seconds — covering seat reservation with TTL, oversell prevention via optimistic locking, virtual waiting rooms, WebSocket availability overlays, and idempotent payment flows.

Event Ticketing System Design, Flash Sale Architecture, Optimistic Locking, Redis SETNX Seat Lock, +8 more
Advanced
1 question

High-Level Design · 40 min

Design Twitter/X Feed (News Feed)

End-to-end design of a social media news feed serving 200 million daily active users. The central design challenge is the fan-out problem: how to show a user their personalized timeline of tweets from all the accounts they follow, with < 200ms latency, at 180K reads/sec.

Fan-Out, Redis, Cassandra, Kafka, +4 more
Advanced
1 question

High-Level Design · 50 min

Design Netflix (Video Streaming)

End-to-end design of a global video streaming platform serving 250M subscribers across 190 countries, handling 25 Tbps of peak traffic. Covers video ingestion and encoding pipelines, the Open Connect CDN, adaptive bitrate streaming, and recommendation at scale.

CDN, Video Encoding, HLS/DASH, Microservices, +4 more
Advanced
1 question

High-Level Design · 50 min

Design a Web Crawler at Google Scale

End-to-end system design for a distributed web crawler at Google/Bing scale: 15B+ pages, ~6B refreshed daily, covering URL frontier design, OPIC-based priority scoring, politeness enforcement, SimHash near-duplicate detection, distributed sharding, and freshness-vs-depth tradeoffs most resources skip entirely.

Web Crawler Design, URL Frontier, SimHash Near-Duplicate Detection, Bloom Filter URL Deduplication, +8 more
Advanced
1 question

High-Level Design · 50 min

API Gateway Design: Auth, Rate Limiting, Routing, and BFF Patterns

Production API gateway architecture covering Kong, Envoy, AWS API Gateway, and BFF patterns. Where to terminate TLS, validate JWTs, enforce rate limits, aggregate requests — and how to avoid turning the gateway into a distributed monolith.

API Gateway, Kong, Envoy, AWS API Gateway, +10 more
Advanced
8 questions

High-Level Design · 55 min

Authentication & Authorization: JWT, OAuth 2.0, OIDC & Zanzibar ReBAC

Deep-dive on modern auth systems: AuthN vs AuthZ, session vs JWT tradeoffs, OAuth 2.0 flows (Authorization Code + PKCE, Client Credentials, Device), OIDC identity tokens, RBAC vs ABAC vs Google Zanzibar ReBAC, JWT revocation, key rotation, and WebAuthn passkeys for FAANG system design interviews.

JWT, OAuth 2.0, OIDC, OpenID Connect, +11 more
Advanced
7 questions

High-Level Design · 35 min

Consensus Protocols: Raft vs Paxos, Leader Election & Log Replication

Distributed consensus is what makes ZooKeeper, etcd, and CockroachDB correct. Master Raft's three-phase algorithm (leader election → log replication → commit), why Raft is simpler than Paxos, the split-brain scenario, Byzantine fault tolerance, and where consensus appears in real systems — Kafka, Kubernetes, and Google Spanner.

Consensus, Raft, Paxos, Leader Election, +11 more
Advanced
1 question

High-Level Design · 38 min

Consistency Models: Strong vs Eventual vs Causal, Linearizability, CRDTs & CAP Theorem

Consistency models define what values a distributed read can return after a write. Master linearizability (strong consistency), sequential consistency, causal consistency, eventual consistency, and read-your-writes — with practical implications for database selection, microservice design, and the CAP theorem's real-world limitations.

Consistency, CAP Theorem, Linearizability, Eventual Consistency, +11 more
Advanced
1 question

High-Level Design · 50 min

Containers and Kubernetes for System Design Interviews

A production-focused guide to Docker and Kubernetes for backend system design interviews. Covers pod scheduling, Deployment vs StatefulSet decisions, autoscaling, service mesh tradeoffs, and failure handling with concrete latency and reliability constraints.

Kubernetes, Docker, Pods, StatefulSet, +8 more
Advanced
1 question

High-Level Design · 35 min

Data Partitioning & Sharding: Consistent Hashing, Range Sharding & Hotspot Elimination

Sharding is what makes databases scale beyond a single machine. Master horizontal vs vertical partitioning, range sharding, hash sharding, consistent hashing with virtual nodes, hotspot detection, and resharding strategies — with real numbers from Cassandra, DynamoDB, and YouTube's architecture.

Sharding, Partitioning, Consistent Hashing, Virtual Nodes, +10 more
Advanced
1 question

High-Level Design · 45 min

Data Warehouse Architecture: Columnar Storage, MPP, and Lakehouse Tradeoffs

A practical system design guide to modern warehouse architecture using Snowflake, BigQuery, and Redshift patterns. Covers columnar storage internals, MPP execution, partitioning and clustering strategy, and warehouse-vs-lakehouse design choices.

Data Warehouse, Snowflake, BigQuery, Redshift, +6 more
Advanced
1 question

High-Level Design · 50 min

Databases: Sharding, Indexing & Replication

Database engineering for large-scale systems — the most tested HLD sub-topic after caching. Covers B-Tree and LSM-Tree storage engines, indexing strategies (covering, composite, partial), sharding strategies and hotspot handling, replication (sync vs async, leader election), and how to choose between SQL and NoSQL in system design interviews.

Sharding, Replication, B-Tree, LSM-Tree, +6 more
Advanced
1 question

High-Level Design · 35 min

Distributed Locks: Redlock, ZooKeeper, Fencing Tokens & Exactly-Once Guarantees

Distributed locks are the mechanism behind inventory reservations, payment deduplication, and leader election. Master Redis SET NX EX, the Redlock algorithm and its controversy, ZooKeeper ephemeral znodes, fencing tokens for safe expiry, and when distributed locks are the wrong tool entirely.

Distributed Lock, Redlock, ZooKeeper, Redis SET NX, +10 more
Advanced
1 question

High-Level Design · 40 min

Distributed Systems Patterns

The 8 core distributed systems patterns every senior engineer must know: consistent hashing, CAP theorem, saga pattern, CQRS, event sourcing, two-phase commit, gossip protocol, and leader election.

CAP Theorem, Consistent Hashing, CQRS, Saga Pattern, +7 more
Advanced
1 question

High-Level Design · 42 min

Distributed Transactions: 2PC, Saga Pattern, and Compensating Transactions

Distributed transactions coordinate state changes across multiple services or databases. Two-Phase Commit (2PC) provides strong consistency but sacrifices availability. The Saga pattern achieves eventual consistency through compensating transactions — the approach used by Uber, Amazon, and Stripe for multi-step business workflows. This guide covers both models, their failure modes, and when each applies.

Distributed Transactions, Two-Phase Commit, 2PC, Saga Pattern, +8 more
Advanced
1 question

High-Level Design · 40 min

Event Sourcing and CQRS: Audit Logs, Temporal Queries, and Read/Write Separation

Event sourcing stores state as an immutable event log rather than mutable rows. CQRS separates write and read models for scalability in distributed systems. Together they power audit logs, temporal queries, and high-read systems at Stripe and Microsoft. Covers failure modes and the projection rebuild problem most candidates miss.

Event Sourcing, CQRS, Audit Log, Domain Events, +8 more
Advanced
1 question

High-Level Design · 55 min

Message Queues & Streaming: Kafka, Delivery Semantics, and Consumer Groups

Async messaging is the backbone of every scalable architecture. Master the queue-vs-log distinction, Kafka partitioning and consumer groups, delivery guarantees, ordering semantics, and the production failure modes that separate staff engineers from the crowd.

Message Queues, Kafka, RabbitMQ, AWS SQS, +10 more
Advanced
1 question

High-Level Design · 40 min

Microservices Architecture: Decomposition, Service Mesh, Circuit Breakers & Saga Pattern

When and how to decompose a monolith into microservices — and the distributed systems complexity that follows. Covers domain-driven decomposition, the strangler fig migration pattern, service discovery, API gateway, circuit breakers (Hystrix/Resilience4j), service mesh (Istio), and the saga pattern for distributed transactions.

Microservices, Service Mesh, Circuit Breaker, Saga Pattern, +11 more
Advanced
1 question

High-Level Design · 35 min

Observability: Metrics, Distributed Tracing, Structured Logging & SLO Design

Observability is what separates systems that are operated from systems that are debugged by guessing. Master the three pillars (metrics, logs, traces), SLI/SLO/SLA design, error budgets, structured logging, OpenTelemetry for distributed tracing, and the on-call runbook pattern. Includes the staff-level synthesis: designing observability before coding the system.

Observability, Metrics, Distributed Tracing, Structured Logging, +11 more
Advanced
1 question

High-Level Design · 35 min

Search Internals: Inverted Index, TF-IDF, Elasticsearch Architecture & Relevance Ranking

Full-text search powers every application. Master the inverted index data structure, TF-IDF relevance scoring, BM25 (the modern standard), Elasticsearch's distributed shard architecture, query execution pipeline, and the tradeoffs between exact-match, fuzzy, and semantic search.

Search, Inverted Index, Elasticsearch, TF-IDF, +11 more
Advanced
1 question

High-Level Design · 50 min

Stream Processing Systems: Flink, Kafka Streams, Windows, and Exactly-Once

A system design deep dive on real-time stream processing architecture. Learn how to choose Flink vs Kafka Streams, design windowed aggregations, handle late data, and implement exactly-once semantics with production-grade failure recovery.

Stream Processing, Apache Flink, Kafka Streams, Spark Streaming, +6 more
Advanced
1 question