Distributed Systems Patterns
The 8 core distributed systems patterns every senior engineer must know: consistent hashing, CAP theorem, saga pattern, CQRS, event sourcing, two-phase commit, gossip protocol, and leader election.
Why These Patterns Matter
At L5+ (senior), high-level design (HLD) interviews expect you to know not just what components to use, but why, and what trade-offs they introduce.
These 8 patterns appear in almost every large-scale system design interview at Google, Meta, and Amazon. Knowing them cold is the difference between a "hire" and a "strong hire."
What makes these patterns interview-critical is not that you can describe them — every candidate who prepared for a week can describe consistent hashing. What separates strong answers is knowing when to use each pattern, what failure mode it introduces, and which problem it explicitly does not solve.
Consistent hashing solves the redistribution problem but not the hotspot problem (a single key getting 100% of traffic). Saga solves distributed atomicity but not the intermediate-state visibility problem. Raft solves leader election, but during a network partition only the side holding a majority can elect a leader and make progress; the minority side blocks. CRDT solves conflict-free merges but cannot express all data types. Each pattern is a targeted solution to one specific problem — applying it to the wrong problem produces a system that's more complex without being more reliable.
The meta-pattern most candidates miss: distributed systems patterns are not interchangeable — they reflect a fundamental choice between consistency and availability (CAP), between coordination overhead and operational simplicity, between developer ergonomics and failure-mode transparency. The best interview answers treat each pattern as the solution to a specific forcing function, not a general-purpose improvement.
Pattern Selection — Matching the Problem to the Right Distributed Pattern
Is the operation a distributed write across multiple services or databases?
If yes: do you need strong atomicity (all commit or none)? → Two-Phase Commit if within the same DB cluster. Saga (orchestration or choreography) if across microservices. The key question: can you tolerate brief intermediate states being visible? If yes → Saga. If no → 2PC (and accept its blocking failure mode).
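The orchestration flavor of the saga above can be sketched in a few lines. This is a minimal in-memory illustration, not a production framework: the service steps (inventory reserve, payment charge) and their names are hypothetical, and a real saga would persist its progress so compensations survive an orchestrator crash.

```python
# Minimal orchestrated-saga sketch (hypothetical order flow).
# Each step pairs a forward action with a compensation; on failure,
# already-completed steps are compensated in reverse order.

def run_saga(steps, state):
    """steps: list of (action, compensation) pairs mutating state."""
    completed = []
    for action, compensate in steps:
        try:
            action(state)
            completed.append(compensate)
        except Exception:
            # Roll back: run compensations newest-first.
            for comp in reversed(completed):
                comp(state)
            return False
    return True

# Hypothetical flow: reserve inventory, then charge payment (which fails).
state = {"inventory": 1, "charged": False}

def reserve(s):   s["inventory"] -= 1
def unreserve(s): s["inventory"] += 1
def charge(s):    raise RuntimeError("payment declined")
def refund(s):    s["charged"] = False

ok = run_saga([(reserve, unreserve), (charge, refund)], state)
print(ok, state["inventory"])  # False 1: charge failed, reserve was compensated
```

Note the intermediate-state visibility trade-off: between `reserve` and the compensation, other services can observe inventory as already decremented.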
Are you distributing data across nodes and need to rebalance when adding capacity?
Consistent hashing: maps keys to nodes using a ring, so adding one node only remaps 1/N of keys (not a full reshuffle). Required for any distributed cache or database that needs to scale horizontally without full redistribution. Combine with virtual nodes (vnodes) to prevent hotspots when node capacities differ.
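The ring-plus-vnodes idea can be shown concretely. The sketch below is a toy implementation (MD5 as the hash, an in-memory sorted list as the ring); real systems use faster hashes and weighted vnode counts per node, but the key property — adding a node moves only a small fraction of keys — holds the same way.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative, not production)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        # Each physical node gets `vnodes` positions on the ring.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def get(self, key):
        h = self._hash(key)
        # First vnode clockwise from the key's position (wrapping around).
        idx = bisect.bisect(self.ring, (h,)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
before = {f"key{i}": ring.get(f"key{i}") for i in range(1000)}
ring.add("node-d")
moved = sum(1 for k, n in before.items() if ring.get(k) != n)
print(f"{moved / 1000:.0%} of keys moved")  # roughly 1/4, not 100%
```

With a naive `hash(key) % N` scheme, adding the fourth node would remap roughly 3/4 of the keys instead.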
Do you need a single source of truth for leadership or configuration across distributed nodes?
Raft or Paxos: required for leader election (etcd, ZooKeeper, CockroachDB), distributed locks, and configuration consensus. Raft is simpler to implement and reason about. Use when strong consistency is required and you can tolerate ~300ms election latency on leader failure.
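The majority-quorum core of Raft's election can be illustrated with a deliberately stripped-down sketch. This is not Raft: it omits log up-to-dateness checks, randomized timeouts, and retries, and keeps only the rule that a candidate needs votes from a majority of the cluster, which is exactly why a minority partition can never elect a leader.

```python
# Simplified Raft-style election round (illustrative only: one round,
# no log comparison, no timeouts). A node grants its vote if the
# candidate's term is newer than any term it has seen.

def request_votes(candidate, term, reachable_nodes, node_terms):
    votes = 1  # the candidate votes for itself
    for node in reachable_nodes:
        if term > node_terms[node]:
            node_terms[node] = term  # grant vote and adopt the new term
            votes += 1
    return votes

cluster = ["n1", "n2", "n3", "n4", "n5"]
terms = {n: 1 for n in cluster}
majority = len(cluster) // 2 + 1  # 3 of 5

# n1 is partitioned with only n2: 2 votes < 3, so no leader on this side.
print(request_votes("n1", 2, ["n2"], terms) >= majority)        # False
# n3 can reach n4 and n5 (the majority side): 3 votes >= 3, elected.
print(request_votes("n3", 3, ["n4", "n5"], terms) >= majority)  # True
```

The "~300ms election latency" in the guideline above comes from the randomized election timeout that real Raft adds before a follower becomes a candidate.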
Do you have multi-master writes with no single leader (multi-region active-active)?
CRDTs: design your data types to support conflict-free merges. Works for counters (G-Counter, PN-Counter), sets (G-Set), and registers (LWW-Register). Does NOT work for arbitrary relational data. If you need multi-region active-active with non-CRDT data, you need vector clocks + application-level conflict resolution (Dynamo-style).
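The G-Counter mentioned above is small enough to sketch in full. Each replica increments only its own slot, and merge takes the element-wise maximum, which makes merging commutative, associative, and idempotent — replicas converge regardless of merge order. This is a minimal in-memory illustration.

```python
# Minimal G-Counter (grow-only counter) CRDT sketch.

class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> count

    def increment(self, n=1):
        # A replica only ever bumps its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        # Element-wise max: safe to apply in any order, any number of times.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self):
        return sum(self.counts.values())

# Two replicas accept writes concurrently, then merge in either order.
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 5 5 — both converge
```

A PN-Counter extends this with a second G-Counter for decrements; the same per-replica-slot trick is what makes arbitrary relational data inexpressible as a CRDT.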
Do you have read-heavy workloads with complex queries but write-heavy raw events?
CQRS + Event Sourcing: separate the write model (append events to an immutable log) from the read model (materialized views optimized for query patterns). Enables independent scaling of reads and writes. Accept eventual consistency between write and read models (~100ms lag typical). Required when write throughput and read query complexity are both high.
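The write/read split can be sketched as follows. This is an in-memory illustration with hypothetical event and account names: writes append immutable events, and the read model is a materialized view produced by replaying the log — in a real system the projection runs asynchronously, which is where the ~100ms lag comes from.

```python
# Minimal CQRS + event-sourcing sketch (in-memory, illustrative).

events = []  # append-only event log: the write model

def record(event_type, **data):
    """Write side: append an immutable event; never update in place."""
    events.append({"type": event_type, **data})

def project_balances(event_log):
    """Read side: account balances materialized by replaying the log."""
    balances = {}
    for e in event_log:
        if e["type"] == "deposited":
            balances[e["account"]] = balances.get(e["account"], 0) + e["amount"]
        elif e["type"] == "withdrawn":
            balances[e["account"]] = balances.get(e["account"], 0) - e["amount"]
    return balances

record("deposited", account="acct-1", amount=100)
record("withdrawn", account="acct-1", amount=30)
record("deposited", account="acct-2", amount=50)

print(project_balances(events))  # {'acct-1': 70, 'acct-2': 50}
```

Because the log is the source of truth, you can add a second projection later (say, per-day deposit totals) and backfill it by replaying the same events — the usual argument for pairing event sourcing with CQRS.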