Preview — Pro guide
You are seeing a portion of this guide. Sign in and upgrade to unlock the full article, quizzes, and interview answers.
Sections
Related Guides
Databases: Sharding, Indexing & Replication
High-Level Design
Caching: Strategy, Redis Internals & Distributed Patterns
High-Level Design
Distributed Systems Patterns
High-Level Design
Consistency Models: Strong vs Eventual vs Causal, Linearizability, CRDTs & CAP Theorem
High-Level Design
Load Balancer Design: L4/L7 Routing, Health Checks, and Failover
High-Level Design
Data Partitioning & Sharding: Consistent Hashing, Range Sharding & Hotspot Elimination
Sharding is what makes databases scale beyond a single machine. Master horizontal vs vertical partitioning, range sharding, hash sharding, consistent hashing with virtual nodes, hotspot detection, and resharding strategies — with real numbers from Cassandra, DynamoDB, and YouTube's architecture.
Why Partitioning Exists
A single PostgreSQL instance handles ~10K writes/sec and stores ~10TB of data before performance degrades. At 1M writes/sec or 100TB of data, you need to split the data across multiple nodes. This is horizontal partitioning — sharding.
The core challenge: once you split data across N nodes, you face three new problems:
- Routing: given a key, which node holds the data?
- Rebalancing: when you add/remove nodes, how do you move the minimum data?
- Hotspots: when some keys are accessed 1000× more than others, one shard becomes a bottleneck while others sit idle.
Getting sharding right is one of the most consequential architectural decisions you'll make — wrong choices are expensive to undo at scale.