Sharding is what makes databases scale beyond a single machine. Master horizontal vs vertical partitioning, range sharding, hash sharding, consistent hashing with virtual nodes, hotspot detection, and resharding strategies — with real numbers from Cassandra, DynamoDB, and YouTube's architecture.

35 min read 2 sections 1 interview questions

ShardingPartitioningConsistent HashingVirtual NodesHotspotRange ShardingHash ShardingRebalancingDynamoDBCassandraHBaseReshardingCross-Shard QueryScatter-Gather

Why Partitioning Exists

A single PostgreSQL instance handles ~10K writes/sec and stores ~10TB of data before performance degrades. At 1M writes/sec or 100TB of data, you need to split the data across multiple nodes. This is horizontal partitioning — sharding.

The core challenge: once you split data across N nodes, you face three new problems:

Routing: given a key, which node holds the data?
Rebalancing: when you add/remove nodes, how do you move the minimum data?
Hotspots: when some keys are accessed 1000× more than others, one shard becomes a bottleneck while others sit idle.

Getting sharding right is one of the most consequential architectural decisions you'll make — wrong choices are expensive to undo at scale.

IMPORTANT

Premium content locked

This guide is premium content. Upgrade to Pro to unlock the full guide, quizzes, and interview Q&A.

Upgrade to Pro Sign in to upgrade