Skip to main content

Preview — Pro guide

You are seeing a portion of this guide. Sign in and upgrade to unlock the full article, quizzes, and interview answers.

Databases: Sharding, Indexing & Replication

Database engineering for large-scale systems — the most tested HLD sub-topic after caching. Covers B-Tree and LSM-Tree storage engines, indexing strategies (covering, composite, partial), sharding strategies and hotspot handling, replication (sync vs async, leader election), and how to choose between SQL and NoSQL in system design interviews.

50 min read 3 sections 1 interview questions
ShardingReplicationB-TreeLSM-TreeIndexingSQLNoSQLCassandraPostgreSQLLeader Election

Storage Engines — How Databases Actually Store Data

Two dominant storage engine paradigms power most databases in production.

B-Tree (PostgreSQL, MySQL, SQLite) — data organized as a balanced tree of fixed-size pages (4–8 KB each). Pages hold sorted key-value pairs; writes update pages in-place.

  • Reads: O(log n) — traverse from root to leaf page
  • Writes: O(log n) — locate page, write to WAL first, then update in-place
  • Strengths: fast point and range queries, ideal for OLTP read/write balance
  • Weaknesses: random writes cause write amplification and disk seeks on HDDs

LSM-Tree (Cassandra, RocksDB, LevelDB, ClickHouse) — Log-Structured Merge tree. All writes land in an in-memory MemTable, flushed to immutable SSTables on disk when full. A background compaction process merges SSTables, purging duplicates and tombstones.

  • Reads: check MemTable + all SSTables; Bloom filters eliminate most unnecessary lookups
  • Writes: always sequential — no random I/O → extremely high write throughput
  • Strengths: write-optimized, perfect for time-series, logs, and append-heavy workloads
  • Weaknesses: read amplification — multiple SSTables must be checked per read

B-Tree vs LSM-Tree at a Glance

AspectB-Tree (PostgreSQL / MySQL)LSM-Tree (Cassandra / RocksDB)
Write patternRandom in-place page writesSequential writes (MemTable → SSTables)
Read patternO(log n) single lookupMemTable + multiple SSTables (mitigated by Bloom filters)
Write amplificationLow–Medium (WAL + page update)High (data written multiple times through compaction levels)
Read amplificationLow (1–2 disk reads typically)Moderate (several SSTables checked per read)
Space amplificationLow–Medium (dead rows via MVCC, vacuumed)Moderate (compaction lag leaves duplicates temporarily)
Best workloadOLTP: balanced read/write, complex queriesWrite-heavy: time-series, logs, IoT, large-scale append
ExamplesPostgreSQL, MySQL, SQLite, OracleCassandra, ScyllaDB, RocksDB, LevelDB, ClickHouse (parts)
IMPORTANT

Premium content locked

This guide is premium content. Upgrade to Pro to unlock the full guide, quizzes, and interview Q&A.