Consensus Protocols: Raft vs Paxos, Leader Election & Log Replication
Distributed consensus is what makes ZooKeeper, etcd, and CockroachDB correct. Master Raft's three-phase algorithm (leader election → log replication → commit), why Raft is simpler than Paxos, the split-brain scenario, Byzantine fault tolerance, and where consensus appears in real systems — Kafka, Kubernetes, and Google Spanner.
The Consensus Problem
The consensus problem: N nodes in a distributed system must agree on a single value, even if some nodes crash or network messages are lost. Without consensus, two nodes might simultaneously believe they are the leader, resulting in conflicting writes — the split-brain scenario.
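The key defense against split brain is the majority quorum: a value (or a leader) counts as chosen only once more than half the cluster accepts it, and any two majorities of the same cluster must overlap in at least one node. Below is a minimal sketch of that idea in Go; the cluster size, node IDs, and vote-counting helper are purely illustrative, not taken from any real implementation.

```go
package main

import "fmt"

// quorum returns the minimum number of acceptances needed for a value
// to be chosen in a cluster of n nodes: a strict majority.
func quorum(n int) int {
	return n/2 + 1
}

// chosen reports whether a candidate value has been accepted by a majority.
func chosen(acceptances map[string]bool, clusterSize int) bool {
	count := 0
	for _, accepted := range acceptances {
		if accepted {
			count++
		}
	}
	return count >= quorum(clusterSize)
}

func main() {
	const clusterSize = 5 // illustrative 5-node cluster

	// Two candidates try to become leader during a network partition.
	// Each node votes for at most one candidate, so at most one side
	// can ever assemble a majority (3 of 5): no split brain.
	votesForA := map[string]bool{"n1": true, "n2": true, "n3": true}
	votesForB := map[string]bool{"n4": true, "n5": true}

	fmt.Println("candidate A elected:", chosen(votesForA, clusterSize)) // true
	fmt.Println("candidate B elected:", chosen(votesForB, clusterSize)) // false
}
```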
Consensus is the foundation of leader election (ZooKeeper, etcd), replicated state machines (the core of every strongly consistent database), distributed transactions (two-phase commit coordinators are typically made fault tolerant by replicating their decisions via consensus), and fault-tolerant configuration stores (Kubernetes keeps all cluster state in etcd). A replicated state machine illustrated in the sketch below is the common pattern behind most of these uses.
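In a replicated state machine, consensus decides the order of log entries, and every node then applies the same committed entries, in the same order, to a deterministic local state machine, so all replicas converge to identical state. A minimal sketch follows; the command type and key-value state machine are hypothetical, chosen only to show the apply-in-order step.

```go
package main

import "fmt"

// Command is one entry in the replicated log (hypothetical shape).
type Command struct {
	Key   string
	Value string
}

// StateMachine is the local state each replica maintains by applying
// committed log entries in order.
type StateMachine struct {
	data map[string]string
}

func NewStateMachine() *StateMachine {
	return &StateMachine{data: make(map[string]string)}
}

// Apply executes one committed command deterministically; because every
// replica applies the same log in the same order, all replicas converge.
func (sm *StateMachine) Apply(cmd Command) {
	sm.data[cmd.Key] = cmd.Value
}

func main() {
	// The log order is what consensus agrees on; applying it is local.
	committedLog := []Command{
		{Key: "config/leader", Value: "node-1"},
		{Key: "config/leader", Value: "node-3"},
	}

	replica := NewStateMachine()
	for _, cmd := range committedLog {
		replica.Apply(cmd)
	}
	fmt.Println(replica.data["config/leader"]) // "node-3" on every replica
}
```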
The FLP impossibility theorem (Fischer, Lynch, and Paterson, 1985) proves that in an asynchronous distributed system, no deterministic consensus algorithm can guarantee both safety (nodes never decide conflicting values) and liveness (every run eventually reaches a decision) if even one node can crash. Real consensus algorithms work around this by assuming partial synchrony: messages are eventually delivered within some bounded delay. That assumption holds often enough in practice, on both LANs and WANs, that liveness failures are rare, and algorithms like Raft and Paxos never sacrifice safety even when it is violated.
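In practice, algorithms such as Raft also use randomized election timeouts so that, once the network behaves synchronously for long enough, some node times out first, runs an election alone, and wins. The sketch below shows only the timeout choice; the 150 to 300 ms range is the one suggested in the Raft paper, but the exact values here are illustrative and would be tuned per deployment.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// electionTimeout returns a randomized follower timeout in the 150-300 ms
// range. Randomization makes it likely that exactly one node times out
// first, becomes a candidate, and collects a majority before anyone else
// starts a competing election.
func electionTimeout() time.Duration {
	return time.Duration(150+rand.Intn(150)) * time.Millisecond
}

func main() {
	// Under partial synchrony, heartbeats from a healthy leader arrive
	// well within the timeout, so followers never start elections; only
	// when the leader is truly unreachable do these timers fire.
	for i := 1; i <= 3; i++ {
		fmt.Printf("node %d election timeout: %v\n", i, electionTimeout())
	}
}
```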