Consensus Protocols: Raft vs Paxos, Leader Election & Log Replication
Distributed consensus is what makes ZooKeeper, etcd, and CockroachDB correct. Master Raft's three-phase algorithm (leader election → log replication → commit), why Raft is simpler than Paxos, the split-brain scenario, Byzantine fault tolerance, and where consensus appears in real systems — Kafka, Kubernetes, and Google Spanner.
The Consensus Problem
The consensus problem: N nodes in a distributed system must agree on a single value, even if some nodes crash or network messages are lost. Without consensus, two nodes might simultaneously believe they are the leader, resulting in conflicting writes — the split-brain scenario.
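The key defense against split brain is the majority quorum: a value (or a leader) counts as chosen only once more than half the cluster accepts it, and any two majorities of the same cluster must overlap in at least one node. Below is a minimal sketch of that idea in Go; the cluster size, node IDs, and vote-counting helper are purely illustrative, not taken from any real implementation.

```go
package main

import "fmt"

// quorum returns the minimum number of acceptances needed for a value
// to be chosen in a cluster of n nodes: a strict majority.
func quorum(n int) int {
	return n/2 + 1
}

// chosen reports whether a candidate value has been accepted by a majority.
func chosen(acceptances map[string]bool, clusterSize int) bool {
	count := 0
	for _, accepted := range acceptances {
		if accepted {
			count++
		}
	}
	return count >= quorum(clusterSize)
}

func main() {
	const clusterSize = 5 // illustrative 5-node cluster

	// Two candidates try to become leader during a network partition.
	// Each node votes for at most one candidate, so at most one side
	// can ever assemble a majority (3 of 5): no split brain.
	votesForA := map[string]bool{"n1": true, "n2": true, "n3": true}
	votesForB := map[string]bool{"n4": true, "n5": true}

	fmt.Println("candidate A elected:", chosen(votesForA, clusterSize)) // true
	fmt.Println("candidate B elected:", chosen(votesForB, clusterSize)) // false
}
```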
Consensus is the foundation of leader election (ZooKeeper, etcd), replicated state machines (the core of every strongly consistent database), distributed transactions (two-phase commit coordinators are typically made fault tolerant by replicating their decisions via consensus), and fault-tolerant configuration stores (Kubernetes keeps all cluster state in etcd). A replicated state machine illustrated in the sketch below is the common pattern behind most of these uses.
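In a replicated state machine, consensus decides the order of log entries, and every node then applies the same committed entries, in the same order, to a deterministic local state machine, so all replicas converge to identical state. A minimal sketch follows; the command type and key-value state machine are hypothetical, chosen only to show the apply-in-order step.

```go
package main

import "fmt"

// Command is one entry in the replicated log (hypothetical shape).
type Command struct {
	Key   string
	Value string
}

// StateMachine is the local state each replica maintains by applying
// committed log entries in order.
type StateMachine struct {
	data map[string]string
}

func NewStateMachine() *StateMachine {
	return &StateMachine{data: make(map[string]string)}
}

// Apply executes one committed command deterministically; because every
// replica applies the same log in the same order, all replicas converge.
func (sm *StateMachine) Apply(cmd Command) {
	sm.data[cmd.Key] = cmd.Value
}

func main() {
	// The log order is what consensus agrees on; applying it is local.
	committedLog := []Command{
		{Key: "config/leader", Value: "node-1"},
		{Key: "config/leader", Value: "node-3"},
	}

	replica := NewStateMachine()
	for _, cmd := range committedLog {
		replica.Apply(cmd)
	}
	fmt.Println(replica.data["config/leader"]) // "node-3" on every replica
}
```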
The FLP impossibility theorem (Fischer, Lynch, and Paterson, 1985) proves that in an asynchronous distributed system, no deterministic consensus algorithm can guarantee both safety (nodes never decide conflicting values) and liveness (every run eventually reaches a decision) if even one node can crash. Real consensus algorithms work around this by assuming partial synchrony: messages are eventually delivered within some bounded delay. That assumption holds often enough in practice, on both LANs and WANs, that liveness failures are rare, and algorithms like Raft and Paxos never sacrifice safety even when it is violated.
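In practice, algorithms such as Raft also use randomized election timeouts so that, once the network behaves synchronously for long enough, some node times out first, runs an election alone, and wins. The sketch below shows only the timeout choice; the 150 to 300 ms range is the one suggested in the Raft paper, but the exact values here are illustrative and would be tuned per deployment.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// electionTimeout returns a randomized follower timeout in the 150-300 ms
// range. Randomization makes it likely that exactly one node times out
// first, becomes a candidate, and collects a majority before anyone else
// starts a competing election.
func electionTimeout() time.Duration {
	return time.Duration(150+rand.Intn(150)) * time.Millisecond
}

func main() {
	// Under partial synchrony, heartbeats from a healthy leader arrive
	// well within the timeout, so followers never start elections; only
	// when the leader is truly unreachable do these timers fire.
	for i := 1; i <= 3; i++ {
		fmt.Printf("node %d election timeout: %v\n", i, electionTimeout())
	}
}
```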