Distributed systems are hard. And when you're designing one, the CAP theorem will inevitably come up in the conversation. Yet it's one of the most frequently misunderstood concepts in the field. Let's fix that.
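Proposed by Eric Brewer in 2000 and formally proven by Gilbert and Lynch in 2002, the CAP theorem states that a distributed data store can guarantee at most two of the following three properties simultaneously: consistency (every read sees the most recent write or returns an error), availability (every request to a working node gets a non-error response, though not necessarily the latest data), and partition tolerance (the system keeps operating when messages between nodes are dropped or delayed).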
Here's what most articles get wrong: partition tolerance is not optional. Networks fail. Links go down. Packets are dropped. Any distributed system running across multiple nodes must tolerate partitions or it isn't truly distributed.
This means the real choice is between CP and AP — not a free-form pick of two.
When a network partition occurs:
┌─────────────┐         ┌─────────────┐
│   Node A    │  ✗✗✗✗✗  │   Node B    │
│  (Primary)  │ dropped │  (Replica)  │
└─────────────┘         └─────────────┘
CP: Reject writes on Node B → Consistent but unavailable
AP: Accept writes on Node B → Available but potentially inconsistent
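The two outcomes above can be sketched as a single replica that behaves differently depending on its mode. This is a minimal illustration, not how any particular database implements it: `Mode`, `Replica`, `handleWrite`, and the `canReachPrimary` flag are all hypothetical names introduced here.

```typescript
// Hypothetical names throughout; this models Node B from the diagram.
type Mode = "CP" | "AP";

interface WriteResult {
  accepted: boolean;
  reason: string;
}

class Replica {
  private store: Record<string, string> = {};
  private mode: Mode;
  canReachPrimary: boolean;

  constructor(mode: Mode, canReachPrimary: boolean) {
    this.mode = mode;
    this.canReachPrimary = canReachPrimary;
  }

  handleWrite(key: string, value: string): WriteResult {
    if (!this.canReachPrimary && this.mode === "CP") {
      // CP: refuse rather than risk diverging from the primary
      return { accepted: false, reason: "partitioned: write rejected" };
    }
    // AP, or no partition: accept locally; AP may need reconciliation later
    this.store[key] = value;
    const reason = this.canReachPrimary
      ? "replicated normally"
      : "partitioned: accepted locally";
    return { accepted: true, reason };
  }

  read(key: string): string | undefined {
    return this.store[key];
  }
}

// Both replicas are cut off from the primary.
const cp = new Replica("CP", false);
const ap = new Replica("AP", false);
const cpResult = cp.handleWrite("cart", "book");
const apResult = ap.handleWrite("cart", "book");
```

Here `cpResult.accepted` is `false` and `apResult.accepted` is `true`: the CP replica stays consistent by giving up availability, while the AP replica stays available and now holds a write the primary has not seen.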
In a CP system, when a partition occurs, nodes that can't confirm the latest state will refuse to serve requests rather than risk returning stale data.
Examples: HBase, ZooKeeper, etcd, Consul
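One common way a node decides whether it can "confirm the latest state" is a majority quorum, the approach used by consensus-based stores such as etcd and ZooKeeper. The sketch below shows only the arithmetic; the function names are hypothetical and no real client API is implied.

```typescript
// Illustrative helpers; names are hypothetical, not from any real library.

// Smallest number of nodes that forms a strict majority of the cluster.
function majority(clusterSize: number): number {
  return Math.floor(clusterSize / 2) + 1;
}

// A node may serve requests only if it can reach a majority, counting itself.
function canServe(clusterSize: number, reachableIncludingSelf: number): boolean {
  return reachableIncludingSelf >= majority(clusterSize);
}

// A 5-node cluster split 3 / 2 by a partition:
const majoritySide = canServe(5, 3); // true: keeps serving
const minoritySide = canServe(5, 2); // false: refuses requests
```

Because two disjoint groups can never both hold a majority, at most one side of a partition keeps accepting requests. An even split (for example 2 / 2 in a 4-node cluster) leaves neither side able to serve, which is one reason such clusters are usually sized with an odd number of nodes.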
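// Conceptual: CP behaviour under partition (canConfirmLatestState is a hypothetical helper)
async function readValue(key: string): Promise<string> {
  if (!(await canConfirmLatestState(key))) throw new Error("Unavailable: cannot confirm latest state");
  return localDb.get(key);
}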
When to choose CP:
AP systems keep accepting reads and writes even during a partition, but different nodes may serve different data until the partition heals and they reconcile.
Examples: Cassandra, CouchDB, DynamoDB (in default mode)
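To make the reconciliation step concrete, here is a minimal Last-Write-Wins merge. It is a sketch under stated assumptions: the `Versioned` and `State` types, the `nodeId` tie-break, and the use of millisecond timestamps from loosely synchronised clocks are all introduced here for illustration.

```typescript
// Hypothetical types for illustration; real systems attach richer metadata.
interface Versioned {
  value: string;
  timestamp: number; // assumed: write time in ms from loosely synchronised clocks
  nodeId: string;    // tie-breaker when timestamps collide
}

type State = Record<string, Versioned>;

// Pick the winner between two versions of the same key.
function pickLatest(a: Versioned, b: Versioned): Versioned {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp ? a : b;
  return a.nodeId > b.nodeId ? a : b; // deterministic tie-break
}

// Merge two replicas' states key by key, keeping the latest version of each.
function mergeLww(local: State, remote: State): State {
  const merged: State = {};
  for (const key of Object.keys(local)) merged[key] = local[key];
  for (const key of Object.keys(remote)) {
    const existing = merged[key];
    merged[key] = existing ? pickLatest(existing, remote[key]) : remote[key];
  }
  return merged;
}

// Two nodes diverged during a partition:
const nodeA: State = { cart: { value: "book", timestamp: 100, nodeId: "A" } };
const nodeB: State = {
  cart: { value: "book,pen", timestamp: 150, nodeId: "B" },
  theme: { value: "dark", timestamp: 90, nodeId: "B" },
};
const ab = mergeLww(nodeA, nodeB);
const ba = mergeLww(nodeB, nodeA);
```

Both `ab` and `ba` end up with the same contents, so the nodes converge no matter who merges first. The cost is that the losing write is silently discarded, which is why some systems prefer CRDTs when concurrent updates must all be preserved.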
// Conceptual: AP behaviour under partition
async function readValue(key: string): Promise<string> {
  // Always responds — returns local state even if potentially stale
  return localDb.get(key);
}

async function reconcile(remoteNode: Node): Promise<void> {
  // After partition heals, merge state using Last-Write-Wins or CRDTs
  const conflicts = await detectConflicts(remoteNode);
  await resolveConflicts(conflicts);
}

When to choose AP:
CAP only describes behaviour during partitions. In 2012, Daniel Abadi extended it with PACELC: if there is a Partition (P), the system trades Availability against Consistency (A/C); Else (E), even when it is running normally, it trades Latency against Consistency (L/C).
| System | Partition behaviour | Normal behaviour |
|---|---|---|
| DynamoDB | AP | EL (low latency) |
| Spanner | CP | EC (consistent reads) |
| Cassandra | AP | EL |
| MySQL (single node) | CP | EC |
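The EL/EC column comes down to how many replicas a write waits for before it is acknowledged. The rough model below is a deliberate simplification: the function names and round-trip times are made up for illustration, and it ignores local write time, retries, and queueing.

```typescript
// Hypothetical model: replicaRttMs holds round-trip times from the
// coordinator to each replica, in milliseconds.

// EC-style write: wait until `acksRequired` replicas confirm.
// Latency is set by the acksRequired-th fastest replica.
function syncWriteLatencyMs(replicaRttMs: number[], acksRequired: number): number {
  if (acksRequired < 1 || acksRequired > replicaRttMs.length) {
    throw new Error("acksRequired must be between 1 and the number of replicas");
  }
  const sorted = replicaRttMs.slice().sort((a, b) => a - b);
  return sorted[acksRequired - 1];
}

// EL-style write: acknowledge after the local write, replicate in the background.
function asyncWriteLatencyMs(localWriteMs: number): number {
  return localWriteMs;
}

const rtts = [2, 40, 120]; // same rack, same region, cross-region (made-up numbers)
const waitForAll = syncWriteLatencyMs(rtts, 3);      // 120
const waitForMajority = syncWriteLatencyMs(rtts, 2); // 40
const localAckOnly = asyncWriteLatencyMs(1);         // 1
```

Even with no partition in sight, stronger consistency means waiting on slower replicas. That is the latency side of the tradeoff the table summarises.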
The CAP theorem is a fundamental constraint, not a design pattern. Partition tolerance is mandatory for distributed systems, making your real choice between consistency and availability during failures. Understanding where each system sits on this spectrum — and why — is what separates good distributed system design from guesswork.