Split-Brain Protection¶
pgraft provides 100% split-brain protection through the Raft consensus algorithm. This page explains how it works and why you can trust it.
What is Split-Brain?¶
Split-brain is a dangerous condition in distributed systems where:
- Network partition divides the cluster
- Multiple nodes believe they are the leader
- Different leaders accept conflicting writes
- Data corruption and inconsistency result
Without Protection
In traditional PostgreSQL replication, network partitions can lead to split-brain scenarios where multiple nodes accept writes, causing permanent data inconsistencies.
How pgraft Prevents Split-Brain¶
pgraft uses the Raft consensus algorithm which provides mathematical guarantees against split-brain through four key mechanisms:
1. Quorum Requirement¶
Leader election requires a majority of votes (N/2 + 1):
- 3-node cluster: Needs 2 votes
- 5-node cluster: Needs 3 votes
- 7-node cluster: Needs 4 votes
Why this prevents split-brain:
Network Partition Example (3-node cluster):
Partition 1: Node 1, Node 2
Partition 2: Node 3
Partition 1 (2 nodes):
- Has majority (2 out of 3)
- Can elect leader
- Can accept writes
Partition 2 (1 node):
- No majority (1 out of 3)
- Cannot elect leader
- Cannot accept writes (read-only)
Mathematical guarantee: Only one partition can have a majority, therefore only one leader can be elected.
2. Term Monotonicity¶
Each leader election increments the term number:
- Terms are strictly increasing
- Higher term always wins
- Old leaders automatically step down when they see a higher term
Example:
Time T0:
Node 1 is leader in term 5
Network partition occurs
Time T1:
Partition A: Node 1 (term 5)
Partition B: Node 2, 3 elect new leader (term 6)
Network heals
Time T2:
Node 1 sees messages from term 6
Node 1 automatically steps down
Only one leader remains (term 6)
3. Log Completeness¶
Only nodes with up-to-date logs can be elected:
- Candidates with incomplete logs lose elections
- New leader always has all committed entries
- Prevents data loss during leadership transitions
Election Rules:
Candidate A: Last log index = 100, Last log term = 5
Candidate B: Last log index = 95, Last log term = 5
Result: A wins (more up-to-date log)
Candidate C: Last log index = 100, Last log term = 4
Candidate D: Last log index = 95, Last log term = 5
Result: D wins (higher term in last entry)
4. Single Leader Per Term¶
Raft's fundamental guarantee:
At most one leader can be elected in a given term.
Why?
- Leader needs majority of votes
- Each node votes for at most one candidate per term
- Two candidates cannot both get majority votes (mathematical impossibility)
Example (5-node cluster):
Candidate A gets votes from: Node 1, 2, 3 (majority ✅)
Candidate B can only get: Node 4, 5 (not majority ❌)
Impossible for both A and B to get 3+ votes!
Network Partition Scenarios¶
Scenario 1: Minority Partition¶
Setup: 3-node cluster (Node 1, 2, 3), Node 1 is leader
Event: Node 3 is isolated
Before:
[Node 1 (Leader)] ←→ [Node 2] ←→ [Node 3]
After Partition:
[Node 1 (Leader)] ←→ [Node 2] | [Node 3 (isolated)]
Result:
- Node 1 remains leader (still has majority with Node 2)
- Cluster continues operating normally
- Node 3 becomes follower, cannot accept writes
- No split-brain - only one leader
Scenario 2: Equal Partition (3 nodes)¶
Setup: 3-node cluster, network splits 1-2 vs 3
Event: Network partition
Result:
- Partition with nodes 1, 2 keeps leader (has majority)
- Node 3 cannot elect new leader (no majority)
- No split-brain - only one leader
Scenario 3: Leader in Minority¶
Setup: 5-node cluster, Node 1 is leader
Event: Node 1 and 2 isolated from 3, 4, 5
Before:
[Node 1 (Leader)] ←→ [Node 2] ←→ [Node 3] ←→ [Node 4] ←→ [Node 5]
After Partition:
[Node 1] ←→ [Node 2] | [Node 3] ←→ [Node 4] ←→ [Node 5]
Result:
- Node 1 loses leadership (no majority - only 2 of 5)
- Nodes 3, 4, 5 elect new leader (have majority - 3 of 5)
- Node 1 steps down after election timeout
- No split-brain - old leader steps down, new leader elected
Verification¶
You can verify split-brain protection yourself:
Test 1: Isolate Minority¶
# 3-node cluster
# Block Node 3's network
# On Node 1 or 2 (majority partition):
psql -c "SELECT pgraft_is_leader();" # One returns true
psql -c "SELECT pgraft_add_node(4, '127.0.0.1', 7004);" # Works!
# On Node 3 (minority):
psql -c "SELECT pgraft_is_leader();" # Returns false
psql -c "SELECT pgraft_add_node(4, '127.0.0.1', 7004);" # Fails!
Test 2: Leader in Minority¶
# 5-node cluster, Node 1 is leader
# Isolate Node 1 and 2
# Majority partition (3, 4, 5) will elect new leader after ~1 second
# Minority partition (1, 2) cannot elect leader
# When network heals, Node 1 sees higher term and steps down
Mathematical Proof Sketch¶
Theorem: At most one leader per term.
Proof:
- Leader requires majority votes: N/2 + 1
- Each node votes once per term
- Two majorities must overlap (Pigeonhole Principle)
- Overlapping node cannot vote for both candidates
- Therefore, at most one candidate can get majority
- QED: At most one leader per term
Comparison with Other Systems¶
System | Split-Brain Protection | Method |
---|---|---|
pgraft | 100% | Raft consensus |
PostgreSQL streaming replication | No | Manual failover |
MySQL replication | No | Manual failover |
Patroni/Stolon | Partial | Requires external consensus (etcd/Zookeeper) |
PostgreSQL with Pacemaker | Partial | STONITH fencing |
pgraft Advantage
pgraft provides split-brain protection natively without requiring external consensus systems or STONITH devices.
Best Practices¶
1. Use Odd Number of Nodes¶
Odd numbers provide better fault tolerance:
- 3 nodes: Tolerates 1 failure
- 5 nodes: Tolerates 2 failures
- 7 nodes: Tolerates 3 failures
Even numbers waste resources:
- 4 nodes: Still tolerates only 1 failure (same as 3)
- 6 nodes: Still tolerates only 2 failures (same as 5)
2. Geographic Distribution¶
For disaster recovery, distribute nodes across:
- Different availability zones
- Different data centers
- Different geographic regions
Example (5-node cluster):
Region A (2 nodes): Primary data center
Region B (2 nodes): Secondary data center
Region C (1 node): Tiebreaker
3. Monitor Term Changes¶
Frequent term changes indicate problems:
-- Monitor term changes
SELECT pgraft_get_term();
-- If term increases rapidly:
-- - Network instability
-- - Node failures
-- - Election timeout too low
Summary¶
pgraft provides guaranteed split-brain protection through:
- Quorum-based elections - Only majority can elect leader
- Term monotonicity - Higher term always wins
- Log completeness - Only up-to-date nodes elected
- Mathematical guarantees - Proven by Raft algorithm
You can trust pgraft to never allow split-brain scenarios.