# Distributed Systems for Platform Engineers
Master distributed systems concepts essential for platform engineering interviews.
## Interview Focus Areas

### Most Asked Topics
- CAP Theorem - Trade-offs and real-world applications
- Consistency Models - Strong, eventual, weak consistency
- Consensus Algorithms - Raft, Paxos, Byzantine fault tolerance
- Distributed Transactions - 2PC, saga patterns
- Failure Handling - Fault tolerance, circuit breakers
## Core Concepts

### CAP Theorem
The CAP theorem states that a distributed system can only guarantee two of:
- Consistency: All nodes see the same data simultaneously
- Availability: System remains operational
- Partition Tolerance: System continues during network failures
**During a network partition:**

```
┌──────────────┐    ✗    ┌───────────────┐
│    Node A    │ ──────► │    Node B     │
│  (Updated)   │         │ (Not Updated) │
└──────────────┘         └───────────────┘
```
Choose:
- CP: Refuse writes until partition heals (Consistency)
- AP: Accept writes, reconcile later (Availability)
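
To make the choice concrete, here is a minimal sketch of a CP-style and an AP-style write path during a partition, assuming hypothetical `reachable_replicas` handles and a hinted-handoff queue stand in for the real storage layer:

```python
# Hypothetical write paths illustrating the CP vs. AP choice during a partition.
class PartitionUnavailable(Exception):
    pass

def write_cp(key, value, reachable_replicas, total_replicas):
    """CP: refuse the write unless a majority of replicas is reachable."""
    quorum = total_replicas // 2 + 1
    if len(reachable_replicas) < quorum:
        raise PartitionUnavailable("minority partition: rejecting write")
    for replica in reachable_replicas:
        replica.put(key, value)

def write_ap(key, value, reachable_replicas, hinted_handoff_queue):
    """AP: accept the write locally and reconcile unreachable replicas later."""
    for replica in reachable_replicas:
        replica.put(key, value)
    # Remember the write so missed replicas can be repaired after the partition heals
    hinted_handoff_queue.append((key, value))
```
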
### Real-World CAP Examples
| System | Choice | Trade-off |
|---|---|---|
| Banking | CP | Consistency over availability |
| Social Media | AP | Available but may show stale data |
| ZooKeeper | CP | Coordination requires consistency |
| Cassandra | AP | Tunable consistency |
| DynamoDB | AP | Eventually consistent by default |
## Consistency Models

### Strong Consistency
All reads return the most recent write.
```python
# Example: a bank transfer must be strongly consistent
# (distributed_transaction, read_balance, write_balance, commit and abort
#  are placeholders for the underlying transaction API)
def transfer_money(from_account, to_account, amount):
    with distributed_transaction():
        from_balance = read_balance(from_account)
        if from_balance >= amount:
            write_balance(from_account, from_balance - amount)
            to_balance = read_balance(to_account)
            write_balance(to_account, to_balance + amount)
            commit()
        else:
            abort()
```
### Eventual Consistency
System will become consistent over time.
```python
import time

# Example: social media like counts can be eventually consistent
# (local_db and async_replicate are placeholders for the storage and replication layer)
def increment_likes(post_id):
    # Write to the local replica immediately
    local_db.increment(f"likes:{post_id}")
    # Replicate asynchronously to other regions
    async_replicate({
        'operation': 'increment',
        'key': f"likes:{post_id}",
        'timestamp': time.time()
    })
```
### Causal Consistency
Operations that are causally related are seen in the same order.
```
User A posts → User B comments → User C likes the comment
```

Every replica must observe these events in this order.
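
One way to enforce causal order is to tag each event with the ids of the events it depends on and delay delivery until those dependencies have been applied locally. A rough sketch, assuming a hypothetical `apply` hook and per-event `deps` metadata:

```python
# Illustrative causal delivery: buffer events until their dependencies are applied.
class CausalReplica:
    def __init__(self):
        self.applied = set()   # ids of events already applied locally
        self.pending = []      # events waiting on missing dependencies

    def receive(self, event_id, deps, payload):
        self.pending.append((event_id, deps, payload))
        self._drain()

    def _drain(self):
        progress = True
        while progress:
            progress = False
            for item in list(self.pending):
                event_id, deps, payload = item
                if all(d in self.applied for d in deps):
                    self.apply(payload)            # application-specific effect
                    self.applied.add(event_id)
                    self.pending.remove(item)
                    progress = True

    def apply(self, payload):
        print("applied:", payload)
```
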
## Consensus Algorithms

### Raft Consensus

Raft gets a cluster of nodes to agree on a replicated log by electing a single leader that orders all writes.
```python
# Simplified Raft leader election (election timeouts, log matching,
# and term checks on responses are omitted for brevity)
class RaftNode:
    def __init__(self, node_id, peers):
        self.node_id = node_id
        self.peers = peers          # handles to the other nodes in the cluster
        self.state = "FOLLOWER"
        self.current_term = 0
        self.voted_for = None
        self.log = []

    def start_election(self):
        self.state = "CANDIDATE"
        self.current_term += 1
        self.voted_for = self.node_id
        votes = 1  # vote for self
        for node in self.peers:
            if node.request_vote(self.current_term, self.node_id):
                votes += 1
        # A candidate needs votes from a majority of the whole cluster
        if votes > (len(self.peers) + 1) / 2:
            self.state = "LEADER"
            self.send_heartbeats()
```
### Two-Phase Commit (2PC)
Ensures all nodes commit or all abort.
```
Coordinator                     Participants
     │                               │
     ├──── PREPARE ─────────────────►│
     │                               │
     │◄──── VOTE (YES/NO) ───────────┤
     │                               │
     ├──── COMMIT/ABORT ────────────►│
     │                               │
     │◄──── ACK ─────────────────────┤
```
**Problems with 2PC:**
- Participants block (holding locks) if the coordinator fails
- The coordinator is a single point of failure
- Two round trips plus logging add latency to every transaction
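
A minimal coordinator sketch makes the message flow concrete, assuming a hypothetical participant interface with `prepare`, `commit`, and `abort` methods (no persistence or timeouts, so it still blocks exactly as described above):

```python
# Minimal two-phase commit coordinator.
def two_phase_commit(participants, txn):
    # Phase 1: ask every participant to prepare (vote)
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare(txn))   # True = YES, False = NO
        except Exception:
            votes.append(False)

    # Phase 2: commit only if every vote was YES, otherwise abort
    decision = all(votes)
    for p in participants:
        if decision:
            p.commit(txn)
        else:
            p.abort(txn)
    return decision
```
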
### Saga Pattern

An alternative to 2PC for long-running transactions: each step is paired with a compensating action that undoes it if a later step fails.
```python
# Order-processing saga: each step is paired with its compensating action
class OrderSaga:
    def __init__(self):
        self.steps = [
            (self.reserve_inventory, self.cancel_inventory),
            (self.charge_payment, self.refund_payment),
            (self.create_shipment, self.cancel_shipment),
        ]

    def execute(self, order):
        completed_steps = []
        try:
            for step, compensate in self.steps:
                result = step(order)
                completed_steps.append((compensate, result))
        except Exception:
            # Undo completed work by compensating in reverse order
            for compensate, result in reversed(completed_steps):
                compensate(result)
            raise
```
## Distributed System Patterns

### Circuit Breaker

Stops calling a failing downstream dependency so its failures don't cascade through the system.
```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout          # seconds to stay OPEN before probing
        self.failures = 0
        self.last_failure_time = None
        self.state = 'CLOSED'

    def call(self, func, *args, **kwargs):
        if self.state == 'OPEN':
            if time.time() - self.last_failure_time > self.timeout:
                self.state = 'HALF_OPEN'   # allow a single probe request
            else:
                raise Exception("Circuit breaker is OPEN")
        try:
            result = func(*args, **kwargs)
            if self.state == 'HALF_OPEN':
                self.state = 'CLOSED'
                self.failures = 0
            return result
        except Exception:
            self.failures += 1
            self.last_failure_time = time.time()
            if self.failures >= self.failure_threshold:
                self.state = 'OPEN'
            raise
```
### Bulkhead Pattern

Isolate resource pools so one overloaded workload cannot exhaust capacity for the others.
```python
# Separate thread pools per workload class act as bulkheads
from concurrent.futures import ThreadPoolExecutor

class BulkheadExecutor:
    def __init__(self):
        self.executors = {
            'critical': ThreadPoolExecutor(max_workers=10),
            'normal': ThreadPoolExecutor(max_workers=5),
            'batch': ThreadPoolExecutor(max_workers=2),
        }

    def submit(self, priority, func, *args):
        # An overloaded 'batch' pool cannot starve 'critical' work
        return self.executors[priority].submit(func, *args)
```
### Retry with Backoff
Handle transient failures gracefully.
```python
import random
import time

def exponential_backoff_retry(func, max_retries=3, base_delay=1):
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff plus jitter to avoid synchronized retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
```
## Clock Synchronization

### Logical Clocks (Lamport Timestamps)
Order events without synchronized physical clocks.
```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def increment(self):
        # Called before each local event or message send
        self.time += 1
        return self.time

    def update(self, received_time):
        # Called on message receipt: jump past the sender's timestamp
        self.time = max(self.time, received_time) + 1
        return self.time
```
### Vector Clocks
Track causality between events.
```python
class VectorClock:
    def __init__(self, node_id, num_nodes):
        self.node_id = node_id
        self.clock = [0] * num_nodes

    def increment(self):
        # Local event: advance this node's component
        self.clock[self.node_id] += 1
        return self.clock.copy()

    def update(self, other_clock):
        # Message receipt: element-wise max, then count the receive event
        for i in range(len(self.clock)):
            self.clock[i] = max(self.clock[i], other_clock[i])
        self.increment()
```
## Distributed Storage

### Consistent Hashing
Distribute data across nodes with minimal reshuffling.
```python
import hashlib

class ConsistentHash:
    def __init__(self, nodes, virtual_nodes=150):
        self.nodes = nodes
        self.virtual_nodes = virtual_nodes
        self.ring = {}                     # hash value -> physical node
        self._build_ring()

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def _build_ring(self):
        # Virtual nodes spread each physical node around the ring
        for node in self.nodes:
            for i in range(self.virtual_nodes):
                virtual_key = f"{node}:{i}"
                hash_value = self._hash(virtual_key)
                self.ring[hash_value] = node

    def get_node(self, key):
        if not self.ring:
            return None
        hash_value = self._hash(key)
        # Find the first virtual node clockwise from the key
        for node_hash in sorted(self.ring.keys()):
            if node_hash >= hash_value:
                return self.ring[node_hash]
        # Wrap around to the start of the ring
        return self.ring[min(self.ring.keys())]
```
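
A quick usage sketch shows why the reshuffling is minimal: adding a fourth node moves only roughly a quarter of the keys (the key names here are arbitrary):

```python
# Compare key placement before and after adding a node
keys = [f"user:{i}" for i in range(1000)]
before = ConsistentHash(["node-a", "node-b", "node-c"])
after = ConsistentHash(["node-a", "node-b", "node-c", "node-d"])

moved = sum(1 for k in keys if before.get_node(k) != after.get_node(k))
print(f"{moved} of {len(keys)} keys moved")   # roughly a quarter move, not all of them
```
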
### Replication Strategies

**Master-Slave Replication**

```
             ┌────────┐
  Writes ──► │ Master │
             └───┬────┘
                 │ Replicate
     ┌───────────┼───────────┐
┌────▼───┐  ┌────▼───┐  ┌────▼───┐
│ Slave1 │  │ Slave2 │  │ Slave3 │ ──► Reads
└────────┘  └────────┘  └────────┘
```

**Multi-Master Replication**

```
┌─────────┐ ◄───► ┌─────────┐
│ Master1 │       │ Master2 │
└─────────┘ ◄───► └─────────┘
      ▲                ▲
      └── Conflict ────┘
          Resolution
```
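
When two masters accept writes to the same key concurrently, a conflict-resolution rule is needed; a common but lossy choice is last-write-wins, sketched here with assumed `(timestamp, node_id, value)` metadata:

```python
# Last-write-wins merge for concurrently updated replicas of one key.
# Each version carries a (timestamp, node_id) pair; node_id breaks ties.
def last_write_wins(versions):
    """versions: list of (timestamp, node_id, value) from different masters."""
    return max(versions, key=lambda v: (v[0], v[1]))[2]

# Example: two masters accepted different writes during a partition
merged = last_write_wins([
    (1700000001.2, "master1", {"cart": ["book"]}),
    (1700000003.7, "master2", {"cart": ["book", "pen"]}),
])
# merged == {"cart": ["book", "pen"]}  (the later write survives; the other is dropped)
```
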
## Common Interview Questions

### Q1: "Design a distributed cache"
```python
class DistributedCache:
    def __init__(self, nodes):
        self.consistent_hash = ConsistentHash(nodes)
        self.local_cache = {}              # near cache on this client
        self.replication_factor = 3

    def get(self, key):
        # Try the local (near) cache first
        if key in self.local_cache:
            return self.local_cache[key]
        # Find the replicas responsible for this key
        nodes = self.get_nodes_for_key(key)   # walks the hash ring; helper omitted
        # Try each replica until one answers
        for node in nodes:
            try:
                value = node.get(key)
                if value:
                    self.local_cache[key] = value
                    return value
            except Exception:
                continue
        return None

    def put(self, key, value):
        nodes = self.get_nodes_for_key(key)
        # Write to all replicas
        successful_writes = 0
        for node in nodes:
            try:
                node.put(key, value)
                successful_writes += 1
            except Exception:
                pass
        # Require a majority (write quorum) to succeed
        if successful_writes < len(nodes) // 2 + 1:
            raise Exception("Failed to achieve write quorum")
```
### Q2: "How do you handle split-brain in distributed systems?"
**Solutions:**
- **Quorum-based decisions**: Require a majority before acting
- **Fencing tokens**: Monotonically increasing tokens let resources reject stale leaders
- **STONITH** ("Shoot The Other Node In The Head"): Forcibly power off the suspect node
- **External arbitrator**: ZooKeeper, etcd
- **Witness nodes**: Lightweight tie-breakers
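
Fencing tokens are easiest to see in a small sketch: assume a hypothetical storage service that rejects any request carrying a token older than the newest one it has accepted:

```python
# Illustrative fencing: a stale "leader" holding an old token cannot write.
class FencedStorage:
    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token, key, value):
        if token < self.highest_token:
            raise PermissionError(f"stale fencing token {token}")
        self.highest_token = token
        self.data[key] = value

store = FencedStorage()
store.write(33, "lock_owner", "node-a")   # current leader, token 33
store.write(34, "lock_owner", "node-b")   # new leader elected with token 34
# store.write(33, "lock_owner", "node-a") would now raise PermissionError
```
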
### Q3: "Explain distributed tracing"
```python
import time
import uuid

def generate_id():
    return uuid.uuid4().hex[:16]

class DistributedTrace:
    def __init__(self, trace_id=None):
        self.trace_id = trace_id or generate_id()
        self.spans = []
        self.current_span_id = None

    def start_span(self, operation):
        span = {
            'span_id': generate_id(),
            'trace_id': self.trace_id,
            'operation': operation,
            'start_time': time.time(),
            'parent_id': self.current_span_id,   # links child spans to their parent
        }
        self.spans.append(span)
        self.current_span_id = span['span_id']
        return span

    def end_span(self, span):
        span['end_time'] = time.time()
        span['duration'] = span['end_time'] - span['start_time']
```
## Best Practices

### 1. Design for Failure
- Assume nodes will fail
- Assume network will partition
- Build in redundancy
- Test failure scenarios
### 2. Monitoring & Observability
- Distributed tracing
- Centralized logging
- Metrics aggregation
- Health checks
### 3. Gradual Rollouts
- Canary deployments
- Feature flags
- Blue-green deployments
- Rollback capabilities
## Resources

- *Designing Data-Intensive Applications* (book)
- *Distributed Systems: Principles and Paradigms* (book)
- "The Google File System" (paper)
- MIT 6.824: Distributed Systems (video lectures)
**Interview tip:** When discussing distributed systems, always acknowledge trade-offs. There is no perfect solution; explain why you chose specific approaches for the given requirements.