Consistency Models: Deep Dive for Interviews¶

A practical reference to reason about data correctness, latency, and availability across distributed systems and databases. Use this to structure tradeoffs and avoid common pitfalls.

Contents: - 1. Why Consistency Matters - 2. CAP and PACELC - 3. Consistency Models (with examples) - 4. Replication, Quorums, and Tunable Consistency - 5. Database Isolation Levels (OLTP) - 6. Conflict Detection and Resolution - 7. Idempotency, Fencing, and Exactly-Once Illusion - 8. Eventual Consistency Design Patterns - 9. Global Consistency (TrueTime, Global Clocks) - 10. Testing and Validation - 11. Interview Playbook - 12. Sample Questions

1) Why Consistency Matters¶

Correctness: Users expect up-to-date views (read-your-writes) and durable outcomes (no double charges).
Latency/Availability: Stronger guarantees often cost more latency and reduce availability under partitions.
Cost and Complexity: Relaxed models can scale better but push complexity into application logic (retries, reconciliation).

2) CAP and PACELC¶

CAP theorem: Under a partition (P), choose consistency (C) or availability (A).
CP systems prefer consistent responses or errors/unavailability during partitions.
AP systems prefer serving responses even if not the latest.
PACELC (extended):
If Partition (P), choose A or C; Else (E) when normal operation, choose Latency (L) or Consistency (C).
Many systems choose AP/EL (availability under partition, low latency otherwise).

3) Consistency Models (with examples)¶

Linearizability (strong consistency)
Each operation appears to take effect atomically at a single point in time.
If write(X=1) completes before read(X), read must see 1 globally.
Example: Single-leader DB with synchronous replication and quorum reads.
Serializability (transactional)
Transactions appear as if executed in some serial order (not necessarily real-time order).
Stronger than linearizable single operations; applies to multi-op transactions.
Sequential consistency
All operations appear in the same order across processes, respecting per-process order; not tied to real time.
Causal consistency
If A happens-before B, everyone agrees on A before B; concurrent ops may be seen in different orders.
Practical for social feeds, collaborative apps.
Read-your-writes (RYW)
A client sees its own writes subsequently; can be provided via session stickiness.
Monotonic reads/writes
Reads: once you see a value, you don’t see older values later.
Writes: a client’s writes are serialized in order.
Eventual consistency
In absence of new writes, replicas converge; no guarantee on staleness windows.

Notes: - Linearizability ≠ Serializability: linearizability is per operation; serializability is per transaction schedule. - Client-centric consistency (RYW, monotonic) can be layered over eventual stores with sessions.

4) Replication, Quorums, and Tunable Consistency¶

Replication factor N (number of replicas for a partition).
Quorum rules:
Write quorum W, read quorum R; to get linearizable reads: R + W > N, and W > N/2.
Example: N=3 → typical W=2, R=2 (R+W=4>3).
Leader-based vs leaderless:
Leader-based: simpler invariants (single writer), but leader failover adds complexity.
Leaderless (Dynamo-style): writes and reads go to multiple replicas; conflicts resolved on read/repair.
Hinted handoff, read repair, anti-entropy:
Mechanisms to handle temporarily down replicas and eventual convergence.

Tuning examples: - Fast reads: N=3, W=2, R=1 → AP-leaning, risk stale reads. - Stronger reads: N=3, W=2, R=2 → CP-leaning for reads, higher latency.

5) Database Isolation Levels (OLTP)¶

Read Uncommitted: dirty reads possible.
Read Committed: no dirty reads; non-repeatable reads and phantoms possible.
Repeatable Read: no dirty/non-repeatable reads; phantoms may occur (depends on DB).
Serializable: appears as if transactions ran one-by-one (may be via locking or SSI).

Common DB notes: - PostgreSQL: Repeatable Read uses MVCC (no non-repeatable reads; phantom avoidance differs); Serializable Snapshot Isolation (SSI) prevents anomalies with predicate locks. - MySQL/InnoDB: Repeatable Read often prevents phantom reads with next-key locking, but semantics vary with settings. - Spanner: External consistency (global serializability) via TrueTime.

Interview angle: - Map app requirements to minimal isolation level (e.g., payments → serializable; analytics → read committed).

6) Conflict Detection and Resolution¶

Detection methods:
Last-Write-Wins (LWW) via timestamps (danger with clock skew).
Vector clocks (detect concurrency, require merge policy).
Application-level constraints (e.g., per-key monotonic counters).
Conflict-free Replicated Data Types (CRDTs):
Commutative/associative structures that converge (G-Counter, PN-Counter, OR-Set, LWW-Register with caveats).
Merge strategies:
Commutative operations (additive counters).
Business-rule merges (e.g., cart: union of items, latest quantity per SKU).
Human-in-the-loop for irreconcilable conflicts.

7) Idempotency, Fencing, and Exactly-Once Illusion¶

Exactly-once delivery is not guaranteed end-to-end across unreliable networks.
Idempotency:
Use idempotency keys (request_id) so retries have no side effects.
Store processed IDs (inbox table) to dedupe.
Outbox pattern:
Write DB row and outbox in same transaction; a relay publishes events to the log/queue.
Fencing tokens:
For leader/lock scenarios, issue monotonically increasing tokens; reject stale writers (protects against “split brain”).
Two-phase commit (2PC)
Consistent across participants but fragile under coordinator failure; high latency; often avoided in microservices. Prefer Sagas.

8) Eventual Consistency Design Patterns¶

Read-your-writes via sessions:
Sticky sessions to a replica or read-after-write with per-client routing.
Write-through vs write-behind caches:
Write-through for stronger consistency; write-behind for lower latency (risk of loss).
Versioned keys:
Include version in key to force readers to fetch latest; clean up old versions async.
Compensating transactions (Sagas):
For business workflows across services; define compensations on failure.
De-duplication and Ordering:
Per-aggregate ordering via partition keys; application-level de-dup tables.

Pitfalls: - Cache incoherence: ensure invalidation or TTLs; add jitter to avoid thundering herds. - Clock skew: avoid LWW based purely on wall time; prefer logical clocks or fencing.

9) Global Consistency (TrueTime, Global Clocks)¶

Google Spanner:
TrueTime API exposes bounded clock uncertainty [ε]. Transactions wait out uncertainty (commit-wait) to achieve external consistency.
Hybrid Logical Clocks (HLC):
Combine physical and logical clocks; preserve causality with minimal skew sensitivity.
Global transactions:
High-latency due to geo round-trips; consider per-region writes with asynchronous replication unless strict global order is required.

10) Testing and Validation¶

Jepsen-style tests:
Inject partitions, clock skew, crashes; verify invariants (no lost updates, monotonicity).
Property-based tests:
Define invariants and generate random event sequences to detect anomalies.
Shadow traffic and canaries:
Validate consistency changes before full rollout.

11) Interview Playbook¶

Step 1: Classify the workload
Per-key invariants? Cross-entity transactions? SLA (p95/p99)? Multi-region?
Step 2: Choose model minimally sufficient
Cart/checkout: RYW + per-user session consistency may suffice.
Financial transfers: serializable or per-ledger linearizable writes with idempotent operations.
Step 3: Sketch replication and quorum
RF=3; W=2, R=2 for strong reads; mention failure behavior.
Step 4: Handle retries and duplicates
Idempotency keys; outbox/inbox; dedupe tables.
Step 5: Operational plan
Monitoring for staleness, lag, replica health; backpressure policies.

12) Sample Questions¶

Explain linearizability vs serializability with examples.
For RF=5, what (R, W) provide linearizable reads? Tradeoffs of (R=3, W=3) vs (R=2, W=4).
How to provide read-your-writes in a globally distributed app with CDNs and caches?
Design idempotent payment processing with retries and “exactly-once” claim.
Choose between CRDTs and application-level merges for collaborative editing.
How does Spanner achieve external consistency? What is commit-wait and TrueTime ε?