CAP Theorem

What is CAP Theorem?

CAP Theorem states that a distributed system can only guarantee two out of three properties simultaneously: Consistency, Availability, and Partition Tolerance. Understanding CAP is crucial for designing distributed systems.

The Three Properties

Consistency (C)

Definition: All nodes see the same data at the same time. Every read receives the most recent write.

Example:

User updates profile on Server A
Immediately reads from Server B
Sees the updated profile (consistent)

Strong Consistency: Read always returns latest write Eventual Consistency: Reads may return stale data temporarily

Availability (A)

Definition: Every request receives a response (success or failure), without guarantee that it contains the most recent write.

Example:

Server A is down
Request goes to Server B
Server B responds (even if data might be slightly stale)

Characteristics:

System remains operational
No timeouts or errors
May return stale data

Partition Tolerance (P)

Definition: System continues to operate despite network partitions (communication breakdown between nodes).

Example:

Network split separates Server A and Server B
Both servers continue accepting requests
System doesn’t fail completely

Reality: Network partitions WILL happen in distributed systems, so P is mandatory.

CAP Combinations

Since network partitions are inevitable, you must choose between C and A when partition occurs.

CP Systems (Consistency + Partition Tolerance)

Trade-off: Sacrifice availability for consistency

Behavior during partition:

Reject requests that cannot be guaranteed consistent
Return errors or timeouts
Wait for partition to heal

Examples:

MongoDB (with majority write concern)
HBase
Redis (with synchronous replication)
Banking systems

Use Cases:

Financial transactions
Inventory management
Systems where stale data is unacceptable

Real-World Example:

Bank Account Balance: $1000

Partition occurs between data centers:
- User tries to withdraw $500
- System cannot confirm balance across all nodes
- Transaction REJECTED (maintains consistency)
- User sees error message (sacrifices availability)

AP Systems (Availability + Partition Tolerance)

Trade-off: Sacrifice consistency for availability

Behavior during partition:

Accept all requests
Return potentially stale data
Resolve conflicts later

Examples:

Cassandra
DynamoDB
Riak
DNS

Use Cases:

Social media feeds
Shopping carts
Content delivery
Systems where availability is critical

Real-World Example:

Social Media Post Likes: 100

Partition occurs:
- User A sees 100 likes (from Server A)
- User B sees 105 likes (from Server B)
- Both can still like the post (available)
- Counts eventually converge (eventual consistency)

CA Systems (Consistency + Availability)

Reality: Not possible in distributed systems with network partitions

Why: Network partitions are inevitable, so you MUST have partition tolerance

Note: Single-server databases (traditional RDBMS on one machine) can be CA because there’s no distribution, but they’re not truly distributed systems.

CAP in Practice

MongoDB (CP)

Default Behavior:

Write to primary, replicate to secondaries
With writeConcern: majority, waits for majority acknowledgment
During partition, minority nodes reject writes

Configuration:

// Strong consistency (CP)
await collection.insertOne(document, {
  writeConcern: { w: 'majority' },
  readPreference: 'primary'
});

// If partition occurs and can't reach majority:
// MongoError: not master and slaveOk=false

Cassandra (AP)

Default Behavior:

Accepts writes even during partitions
Uses eventual consistency
Resolves conflicts with timestamps (last write wins)

Configuration:

// Eventual consistency (AP)
await client.execute(query, params, {
  consistency: cassandra.types.consistencies.one
});

// During partition:
// - Writes succeed on available nodes
// - Reads may return stale data
// - No errors, system remains available

DynamoDB (AP with tunable consistency)

Flexibility:

Default: Eventually consistent reads (AP)
Optional: Strongly consistent reads (CP-like)

Configuration:

// Eventually consistent (AP)
const params = {
  TableName: 'Users',
  Key: { userId: '123' },
  ConsistentRead: false // Default
};

// Strongly consistent (CP-like)
const params = {
  TableName: 'Users',
  Key: { userId: '123' },
  ConsistentRead: true
};

Beyond CAP: PACELC

Extension of CAP: PACELC adds consideration for normal operation (no partition)

PACELC Theorem:

Partition: Choose between Availability and Consistency
Else (no partition): Choose between Latency and Consistency

Examples:

PA/EL Systems: Prioritize availability and low latency

Cassandra, DynamoDB
Fast responses, eventual consistency

PC/EC Systems: Prioritize consistency always

MongoDB, HBase
Consistent data, higher latency

PA/EC Systems: Available during partition, consistent otherwise

Some configurations of distributed databases

Handling Network Partitions

Detecting Partitions

Methods:

Heartbeat timeouts
Failed health checks
Gossip protocol failures
Consensus algorithm failures

Strategies During Partition

Quorum-Based:

Require majority of nodes to agree
Minority partition rejects operations
Maintains consistency

Last Write Wins:

Use timestamps to resolve conflicts
Simple but can lose data
Maintains availability

Vector Clocks:

Track causality of updates
Detect concurrent modifications
Require conflict resolution

CRDTs (Conflict-Free Replicated Data Types):

Data structures that automatically merge
No conflicts by design
Complex to implement

Choosing Between CP and AP

Choose CP When:

Data correctness is critical
Inconsistent data causes problems
Can tolerate downtime
Examples: Banking, inventory, booking systems

Choose AP When:

Availability is critical
Temporary inconsistency acceptable
Cannot tolerate downtime
Examples: Social media, analytics, content delivery

Questions to Ask:

What happens if user sees stale data?
Can we afford to reject requests?
How long can inconsistency last?
What’s the cost of downtime?
How do we resolve conflicts?

.NET Example: Handling CAP Trade-offs

public class DistributedDataService
{
    private readonly IMongoDatabase _primaryDb;
    private readonly IMongoDatabase _secondaryDb;
    
    // CP approach: Strong consistency, may fail during partition
    public async Task<User> GetUserConsistent(string userId)
    {
        try
        {
            // Read from primary only
            var collection = _primaryDb.GetCollection<User>("users")
                .WithReadPreference(ReadPreference.Primary);
            
            return await collection.Find(u => u.Id == userId)
                .FirstOrDefaultAsync();
        }
        catch (MongoException ex)
        {
            // During partition, this may fail
            throw new ServiceUnavailableException(
                "Cannot guarantee consistent data", ex);
        }
    }
    
    // AP approach: High availability, may return stale data
    public async Task<User> GetUserAvailable(string userId)
    {
        try
        {
            // Try primary first
            var collection = _primaryDb.GetCollection<User>("users")
                .WithReadPreference(ReadPreference.PrimaryPreferred);
            
            return await collection.Find(u => u.Id == userId)
                .FirstOrDefaultAsync();
        }
        catch (MongoException)
        {
            // Fallback to secondary (may be stale)
            var collection = _secondaryDb.GetCollection<User>("users")
                .WithReadPreference(ReadPreference.Secondary);
            
            return await collection.Find(u => u.Id == userId)
                .FirstOrDefaultAsync();
        }
    }
}

Real-World Examples

Amazon Shopping Cart (AP)

Scenario: User adds items to cart from different devices

Approach:

Always accept additions (availability)
Merge carts later (eventual consistency)
Never lose items (conflict resolution: union)

Why AP: Better to show extra items than lose items or show errors

Bank Account Balance (CP)

Scenario: User checks balance and withdraws money

Approach:

Reject transactions if cannot confirm balance (consistency)
Show error during network issues (sacrifice availability)
Never allow overdraft

Why CP: Incorrect balance could cause overdraft, legal issues

Scenario: Users like posts from around the world

Approach:

Always accept likes (availability)
Count may be temporarily inconsistent
Eventually converge to correct count

Why AP: Exact like count not critical, availability is

Best Practices

Understand your requirements - CP or AP based on business needs
Design for partition tolerance - Partitions will happen
Implement proper monitoring - Detect partitions quickly
Test partition scenarios - Chaos engineering
Document trade-offs - Make conscious decisions
Use appropriate consistency levels - Tune per operation
Implement conflict resolution - For AP systems
Plan for recovery - Partition healing procedures

Interview Tips

Explain CAP clearly: Can only have 2 of 3
Show real examples: CP (banking) vs AP (social media)
Discuss partition tolerance: Why it’s mandatory
Demonstrate trade-offs: Consistency vs availability
Mention PACELC: Extension for normal operation
Show practical choices: When to choose CP or AP

Summary

CAP Theorem states distributed systems can only guarantee two of Consistency, Availability, and Partition Tolerance. Since network partitions are inevitable, choose between C and A. CP systems (MongoDB, HBase) sacrifice availability for consistency - reject requests during partitions. AP systems (Cassandra, DynamoDB) sacrifice consistency for availability - accept requests with eventual consistency. Choose CP for critical data (banking, inventory). Choose AP for high availability needs (social media, content). PACELC extends CAP to consider latency during normal operation. Essential for understanding distributed system trade-offs.

Test Your Knowledge

Take a quick quiz to test your understanding of this topic.

Search

Search Coming Soon

CAP Theorem

What is CAP Theorem?

The Three Properties

Consistency (C)

Availability (A)

Partition Tolerance (P)

CAP Combinations

CP Systems (Consistency + Partition Tolerance)

AP Systems (Availability + Partition Tolerance)

CA Systems (Consistency + Availability)

CAP in Practice

MongoDB (CP)

Cassandra (AP)

DynamoDB (AP with tunable consistency)

Beyond CAP: PACELC

Handling Network Partitions

Detecting Partitions

Strategies During Partition

Choosing Between CP and AP

Choose CP When:

Choose AP When:

Questions to Ask:

.NET Example: Handling CAP Trade-offs

Real-World Examples

Amazon Shopping Cart (AP)

Bank Account Balance (CP)

Best Practices

Interview Tips

Summary

Test Your Knowledge

Test Your System-design Knowledge

Search

Search Coming Soon

CAP Theorem

What is CAP Theorem?

The Three Properties

Consistency (C)

Availability (A)

Partition Tolerance (P)

CAP Combinations

CP Systems (Consistency + Partition Tolerance)

AP Systems (Availability + Partition Tolerance)

CA Systems (Consistency + Availability)

CAP in Practice

MongoDB (CP)

Cassandra (AP)

DynamoDB (AP with tunable consistency)

Beyond CAP: PACELC

Handling Network Partitions

Detecting Partitions

Strategies During Partition

Choosing Between CP and AP

Choose CP When:

Choose AP When:

Questions to Ask:

.NET Example: Handling CAP Trade-offs

Real-World Examples

Amazon Shopping Cart (AP)

Bank Account Balance (CP)

Social Media Likes (AP)

Best Practices

Interview Tips

Summary

Test Your Knowledge

Test Your System-design Knowledge