Data Replication

What is Data Replication?

Data replication is the process of copying database objects to multiple database servers and keeping those copies in sync. It improves availability and read performance, and provides disaster recovery capabilities.

Why Replicate Data?

High Availability: If the primary server fails, a replica takes over

Read Scalability: Distribute read queries across multiple replicas

Disaster Recovery: Replicas in other geographic regions protect against regional failures

Reduced Latency: Serve users from the nearest replica

Backup: Take backups from a replica without impacting the primary

Replication Strategies

1. Master-Slave (Primary-Replica)

Architecture: One primary server handles all writes; one or more replicas handle reads

How it works:

All writes go to primary
Primary replicates changes to replicas
Reads distributed across replicas
Replicas are read-only
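
The routing rule above fits in a few lines. This is a minimal sketch, not a driver API: `PrimaryReplicaRouter` and the node names are hypothetical, and a real client would also track replica health and handle failover.

```javascript
// Hypothetical router for a primary-replica setup: writes always go to the
// primary, reads rotate round-robin across the read-only replicas.
class PrimaryReplicaRouter {
  constructor(primary, replicas) {
    this.primary = primary;
    this.replicas = replicas;
    this.next = 0; // index of the replica that serves the next read
  }

  routeWrite() {
    return this.primary; // single, clear write path
  }

  routeRead() {
    if (this.replicas.length === 0) return this.primary; // no replicas: fall back
    const replica = this.replicas[this.next];
    this.next = (this.next + 1) % this.replicas.length;
    return replica;
  }
}

const router = new PrimaryReplicaRouter('db-primary', ['db-replica-1', 'db-replica-2']);
console.log(router.routeWrite()); // db-primary
console.log(router.routeRead());  // db-replica-1
console.log(router.routeRead());  // db-replica-2
console.log(router.routeRead());  // db-replica-1
```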

Advantages:

Simple to implement
Good for read-heavy workloads
Clear write path

Disadvantages:

Single point of failure for writes (until a replica is promoted)
Replication lag (replicas are eventually consistent)
The primary can become a write bottleneck

Use Cases:

Content management systems
Reporting databases
Analytics workloads

2. Master-Master (Multi-Primary)

Architecture: Multiple servers accept writes and replicate changes to each other

How it works:

Any server can handle writes
Changes replicated to all other servers
Conflict resolution required

Advantages:

No single point of failure for writes
Writes can be accepted locally in each region (though every server must still apply every write)
Geographic distribution

Disadvantages:

Complex conflict resolution
Potential data inconsistencies
More difficult to implement

Use Cases:

Global applications
High availability requirements
Collaborative applications

3. Peer-to-Peer

Architecture: All nodes are equal, each can read and write

How it works:

Nodes replicate changes to each other
No designated primary
Eventual consistency

Advantages:

Highly available
No single point of failure
Scales horizontally

Disadvantages:

Complex consistency management
Conflict resolution challenges
Network overhead

Use Cases:

Leaderless distributed databases (Cassandra, Riak)
Blockchain systems

Replication Methods

Synchronous Replication

How it works: The primary waits for the replicas to acknowledge a write before confirming it to the client

Characteristics:

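Strong consistency (every confirmed write has already reached the replicas)
Higher latency (each write waits for the slowest replica)
No loss of confirmed writes if the primary fails
Writes stall if a replica is unreachable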

Example Flow:

Client writes to primary
Primary sends to replicas
Replicas acknowledge
Primary confirms to client

Use Cases: Financial transactions, critical data

Asynchronous Replication

How it works: The primary confirms the write immediately and replicates it in the background

Characteristics:

Lower latency
Eventual consistency
Risk of losing recently confirmed writes if the primary fails before they reach a replica

Example Flow:

Client writes to primary
Primary confirms immediately
Primary replicates to replicas asynchronously

Use Cases: Social media, content delivery, analytics

Semi-Synchronous Replication

How it works: The primary waits for at least one replica to acknowledge the write; the remaining replicas are updated asynchronously

Characteristics:

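Balance between consistency and performance
A confirmed write survives loss of the primary, because at least one replica has it
Lower latency than fully synchronous replication

Use Cases: Systems that need durability without waiting on every replica (for example, MySQL semi-synchronous replication)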
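
The three methods differ only in how many replica acknowledgments the primary waits for. The sketch below is a simplified model under stated assumptions: each replica's latency is a hypothetical number of milliseconds to receive and acknowledge the change, and the client round trip and the primary's own disk write are ignored.

```javascript
// Simplified model of when a primary can confirm a write under each
// replication method. replicaLatenciesMs holds hypothetical per-replica
// acknowledgment times.
function confirmWrite(replicaLatenciesMs, mode) {
  const sorted = [...replicaLatenciesMs].sort((a, b) => a - b);
  let acksRequired;
  if (mode === 'sync') acksRequired = sorted.length; // wait for every replica
  else if (mode === 'semi-sync') acksRequired = Math.min(1, sorted.length); // wait for one
  else if (mode === 'async') acksRequired = 0; // confirm immediately
  else throw new Error(`unknown mode: ${mode}`);

  // The primary confirms once the acksRequired-th fastest replica has acknowledged
  const confirmAtMs = acksRequired === 0 ? 0 : sorted[acksRequired - 1];
  // ackedReplicas: copies guaranteed to exist when the client sees success
  return { confirmAtMs, ackedReplicas: acksRequired };
}

const latencies = [5, 40, 120]; // e.g. same-rack, same-region, cross-region replicas
for (const mode of ['sync', 'semi-sync', 'async']) {
  console.log(mode, confirmWrite(latencies, mode));
}
// sync { confirmAtMs: 120, ackedReplicas: 3 }
// semi-sync { confirmAtMs: 5, ackedReplicas: 1 }
// async { confirmAtMs: 0, ackedReplicas: 0 }
```

The output shows the trade-off directly: synchronous replication pays for the slowest (cross-region) replica on every write, while semi-synchronous replication pays only for the fastest one and still leaves one durable copy.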

Replication Lag

Definition: The delay between a write committing on the primary and that write becoming visible on a replica

Causes:

Network latency
Replica processing speed
High write volume
Large transactions

Impact:

Read-after-write inconsistency (user writes, immediately reads from replica, doesn’t see their write)
Stale data on replicas
User confusion

Solutions:

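Read from the primary for critical reads
Session consistency (only serve a user's reads from a replica that has applied that user's latest write; simply pinning a user to one replica keeps reads monotonic but does not guarantee they see their own writes)
Causal consistency (track which writes a read depends on and wait until the replica has applied them)
Monitor lag and alert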
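
One way to implement session consistency is to remember the replication position of each user's last write and route that user's reads only to replicas that have applied it, falling back to the primary otherwise. The sketch below assumes positions are monotonically increasing numbers (such as a write-ahead log position); `ReadYourWritesRouter`, `recordWrite`, and the replica objects are hypothetical.

```javascript
// Hypothetical read-your-writes router. Positions are assumed to be
// monotonically increasing numbers, such as a write-ahead log position.
class ReadYourWritesRouter {
  constructor(primary, replicas) {
    this.primary = primary;   // { name, appliedPosition }
    this.replicas = replicas; // [{ name, appliedPosition }, ...]
    this.lastWriteByUser = new Map();
  }

  // Call after a user's write commits on the primary at `position`
  recordWrite(userId, position) {
    const prev = this.lastWriteByUser.get(userId) ?? 0;
    this.lastWriteByUser.set(userId, Math.max(prev, position));
  }

  // Use any replica that has applied the user's last write; otherwise the primary
  routeRead(userId) {
    const needed = this.lastWriteByUser.get(userId) ?? 0;
    const caughtUp = this.replicas.find((r) => r.appliedPosition >= needed);
    return caughtUp ?? this.primary;
  }
}

const rywRouter = new ReadYourWritesRouter(
  { name: 'primary', appliedPosition: 105 },
  [{ name: 'replica-1', appliedPosition: 100 }, { name: 'replica-2', appliedPosition: 105 }]
);
rywRouter.recordWrite('alice', 103);
console.log(rywRouter.routeRead('alice').name); // replica-2 (replica-1 is behind)
console.log(rywRouter.routeRead('bob').name);   // replica-1 (bob has no recent writes)
```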

MongoDB Replication Example

// 1) Run once in mongosh, connected to mongo1, to create the replica set
rs.initiate({
  _id: "myReplicaSet",
  members: [
    { _id: 0, host: "mongo1:27017", priority: 2 }, // Preferred primary (highest priority)
    { _id: 1, host: "mongo2:27017", priority: 1 }, // Secondary
    { _id: 2, host: "mongo3:27017", priority: 1 }  // Secondary
  ]
});

// 2) Application code (Node.js driver): list the members and the replica set name
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://mongo1:27017,mongo2:27017,mongo3:27017/mydb?replicaSet=myReplicaSet');

async function main() {
  const users = client.db('mydb').collection('users');

  // Writes always go to the primary
  await users.insertOne({ name: 'John', email: 'john@example.com' });

  // Read from a secondary (eventual consistency: may be null if the insert has not replicated yet)
  const user = await users.findOne({ email: 'john@example.com' }, { readPreference: 'secondary' });

  // Read from the primary (strong consistency: reflects the insert above)
  const userPrimary = await users.findOne({ email: 'john@example.com' }, { readPreference: 'primary' });

  console.log(user, userPrimary);
}

main().catch(console.error).finally(() => client.close());

PostgreSQL Replication

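# Primary server: postgresql.conf
wal_level = replica
max_wal_senders = 3
wal_keep_size = 64MB   # WAL kept for lagging replicas (PostgreSQL 13+)

# Primary server: pg_hba.conf entry allowing the replica to stream WAL
# (replace <replica_ip> with the replica's address)
host  replication  replicator  <replica_ip>/32  scram-sha-256

-- Primary server: create the replication user (use a real secret in production)
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'password';

# Replica server: postgresql.conf (PostgreSQL 12+). Initialize the replica with
# pg_basebackup and create an empty standby.signal file in its data directory.
primary_conninfo = 'host=primary_host port=5432 user=replicator password=password'
hot_standby = on   # allow read-only queries on the replica

-- On the primary: one row per connected replica, including its lag
SELECT client_addr, state, sync_state, replay_lag FROM pg_stat_replication;

-- On the replica: time since the last replayed transaction
-- (overstates lag when the primary is idle)
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;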

Conflict Resolution

Problem: In multi-master replication, the same data can be modified concurrently on different servers

Strategies:

Last Write Wins (LWW):

Use timestamps to determine the winner
Simple, but silently discards the losing write and depends on synchronized clocks
Example: User A updates at 10:00, User B at 10:01 → B wins
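
A minimal Last Write Wins merge, using the 10:00 / 10:01 example above. The record shape (`value`, `timestamp`, `nodeId`) is an assumption for illustration; the node-ID tie-break is one common way to make every replica pick the same winner when timestamps are equal.

```javascript
// Last Write Wins: keep the version with the latest timestamp.
// Ties are broken by node ID so every replica picks the same winner.
function lwwMerge(a, b) {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp ? a : b;
  return a.nodeId > b.nodeId ? a : b;
}

const fromA = { value: 'alice@old.example', timestamp: Date.parse('2024-01-01T10:00:00Z'), nodeId: 'node-a' };
const fromB = { value: 'alice@new.example', timestamp: Date.parse('2024-01-01T10:01:00Z'), nodeId: 'node-b' };

console.log(lwwMerge(fromA, fromB).value); // alice@new.example (User A's update is discarded)
```

Because each timestamp comes from the writing node's own clock, clock skew between servers can make an earlier write "win".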

Version Vectors:

Track an update counter per node
Detect concurrent updates (neither version descends from the other)
Concurrent versions must then be resolved by the application or the user
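
A sketch of version-vector comparison, assuming a vector is a plain object mapping node ID to that node's update count. `compareVersions` and `mergeAndIncrement` are illustrative names, not a library API.

```javascript
// compareVersions() reports whether one version descends from the other
// or whether they are concurrent (a conflict).
function compareVersions(v1, v2) {
  const nodes = new Set([...Object.keys(v1), ...Object.keys(v2)]);
  let v1Ahead = false;
  let v2Ahead = false;
  for (const node of nodes) {
    const c1 = v1[node] ?? 0;
    const c2 = v2[node] ?? 0;
    if (c1 > c2) v1Ahead = true;
    if (c2 > c1) v2Ahead = true;
  }
  if (v1Ahead && v2Ahead) return 'concurrent'; // conflict: needs resolution
  if (v1Ahead) return 'after';                 // v1 supersedes v2
  if (v2Ahead) return 'before';                // v2 supersedes v1
  return 'equal';
}

// After resolving a conflict, the merged version takes the per-node maximum
// and the resolving node increments its own entry.
function mergeAndIncrement(v1, v2, nodeId) {
  const merged = { ...v1 };
  for (const [node, count] of Object.entries(v2)) {
    merged[node] = Math.max(merged[node] ?? 0, count);
  }
  merged[nodeId] = (merged[nodeId] ?? 0) + 1;
  return merged;
}

const base = { A: 1 };
const onA = { A: 2 };       // node A updated the base
const onB = { A: 1, B: 1 }; // node B updated the same base concurrently

console.log(compareVersions(onA, base));        // after
console.log(compareVersions(onA, onB));         // concurrent
console.log(mergeAndIncrement(onA, onB, 'A'));  // { A: 3, B: 1 }
```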

Application-Level Resolution:

Application decides how to merge
Custom logic per use case
Example: Shopping cart merges items
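
For the shopping-cart example, one simple application-level policy is to take the union of items from both conflicting versions and keep the larger quantity for items present in both. This is a hypothetical policy, not the only reasonable one; carts are assumed to be plain objects mapping SKU to quantity.

```javascript
// Hypothetical cart merge: union of items, larger quantity wins on overlap.
function mergeCarts(cartA, cartB) {
  const merged = { ...cartA };
  for (const [sku, qty] of Object.entries(cartB)) {
    merged[sku] = Math.max(merged[sku] ?? 0, qty);
  }
  return merged;
}

const onServer1 = { 'book-123': 1, 'pen-9': 2 }; // pen added via server 1
const onServer2 = { 'book-123': 2, 'mug-7': 1 }; // second book and a mug added via server 2
console.log(mergeCarts(onServer1, onServer2)); // { 'book-123': 2, 'pen-9': 2, 'mug-7': 1 }
```

A known trade-off of union merges: an item removed on one server but still present on the other reappears after the merge.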

Operational Transformation:

Transform concurrent operations so they can be applied in different orders and still converge
Used in collaborative editing
Complex to implement
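
The core idea of operational transformation shows up in its simplest case: two concurrent inserts into the same string. Each site applies its own insert first, then the other site's insert transformed against its own, and both end with the same text. This is a toy sketch (inserts only, ties broken by a hypothetical `siteId`); real OT systems also transform deletes and longer operation histories.

```javascript
// An insert operation puts `text` at index `pos`; siteId breaks ties.
function applyInsert(doc, op) {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Transform `op` so it can be applied after `other` has already been applied.
function transformInsert(op, other) {
  if (other.pos < op.pos || (other.pos === op.pos && other.siteId < op.siteId)) {
    return { ...op, pos: op.pos + other.text.length }; // shift right past the other insert
  }
  return op; // the other insert is to the right: position unchanged
}

const doc = 'cat';
const opA = { pos: 0, text: 'a ', siteId: 1 }; // site 1 prepends "a "
const opB = { pos: 3, text: 's', siteId: 2 };  // site 2 appends "s"

// Each site applies its local op, then the transformed remote op
const site1 = applyInsert(applyInsert(doc, opA), transformInsert(opB, opA));
const site2 = applyInsert(applyInsert(doc, opB), transformInsert(opA, opB));
console.log(site1, site2); // a cats a cats
```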

Read Preferences

Primary: Always read from the primary (strong consistency)

Primary Preferred: Read from the primary; fall back to a secondary if the primary is unavailable

Secondary: Only read from secondaries (eventual consistency, best for read scaling)

Secondary Preferred: Read from a secondary; fall back to the primary if no secondary is available

Nearest: Read from a low-latency member, whether primary or secondary

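// MongoDB read preferences (Node.js driver)
const { ReadPreference } = require('mongodb');

// maxStalenessSeconds belongs to the read preference, not to the find() options,
// and must be at least 90 seconds
const readPreference = new ReadPreference('secondaryPreferred', undefined, {
  maxStalenessSeconds: 120 // Skip secondaries estimated to be more than 2 min behind
});

// Inside an async function; `client` is the MongoClient from the replica set example
const users = await client.db('mydb').collection('users').find({}, { readPreference }).toArray();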

.NET Replication Example

using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;
using MongoDB.Driver;

public class User
{
    [BsonId]
    [BsonRepresentation(BsonType.ObjectId)] // store as ObjectId, expose as string
    public string Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

public class ReplicationService
{
    private readonly IMongoCollection<User> _users;

    public ReplicationService()
    {
        // Connect to the replica set; the driver discovers the current primary
        var settings = MongoClientSettings.FromConnectionString(
            "mongodb://mongo1:27017,mongo2:27017,mongo3:27017/?replicaSet=myReplicaSet"
        );

        // MongoClient is thread-safe and meant to be reused (e.g., as a singleton)
        var client = new MongoClient(settings);
        _users = client.GetDatabase("mydb").GetCollection<User>("users");
    }

    // Writes always go to the primary
    public async Task CreateUser(User user)
    {
        await _users.InsertOneAsync(user);
    }

    // Read from a secondary when available, otherwise the primary (may return stale data)
    public async Task<User> GetUser(string id)
    {
        return await _users.WithReadPreference(ReadPreference.SecondaryPreferred)
            .Find(u => u.Id == id)
            .FirstOrDefaultAsync();
    }

    // Read from the primary (strong consistency)
    public async Task<User> GetUserConsistent(string id)
    {
        return await _users.WithReadPreference(ReadPreference.Primary)
            .Find(u => u.Id == id)
            .FirstOrDefaultAsync();
    }
}

Monitoring Replication

Key Metrics:

Replication lag (time behind primary)
Replica status (healthy, down, recovering)
Replication throughput
Network bandwidth usage
Disk space on replicas

Alerts:

Lag exceeds threshold (e.g., > 60 seconds)
Replica goes down
Replication stops
Disk space low
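
In a MongoDB replica set, each entry in `rs.status().members` includes `name`, `health`, `stateStr`, and `optimeDate`, which is enough to compute lag and apply the first two alert rules above. The function below is a sketch that works on an object of that shape; the 60-second default mirrors the example threshold, and the sample data is made up.

```javascript
// Compute per-secondary lag from an rs.status()-shaped object and collect alerts.
function checkReplication(status, maxLagSeconds = 60) {
  const primary = status.members.find((m) => m.stateStr === 'PRIMARY');
  if (!primary) return [{ alert: 'no primary' }];

  const alerts = [];
  for (const m of status.members) {
    if (m === primary) continue;
    if (m.health !== 1) {
      alerts.push({ member: m.name, alert: 'replica down' });
      continue;
    }
    // Date - Date yields milliseconds
    const lagSeconds = (primary.optimeDate - m.optimeDate) / 1000;
    if (lagSeconds > maxLagSeconds) {
      alerts.push({ member: m.name, alert: `lag ${lagSeconds}s exceeds ${maxLagSeconds}s` });
    }
  }
  return alerts;
}

const sample = {
  members: [
    { name: 'mongo1:27017', stateStr: 'PRIMARY', health: 1, optimeDate: new Date('2024-01-01T10:05:00Z') },
    { name: 'mongo2:27017', stateStr: 'SECONDARY', health: 1, optimeDate: new Date('2024-01-01T10:04:58Z') },
    { name: 'mongo3:27017', stateStr: 'SECONDARY', health: 1, optimeDate: new Date('2024-01-01T10:03:00Z') },
  ],
};
console.log(checkReplication(sample)); // one alert: mongo3:27017 is 120 seconds behind
```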

Failover

Automatic Failover: The system detects a primary failure and promotes a replica automatically

Manual Failover: A DBA manually promotes a replica to primary

Failover Process:

Detect primary failure
Elect new primary (usually most up-to-date replica)
Promote replica to primary
Redirect writes to new primary
Reconfigure other replicas to follow new primary

Considerations:

Data loss risk (with asynchronous replication, writes not yet replicated are lost)
Downtime during failover
Split brain (two nodes both act as primary and accept conflicting writes)
Client reconnection
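
The election step and the split-brain concern are linked: a replica should only be promoted if the nodes that can still see each other form a strict majority of the set. The function below sketches that rule under simplifying assumptions (a hypothetical `lastAppliedPosition` per node and a known reachability flag); it is not the election protocol of any particular database.

```javascript
// Decide which reachable replica to promote, if any.
// members: every node in the set, [{ name, reachable, lastAppliedPosition }]
function electNewPrimary(members) {
  const reachable = members.filter((m) => m.reachable);
  const majority = Math.floor(members.length / 2) + 1;

  // Without a strict majority this side of a partition must not elect a primary;
  // otherwise both sides of a network split could promote one (split brain).
  if (reachable.length < majority) return null;

  // Promote the most up-to-date reachable replica to minimize data loss
  return reachable.reduce((best, m) =>
    m.lastAppliedPosition > best.lastAppliedPosition ? m : best
  );
}

const members = [
  { name: 'db1', reachable: false, lastAppliedPosition: 120 }, // failed primary
  { name: 'db2', reachable: true, lastAppliedPosition: 118 },
  { name: 'db3', reachable: true, lastAppliedPosition: 115 },
];
console.log(electNewPrimary(members)?.name); // db2

// Same 3-node set, but only db3 is visible: no promotion
console.log(electNewPrimary(members.map((m) => ({ ...m, reachable: m.name === 'db3' })))); // null
```

In the first case, anything db1 applied after position 118 and had not yet replicated is lost, which is the asynchronous data-loss risk listed above.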

Best Practices

Use at least 3 nodes - A majority (2 of 3) can still elect a primary if one node fails
Monitor replication lag - Alert on excessive lag
Test failover regularly - Ensure process works
Use semi-synchronous replication - Balance consistency and performance
Geographic distribution - Replicas in different regions
Separate read and write connections - Route appropriately
Set read preferences carefully - Balance consistency needs and performance
Plan for split-brain - Use quorum-based systems
Monitor replica health - Automated health checks
Document failover procedures - Clear runbooks

Interview Tips

Explain replication purpose: Availability, scalability, disaster recovery
Show strategies: Master-slave vs master-master
Demonstrate methods: Synchronous vs asynchronous
Discuss replication lag: Causes and solutions
Mention conflict resolution: Multi-master challenges
Show failover: Automatic promotion of replicas

Summary

Data replication copies data across multiple servers for high availability and read scalability. Master-slave has one primary for writes and multiple replicas for reads. Master-master allows writes on multiple servers but requires conflict resolution. Synchronous replication ensures consistency at the cost of higher write latency; asynchronous replication is faster but only eventually consistent, and can lose recent writes on failover. Monitor replication lag and set appropriate read preferences. Implement automatic failover for high availability, and use at least 3 nodes so a majority (quorum) can elect a new primary. Geographic distribution protects against regional failures. Together, these techniques are essential for building highly available, scalable systems.
