Distributed Caching

What is Distributed Caching?

Distributed caching spreads cached data across multiple servers, enabling horizontal scaling of cache capacity and providing high availability. Unlike single-server caching, distributed caches can handle massive datasets and high request volumes.

Why Distributed Caching?

Scalability: Single cache server has memory limits

High Availability: No single point of failure

Performance: Serve requests from multiple locations

Capacity: Combine memory of multiple servers

Geographic Distribution: Cache closer to users

Redis Cluster

Architecture: Automatic sharding across nodes with master-replica replication

Key Features:

  • 16,384 hash slots distributed across nodes
  • Automatic failover
  • Client-side routing
  • No proxy needed

Use Cases: Session storage, real-time analytics, leaderboards
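The key-to-slot mapping above can be computed client-side: Redis Cluster assigns each key to `CRC16(key) mod 16384`, hashing only the part inside `{...}` when a hash tag is present. A self-contained sketch (assumes single-byte ASCII keys):

```javascript
// CRC16-XMODEM, the checksum Redis Cluster uses for key-to-slot mapping
function crc16(str) {
  let crc = 0;
  for (let i = 0; i < str.length; i++) {
    crc ^= str.charCodeAt(i) << 8; // assumes single-byte (ASCII) keys
    for (let b = 0; b < 8; b++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

function hashSlot(key) {
  // Honor hash tags: only the part inside {...} is hashed, so related
  // keys like {user:123}:profile and {user:123}:settings share a slot
  const m = key.match(/\{(.+?)\}/);
  return crc16(m ? m[1] : key) % 16384;
}

console.log(hashSlot('123456789')); // 12739 (0x31C3), the Redis spec's test vector
```

Keys that must be used together in multi-key operations can be forced onto the same slot with hash tags.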

Memcached

Architecture: Simple key-value store, client handles sharding

Key Features:

  • Very fast, simple protocol
  • Multi-threaded
  • LRU eviction
  • No persistence

Use Cases: Database query caching, page caching, API response caching

Hazelcast

Architecture: In-memory data grid with distributed data structures

Key Features:

  • Embedded or client-server mode
  • Distributed maps, queues, locks
  • Near cache for frequently accessed data
  • WAN replication

Use Cases: Java applications, distributed computing, session clustering

Cache Sharding Strategies

Consistent Hashing

Purpose: Minimize data movement when adding/removing nodes

How it works:

  • Hash both keys and nodes onto ring
  • Key stored on first node clockwise
  • Adding node only affects adjacent keys

Benefits:

  • Only about 1/N of the keys are remapped when a node joins or leaves
  • Balanced distribution with virtual nodes

Modulo Hashing

Simple approach: server = hash(key) % server_count

Problem: Adding server remaps most keys

When to use: Fixed number of servers
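A quick illustration of the remapping problem, using a djb2-style string hash (an arbitrary choice for the sketch): going from three servers to four changes the assignment of roughly three quarters of all keys.

```javascript
// djb2-style string hash
function hashKey(key) {
  let h = 5381;
  for (let i = 0; i < key.length; i++) {
    h = ((h * 33) ^ key.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Count keys whose server assignment changes when scaling 3 -> 4 servers
const total = 1000;
let remapped = 0;
for (let i = 0; i < total; i++) {
  const h = hashKey(`key${i}`);
  if (h % 3 !== h % 4) remapped++;
}
console.log(`${remapped}/${total} keys remapped`); // roughly 3/4 of all keys
```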

Range-Based Sharding

Approach: Divide key space into ranges

Example:

  • Server 1: keys A-H
  • Server 2: keys I-P
  • Server 3: keys Q-Z

Challenge: Uneven distribution if keys not uniform
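A minimal router over the example ranges above (server names and the first-letter rule are taken from the example; real range sharding usually keys on the full sort key, not just one character):

```javascript
// First-letter range table from the example above
const ranges = [
  ['A', 'H', 'server1'],
  ['I', 'P', 'server2'],
  ['Q', 'Z', 'server3'],
];

function routeByRange(key) {
  const first = key[0].toUpperCase();
  for (const [lo, hi, server] of ranges) {
    if (first >= lo && first <= hi) return server;
  }
  // Keys outside A-Z (digits, punctuation) need their own range
  throw new Error(`no shard covers key ${key}`);
}

console.log(routeByRange('alice')); // server1
console.log(routeByRange('zebra')); // server3
```

The uneven-distribution challenge is visible here: if most keys start with the same few letters, one server absorbs most of the traffic.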

Cache Replication

Master-Replica

Setup: Each shard has master and replicas

Reads: Can read from replicas (eventual consistency)

Writes: Go to master, replicated to replicas

Failover: Replica promoted to master if master fails

// Redis Cluster with replicas
const Redis = require('ioredis');

const cluster = new Redis.Cluster([
  { host: 'node1', port: 6379 },
  { host: 'node2', port: 6379 },
  { host: 'node3', port: 6379 }
], {
  redisOptions: {
    password: 'password'
  },
  // Read from replicas
  scaleReads: 'slave'
});

// Write (goes to master)
await cluster.set('user:123', JSON.stringify(userData));

// Read (can come from replica)
const data = await cluster.get('user:123');

Multi-Master Replication

Setup: Multiple nodes accept writes

Challenge: Conflict resolution needed

Use case: Geographic distribution
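One common (if lossy) resolution strategy is last-write-wins, where each write carries a timestamp and the newest one survives. A sketch, with illustrative field names:

```javascript
// Last-write-wins: keep the value with the newest timestamp.
// Real systems often use vector clocks or CRDTs instead, since
// wall-clock LWW can silently drop concurrent writes.
function resolveLww(a, b) {
  return a.ts >= b.ts ? a : b;
}

// Two data centers wrote the same key concurrently
const us = { value: 'alice@new.example', ts: 1700000005000 };
const eu = { value: 'alice@old.example', ts: 1700000001000 };

console.log(resolveLww(us, eu).value); // alice@new.example
```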

Cache Invalidation Strategies

Time-Based (TTL)

Approach: Set expiration time on cache entries

Pros: Simple, automatic cleanup

Cons: May serve stale data until expiry

// Set with TTL (node-redis v4 spells this setEx; ioredis uses setex)
await cache.setEx('user:123', 300, JSON.stringify(userData)); // 5 minutes

// Set with absolute expiration (EXPIREAT takes a Unix timestamp in seconds)
await cache.expireAt('session:abc', Math.floor(Date.now() / 1000) + 3600);

Event-Based

Approach: Invalidate when data changes

Pros: Always fresh data

Cons: Requires event system

// When user updates profile
async function updateUserProfile(userId, updates) {
  // Update database
  await db.users.update(userId, updates);
  
  // Invalidate cache
  await cache.del(`user:${userId}`);
  
  // Publish event
  await eventBus.publish('user.updated', { userId });
}

// Other services can also invalidate
eventBus.subscribe('user.updated', async (event) => {
  await cache.del(`user:${event.userId}`);
});

Write-Through

Approach: Update cache when writing to database

Pros: Cache always current

Cons: Write latency increased

async function updateUser(userId, data) {
  // Write to database
  await db.users.update(userId, data);
  
  // Update cache
  await cache.set(`user:${userId}`, JSON.stringify(data));
}

Cache-Aside with Versioning

Approach: Include version in cache key

Pros: No stale data issues

Cons: More cache misses

const CACHE_VERSION = 'v2';

async function getUser(userId) {
  const key = `user:${userId}:${CACHE_VERSION}`;
  
  // Cache hit: stored value is a JSON string
  const cached = await cache.get(key);
  if (cached) return JSON.parse(cached);
  
  // Cache miss: load from database, cache the serialized form
  const user = await db.users.findById(userId);
  await cache.setEx(key, 300, JSON.stringify(user));
  
  return user;
}

// When schema changes, just increment version
// Old cache entries automatically become stale

Cache Stampede Prevention

Problem: Many requests for expired key hit database simultaneously

Solution: Lock-based approach

class CacheStampedePrevention {
  constructor(cache, db) {
    this.cache = cache;
    this.db = db;
    this.locks = new Map();
  }
  
  async get(key, loader) {
    // Try cache
    let value = await this.cache.get(key);
    if (value) return JSON.parse(value);
    
    // Check if another request is loading
    if (this.locks.has(key)) {
      return await this.locks.get(key);
    }
    
    // Create load promise
    const loadPromise = this.load(key, loader);
    this.locks.set(key, loadPromise);
    
    try {
      value = await loadPromise;
      return value;
    } finally {
      this.locks.delete(key);
    }
  }
  
  async load(key, loader) {
    const value = await loader();
    await this.cache.setEx(key, 300, JSON.stringify(value));
    return value;
  }
}

// Usage
const cache = new CacheStampedePrevention(redis, db);

// Multiple concurrent requests
const results = await Promise.all([
  cache.get('user:123', () => db.users.findById('123')),
  cache.get('user:123', () => db.users.findById('123')),
  cache.get('user:123', () => db.users.findById('123'))
]);
// Only one database query executed

Near Cache Pattern

Concept: Local cache in application server + distributed cache

Benefits:

  • Fastest possible reads (in-process)
  • Reduced network calls
  • Lower distributed cache load

Implementation:

class NearCache {
  constructor(distributedCache, maxSize = 1000) {
    this.distributedCache = distributedCache;
    this.localCache = new Map();
    this.maxSize = maxSize;
  }
  
  async get(key) {
    // Try local cache first
    if (this.localCache.has(key)) {
      return this.localCache.get(key);
    }
    
    // Try distributed cache
    const value = await this.distributedCache.get(key);
    
    if (value) {
      // Store in local cache
      this.setLocal(key, value);
    }
    
    return value;
  }
  
  async set(key, value, ttl) {
    // Update distributed cache
    await this.distributedCache.setEx(key, ttl, value);
    
    // Update local cache
    this.setLocal(key, value);
  }
  
  setLocal(key, value) {
    // Evict oldest insertion if at capacity (simple FIFO; production
    // near caches typically use LRU plus a short TTL so local entries
    // can't stay stale indefinitely)
    if (this.localCache.size >= this.maxSize) {
      const firstKey = this.localCache.keys().next().value;
      this.localCache.delete(firstKey);
    }
    
    this.localCache.set(key, value);
  }
  
  async invalidate(key) {
    // Remove from both caches
    this.localCache.delete(key);
    await this.distributedCache.del(key);
  }
}

Cache Warming

Purpose: Pre-populate cache with frequently accessed data

Strategies:

class CacheWarmer {
  constructor(cache, db) {
    this.cache = cache;
    this.db = db;
  }
  
  // Warm on startup
  async warmOnStartup() {
    console.log('Warming cache...');
    
    // Load popular items
    const popular = await this.db.query(`
      SELECT id, data FROM items 
      ORDER BY access_count DESC 
      LIMIT 1000
    `);
    
    for (const item of popular) {
      await this.cache.setEx(
        `item:${item.id}`,
        3600,
        JSON.stringify(item.data)
      );
    }
    
    console.log(`Warmed ${popular.length} items`);
  }
  
  // Scheduled refresh
  startScheduledWarming() {
    setInterval(async () => {
      await this.warmOnStartup();
    }, 3600000); // Every hour
  }
  
  // Predictive warming
  async warmRelated(itemId) {
    // When user views item, warm related items
    const related = await this.db.query(`
      SELECT id, data FROM items 
      WHERE category = (SELECT category FROM items WHERE id = ?)
      LIMIT 10
    `, [itemId]);
    
    for (const item of related) {
      await this.cache.setEx(
        `item:${item.id}`,
        1800,
        JSON.stringify(item.data)
      );
    }
  }
}

Monitoring Distributed Cache

Key Metrics:

Hit Rate: hits / (hits + misses)

  • Target: > 80%
  • Low hit rate indicates poor caching strategy

Memory Usage: Current memory / max memory

  • Alert when > 80%
  • Plan for scaling

Evictions: Number of keys evicted

  • High evictions indicate insufficient memory

Latency: P50, P95, P99 response times

  • Should be < 1ms for local network

Network Bandwidth: Bytes in/out

  • Monitor for bottlenecks

class CacheMonitor {
  // Note: most Redis clients return INFO as a raw string; this assumes
  // it has already been parsed into an object of numeric fields
  async getMetrics(cache) {
    const info = await cache.info();
    
    return {
      hitRate: info.keyspace_hits / (info.keyspace_hits + info.keyspace_misses),
      memoryUsage: info.used_memory / info.maxmemory,
      evictions: info.evicted_keys,
      connections: info.connected_clients,
      opsPerSecond: info.instantaneous_ops_per_sec,
      
      alerts: this.checkAlerts(info)
    };
  }
  
  checkAlerts(info) {
    const alerts = [];
    
    const hitRate = info.keyspace_hits / (info.keyspace_hits + info.keyspace_misses);
    if (hitRate < 0.8) {
      alerts.push({ level: 'warning', message: `Low hit rate: ${hitRate.toFixed(2)}` });
    }
    
    const memoryUsage = info.used_memory / info.maxmemory;
    if (memoryUsage > 0.9) {
      alerts.push({ level: 'critical', message: `High memory usage: ${(memoryUsage * 100).toFixed(1)}%` });
    }
    
    return alerts;
  }
}

.NET Distributed Cache

using Microsoft.Extensions.Caching.Distributed;
using StackExchange.Redis;

public class DistributedCacheService
{
    private readonly IDistributedCache _cache;
    private readonly IConnectionMultiplexer _redis;
    
    public DistributedCacheService(
        IDistributedCache cache,
        IConnectionMultiplexer redis)
    {
        _cache = cache;
        _redis = redis;
    }
    
    // Basic caching
    public async Task<T> GetOrCreateAsync<T>(
        string key,
        Func<Task<T>> factory,
        TimeSpan expiration)
    {
        var cached = await _cache.GetStringAsync(key);
        
        if (!string.IsNullOrEmpty(cached))
        {
            return JsonSerializer.Deserialize<T>(cached);
        }
        
        var value = await factory();
        
        await _cache.SetStringAsync(
            key,
            JsonSerializer.Serialize(value),
            new DistributedCacheEntryOptions
            {
                AbsoluteExpirationRelativeToNow = expiration
            }
        );
        
        return value;
    }
    
    // Batch operations
    public async Task<Dictionary<string, T>> GetManyAsync<T>(List<string> keys)
    {
        var db = _redis.GetDatabase();
        var tasks = keys.Select(key => db.StringGetAsync(key));
        var results = await Task.WhenAll(tasks);
        
        var dict = new Dictionary<string, T>();
        for (int i = 0; i < keys.Count; i++)
        {
            if (results[i].HasValue)
            {
                dict[keys[i]] = JsonSerializer.Deserialize<T>(results[i]);
            }
        }
        
        return dict;
    }
    
    // Invalidation with pattern (server.Keys uses SCAN when the server
    // supports it, but pattern deletes can still be slow on large keyspaces)
    public async Task InvalidatePatternAsync(string pattern)
    {
        var server = _redis.GetServer(_redis.GetEndPoints().First());
        var keys = server.Keys(pattern: pattern);
        
        var db = _redis.GetDatabase();
        foreach (var key in keys)
        {
            await db.KeyDeleteAsync(key);
        }
    }
}

Best Practices

  • Use appropriate TTL - Balance freshness and cache hits
  • Monitor hit rates - Optimize caching strategy
  • Implement cache warming - Pre-populate frequently accessed data
  • Handle cache failures gracefully - Fallback to database
  • Use compression for large values - Reduce memory usage
  • Implement proper serialization - Consistent data format
  • Set memory limits - Prevent out-of-memory
  • Use connection pooling - Efficient resource usage
  • Implement retry logic - Handle transient failures
  • Tag cache entries - Group-based invalidation
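The last bullet can be sketched with Redis sets (sadd, smembers, and del are standard Redis commands; the client shape and `tag:` key layout here are illustrative assumptions):

```javascript
// Track which keys carry each tag in a Redis set, so a whole
// group of entries can be invalidated together
async function setWithTags(cache, key, value, ttlSeconds, tags) {
  await cache.setex(key, ttlSeconds, value);
  for (const tag of tags) {
    await cache.sadd(`tag:${tag}`, key);
  }
}

async function invalidateTag(cache, tag) {
  const keys = await cache.smembers(`tag:${tag}`);
  if (keys.length > 0) {
    await cache.del(...keys);
  }
  await cache.del(`tag:${tag}`);
}
```

For example, tagging every cached page fragment for a user with `user:123` lets one `invalidateTag(cache, 'user:123')` call drop them all when the profile changes.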

Interview Tips

  • Explain distributed caching: Scale beyond single server
  • Show sharding strategies: Consistent hashing, modulo
  • Demonstrate invalidation: TTL, event-based, write-through
  • Discuss cache stampede: Prevention with locking
  • Mention near cache: Local + distributed caching
  • Show monitoring: Hit rate, memory, latency

Summary

Distributed caching spreads data across multiple servers for scalability and availability. Redis Cluster provides automatic sharding and failover. Memcached offers simple, fast caching. Use consistent hashing to minimize data movement when scaling. Implement proper invalidation strategies: TTL for simplicity, event-based for freshness. Prevent cache stampede with locking. Use near cache for fastest reads. Monitor hit rates (target > 80%), memory usage, and latency. Implement cache warming for frequently accessed data. Handle failures gracefully with fallback to database. Essential for building high-performance, scalable systems.
