
Caching: Strategy, Redis Internals & Distributed Patterns

Caching is the most tested topic in HLD interviews. Master cache write strategies, Redis data structures and their time complexities, distributed cache topology, the cache invalidation problem, and how to design multi-level caching architectures for real systems.


Why Caching Appears in Every HLD Interview

Every large-scale system design question has caching as a critical component. The numbers tell the story:

  • Database: handles 1,000–5,000 QPS per node
  • Redis: handles 100,000+ QPS from a single node

Caching is a 10–100× read throughput multiplier at 1–5% of the cost of more database hardware.

Interviewers don't test whether you'll add a cache — they test that you understand:

  1. Which write strategy is safe for your consistency requirement
  2. Which eviction policy fits your access pattern
  3. How to handle cache failures, stampedes, and invalidation
  4. How to size and architect a distributed cache

Cache Write Strategies — When to Use Each

01

Cache-Aside (Lazy Loading) — Default Choice

Read path: check cache → on miss, read DB → write result to cache → return. Write path: write to DB only → optionally invalidate cache key.

  • ✓ Only caches what's actually requested (no wasted space); tolerates cache crashes
  • ✗ Cache miss penalty = 3 network hops; risk of stale data between DB write and invalidation

Use for: read-heavy workloads (>80% reads) where milliseconds of staleness is acceptable. Most common pattern.
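
A minimal cache-aside sketch with redis-py. The user:{id} key format, the 300-second TTL, and the FAKE_DB stand-in for the real database layer are illustrative assumptions, not part of any specific system.

import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
CACHE_TTL = 300  # seconds; tune per workload

# Stand-in for the real database layer (illustrative only)
FAKE_DB = {42: {"id": 42, "name": "Alice", "plan": "pro"}}

def get_user(user_id: int) -> dict | None:
    """Read path: check cache, fall back to the DB on a miss, then populate."""
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)                   # cache hit
    user = FAKE_DB.get(user_id)                     # cache miss: read the DB
    if user is not None:
        r.set(key, json.dumps(user), ex=CACHE_TTL)  # populate for the next reader
    return user

def update_user(user_id: int, fields: dict) -> None:
    """Write path: write the DB, then invalidate the cached copy."""
    FAKE_DB.setdefault(user_id, {}).update(fields)
    r.delete(f"user:{user_id}")                     # next read repopulates the cache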

02

Write-Through — Consistency First

Every write goes to both cache and DB synchronously before returning success to the client.

  • ✓ Reads after writes always see current data (strong consistency)
  • ✗ Write latency doubles; cache fills with data that may never be read

Use for: payment systems, inventory counts, any data where reads must reflect the latest write immediately.

03

Write-Behind (Write-Back) — Lowest Latency

Write to cache only, return immediately. An async process flushes dirty keys to DB in the background.

  • ✓ Lowest write latency — no DB in the hot path
  • ✗ Risk of data loss if cache node crashes before flush; complex recovery

Use for: counters (likes, view counts), leaderboards, shopping carts where eventual persistence is fine. Never for financial transactions.
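
A minimal write-behind sketch for a like counter, assuming redis-py: increments touch only Redis, a dirty set remembers which counters changed, and a background loop flushes them. The key names and flush interval are assumptions; anything not yet flushed is lost if the Redis node crashes.

import time
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def like_post(post_id: int) -> int:
    """Hot path: atomic increment in Redis only; no DB call."""
    count = r.incr(f"likes:{post_id}")
    r.sadd("likes:dirty", post_id)        # remember which counters need flushing
    return count

def flush_dirty_counters(flush_interval: float = 5.0) -> None:
    """Background worker: periodically persist dirty counters to the DB."""
    while True:
        dirty = r.spop("likes:dirty", 1000) or []
        for post_id in dirty:
            count = int(r.get(f"likes:{post_id}") or 0)
            # persist_to_db(post_id, count)  # illustrative: UPDATE posts SET likes = ...
            print(f"flush post {post_id} -> {count}")
        time.sleep(flush_interval)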

04

Read-Through — Simplified Application Code

The cache sits in front of the DB. On a miss, the cache itself loads from DB — the application never talks to the DB directly.

  • ✓ Simpler application code; cache is always pre-populated after first access
  • ✗ Cold-start: first request for any new key always misses; less control over what gets cached

Use for: content delivery using a managed cache tier (e.g. ElastiCache with read-through).

05

Refresh-Ahead (Speculative Prefetch)

When a cached item approaches its TTL expiry, proactively refresh it before it expires.

  • ✓ Eliminates most cache misses — the item is refreshed before anyone notices it's stale
  • ✗ May refresh items that won't be requested again (wasted compute)

Use for: session data, user preferences, content with predictable expiry and a high hit rate.

[Diagram: Cache-Aside vs Write-Through request flow]

Eviction Policies — Mechanism + Use Case

Policy | Mechanism | Redis Config | Best For | Avoid When
LRU (Least Recently Used) | Evict the item not accessed for the longest time. Redis approximates LRU by sampling keys (maxmemory-samples) rather than tracking exact access order. | maxmemory-policy allkeys-lru | General-purpose web caching, session data, social feeds | Bursty access patterns: a single large scan evicts all recently useful items
LFU (Least Frequently Used) | Evict the item with the lowest access frequency; Redis decays the counter over time. | maxmemory-policy allkeys-lfu | Media caching where popular items stay popular (skewed distributions) | Workloads with many new items: naive LFU starts them at low frequency and can evict them before they prove useful (Redis assigns new keys a small initial counter to soften this)
TTL (Time To Live) | Evict items after a fixed expiration time regardless of access pattern. | SET key value EX seconds | API responses, auth tokens, search results, rate limiting | When freshness doesn't matter: the TTL is overhead with no benefit
FIFO (First In First Out) | Evict the oldest inserted item regardless of access pattern. | Not native in Redis (emulate with sorted sets) | Simple queue-like caches, rotating logs | Most caching scenarios: it ignores access frequency entirely
Random | Evict a random item. | maxmemory-policy allkeys-random | Truly uniform access patterns (rare) | Almost all production caches: it wastes cache hits unnecessarily
noeviction | Return an error on writes when memory is full instead of evicting. | maxmemory-policy noeviction | Data that must not be silently dropped (DB as fallback), persistent Redis queues | High write-rate caches: writes start failing as soon as memory fills
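
For reference, the eviction policy can be set in redis.conf or at runtime. A quick sketch with redis-py; the 2gb cap and sample count are placeholder values:

import redis

r = redis.Redis(host="localhost", port=6379)

# Cap memory and choose an eviction policy (same effect as the redis.conf settings)
r.config_set("maxmemory", "2gb")
r.config_set("maxmemory-policy", "allkeys-lru")
# Trade approximation quality against CPU for the sampled LRU/LFU algorithms
r.config_set("maxmemory-samples", 10)

print(r.config_get("maxmemory-policy"))   # {'maxmemory-policy': 'allkeys-lru'}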

Redis Data Structures — What Senior Engineers Know

Redis is not just a key-value store. It has 8 core data structures, each with specific time complexities and use cases. Choosing the wrong structure is a common interview mistake.

Strings: GET/SET/INCR in O(1). Use for: session tokens, feature flags, simple counts, cached HTML, serialized JSON objects. INCR is atomic — safe for distributed counters without locks.

Hashes: HGET/HSET O(1), HGETALL O(n fields). Use for: user profiles (HSET user:123 name Alice age 30), session objects (avoid serializing entire JSON). Better than storing serialized JSON when you need to update individual fields frequently.

Lists: LPUSH/RPUSH/LPOP/RPOP O(1). Use for: message queues, recent activity feeds (store last N tweet IDs per user), notification queues. LRANGE for pagination: O(S + N) where S = offset. Warning: LINDEX is O(n) — avoid for mid-list access.

Sets: SADD/SREM/SISMEMBER all O(1), SMEMBERS O(n). Use for: unique visitors (cardinality), tag systems, friend/follower relationships. SINTERSTORE for set intersection (mutual friends). Note: for large cardinalities, use HyperLogLog (O(1) space, ~0.8% error) instead of Sets.

Sorted Sets (ZSETs): ZADD O(log n), ZRANK O(log n), ZRANGE O(log n + k). Use for: leaderboards (score = rating), priority queues, timeline feeds (score = timestamp), rate limiting windows. ZADD is the workhorse of real-time ranking systems.

HyperLogLog: PFADD O(1), PFCOUNT O(1). Use for: approximate unique counts with O(1) space (12KB regardless of cardinality). Standard error is about 0.81% with the default Redis configuration. Perfect for daily active users and unique page visitors at scale, where approximate counts suffice.

Streams: XADD O(1), XREAD O(n). Persistent append-only log with consumer groups. Use for: event streaming (alternative to Kafka for moderate throughput), activity feeds, audit logs.

Bitmaps: SETBIT/GETBIT O(1), BITCOUNT O(n/8). Use for: daily active user tracking (1 bit per user per day = 125MB for 1B users), feature flags per user, bloom filter-like structures.
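
A few of these structures in redis-py, as a quick sketch (key names and values are illustrative):

import redis

r = redis.Redis(decode_responses=True)

# Sorted Set leaderboard: ZADD O(log n), top-10 via ZREVRANGE
r.zadd("leaderboard:global", {"alice": 3200, "bob": 2950})
r.zincrby("leaderboard:global", 50, "bob")                 # bob wins a match
top10 = r.zrevrange("leaderboard:global", 0, 9, withscores=True)

# HyperLogLog unique visitors: ~12KB regardless of cardinality
r.pfadd("hll:visitors:2024-06-01", "user:1", "user:2", "user:1")
uniques = r.pfcount("hll:visitors:2024-06-01")             # ~2

# Bitmap daily-active tracking: 1 bit per user id
r.setbit("dau:2024-06-01", 12345, 1)                       # user 12345 was active
dau = r.bitcount("dau:2024-06-01")

print(top10, uniques, dau)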

Redis Data Structure Decision Guide

Use Case | Redis Structure | Key Pattern | Key Operation
Session data | Hash | session:{session_id} | HSET / HGET / HGETALL, plus a TTL on the key
Rate limiting | Sorted Set | ratelimit:{user_id}:{window} | ZADD with timestamp score, ZCOUNT within the window, ZREMRANGEBYSCORE to clean up
Leaderboard (Top-K) | Sorted Set | leaderboard:global | ZADD {score} {user_id}, ZREVRANGE 0 9
Recent activity feed | List | feed:{user_id} | LPUSH + LTRIM to 500, LRANGE for pagination
Unique visitor count (approx) | HyperLogLog | hll:visitors:{date} | PFADD {user_id}, PFCOUNT
Pub/Sub messaging | Pub/Sub or Streams | channel:{topic} | PUBLISH / SUBSCRIBE or XADD / XREAD
Distributed lock | String + NX | lock:{resource} | SET lock:{resource} {uuid} NX EX 30 (atomic set-if-absent with TTL)
Geospatial index | Geo | drivers:active | GEOADD {lng} {lat} {driver_id}, GEORADIUS {pickup_lng} {pickup_lat} 2 km
Bloom filter (membership) | Bitmap / RedisBloom | bloom:{resource} | SETBIT at each hash position; membership = all bits set

Redis Patterns — Rate Limiting & Distributed Lock

redis_patterns.py (Python)
import redis
import time
import uuid
from contextlib import contextmanager

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

# ── Sliding window rate limiter ──────────────────────────────────────────

def is_rate_limited(user_id: str, max_requests: int = 100,
                    window_seconds: int = 60) -> bool:
    """
    Sliding window rate limiter using Sorted Set.
    Score = timestamp (float). Members = unique request IDs.
    O(log n) per request.
    """
    key = f"ratelimit:{user_id}"
    now = time.time()
    window_start = now - window_seconds

    pipe = r.pipeline()
    # Remove requests outside the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count remaining requests in window
    pipe.zcard(key)
    # Add current request (timestamp as both score and member prefix)
    pipe.zadd(key, {f"{now}:{uuid.uuid4().hex[:8]}": now})
    # Set TTL to auto-cleanup
    pipe.expire(key, window_seconds + 1)
    results = pipe.execute()

    current_count = results[1]  # count BEFORE this request
    return current_count >= max_requests

# ── Distributed lock (Redlock-lite) ─────────────────────────────────────

@contextmanager
def distributed_lock(resource: str, ttl_seconds: int = 30, retry_ms: int = 100):
    """
    Acquire a distributed lock using SET NX EX.
    
    Safety: SET NX EX is atomic. Lock expires automatically if holder crashes.
    UUID as value ensures only the lock holder can release it.
    Limitation: single-node only (not true Redlock). Use Redlock for multi-node.
    """
    lock_key = f"lock:{resource}"
    lock_value = str(uuid.uuid4())
    acquired = False

    try:
        # Retry until lock acquired (with backoff in production)
        for _ in range(10):
            # SET key value NX EX seconds — atomic, only sets if key doesn't exist
            acquired = r.set(lock_key, lock_value, nx=True, ex=ttl_seconds)
            if acquired:
                break
            time.sleep(retry_ms / 1000)

        if not acquired:
            raise TimeoutError(f"Could not acquire lock for {resource}")

        yield lock_value

    finally:
        if acquired:
            # Lua script: atomic check-and-delete (prevent releasing another holder's lock)
            release_script = """
            if redis.call('get', KEYS[1]) == ARGV[1] then
                return redis.call('del', KEYS[1])
            else
                return 0
            end
            """
            r.eval(release_script, 1, lock_key, lock_value)
⚠ WARNING

Cache Invalidation — The Hardest Problem in Computer Science

Phil Karlton famously said: 'There are only two hard things in Computer Science: cache invalidation and naming things.'

  1. TTL-based expiry — set a TTL and let items expire. Simple, no coordination needed. Con: may serve stale data for up to TTL seconds. Appropriate when: data freshness lag is acceptable (social profile, product catalog).
  2. Event-driven invalidation — when DB row changes, publish an invalidation event (via Kafka, CDC, or DB triggers). Cache consumers listen and delete/update the key. Con: complex distributed coordination. Risk: if the invalidation message is lost, cache is permanently stale. Appropriate when: data must be fresh within seconds (inventory levels, pricing).
  3. Write-through invalidation — on write, update BOTH DB and cache atomically (or invalidate cache on DB write). Con: write latency doubles. Appropriate when: strong consistency is required.
  4. Versioned cache keys — instead of 'user:123', use 'user:123:v7' where v7 is the current version. On update, increment version. Old keys naturally become orphaned and expire via TTL. Con: need to store and serve the current version number (itself a cache or DB lookup). Appropriate when: cache data is expensive to compute (ML features, aggregates).

The dirty secret: there is no perfect strategy. Every approach trades latency, consistency, and complexity. In interviews, pick one and justify the trade-off given the system's consistency requirements.
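
To make strategy 4 concrete, a minimal versioned-key sketch assuming redis-py; the profile:v{n}:{id} naming, the version counter key, and the one-hour TTL are assumptions, not a standard:

import json
import redis

r = redis.Redis(decode_responses=True)

def _version(user_id: int) -> int:
    """Current cache version for this user (itself a cheap Redis lookup)."""
    return int(r.get(f"user:{user_id}:ver") or 1)

def get_profile(user_id: int) -> dict:
    key = f"profile:v{_version(user_id)}:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    profile = {"id": user_id, "bio": "..."}        # illustrative: expensive compute / DB read
    r.set(key, json.dumps(profile), ex=3600)
    return profile

def update_profile(user_id: int) -> None:
    """On write, bump the version; old keys become orphans and expire via TTL."""
    r.incr(f"user:{user_id}:ver")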

Cache Stampede (Thundering Herd) — Problem & Solutions

The problem: a hot key expires (or a cache node restarts) and thousands of concurrent requests miss at the same instant. All of them fall through to the database with the same expensive query, and the resulting load spike can overwhelm the DB at exactly the moment the cache needs repopulating.

Cache Stampede — 3 Solutions + Trade-offs

01

Mutex Lock (Distributed Lock)

On cache miss, acquire a distributed lock (Redis SETNX). Only the lock holder queries the DB and populates the cache. All other waiters retry after a short delay and get the cache hit. Pros: exactly one DB query per cache miss. Cons: adds latency for all waiters (lock contention), single point of failure if lock holder crashes (use TTL on lock). Best for: expensive DB queries, ML inference results, aggregate computations.

02

Probabilistic Early Expiration (PER)

When a cached item is close to expiry, each request independently and probabilistically refreshes it before it expires. Probability = exp(-β × time_remaining). The item is refreshed early by some requests, so by the time it formally expires, the new value is already in cache. Pros: no lock, no coordination, scales horizontally. Cons: may perform extra refreshes (wasted compute). Best for: moderate-traffic caches where eventual refresh is acceptable. Twitter uses this for timeline caches.
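
A minimal sketch of this idea in the common XFetch-style formulation (recompute early when now - delta * beta * ln(rand) reaches the stored expiry), assuming redis-py; the key layout, BETA value, and stored metadata are assumptions:

import json
import math
import random
import time
from typing import Any, Callable

import redis

r = redis.Redis(decode_responses=True)
BETA = 1.0   # >1 refreshes earlier on average, <1 later

def fetch_with_per(key: str, ttl: int, recompute: Callable[[], Any]) -> Any:
    """Most requests return the cached value; as expiry approaches,
    a request occasionally volunteers to recompute early."""
    raw = r.get(key)
    if raw is not None:
        entry = json.loads(raw)
        rnd = random.random() or 1e-9        # avoid log(0)
        refresh_early = time.time() - entry["delta"] * BETA * math.log(rnd) >= entry["expiry"]
        if not refresh_early:
            return entry["value"]

    start = time.time()
    value = recompute()                      # miss, or early volunteer
    delta = time.time() - start              # how long the recompute took
    entry = {"value": value, "delta": delta, "expiry": time.time() + ttl}
    r.set(key, json.dumps(entry), ex=ttl)
    return value

# e.g. fetch_with_per("timeline:42", ttl=60, recompute=lambda: {"tweets": [1, 2, 3]})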

03

Background Refresh

A background async job refreshes cache keys before they expire. At item insertion, also schedule a refresh at (TTL - buffer) in a queue (Redis Stream or Kafka). Pros: zero extra latency for reads, no lock needed. Cons: requires a separate refresh worker, stale data possible in the buffer window. Best for: large objects that are expensive to compute (user recommendation lists, complex aggregates).

Multi-Level Caching Architecture

Production systems use multiple cache layers, each optimized for a different trade-off:

L1 — Process-level in-memory cache (Guava Cache, Caffeine): sub-microsecond lookups, stores hundreds of hot objects per service instance. Zero network cost. Invalidation is local only (doesn't propagate to other instances). Use for: configuration values, static reference data, hot objects in a single-service context.

L2 — Distributed cache (Redis, Memcached): ~1ms network RTT, stores millions of objects across the cluster. Shared across all service instances. Consistent across instances. Use for: session data, user profiles, cached API responses, ML model outputs.

L3 — CDN edge cache (CloudFront, Cloudflare): ~5-50ms depending on PoP proximity, serves billions of objects globally. Closest to the user. Can't cache dynamic/user-specific content. Use for: static assets (CSS, JS, images), cacheable API responses (product catalogs, public content), video chunks.

L4 — Browser cache: 0ms (no network), stores hundreds of MB per user. Controlled by Cache-Control headers. Use for: static assets with a long max-age, API responses served with stale-while-revalidate.

Key design principle: each layer absorbs what the previous missed. Sizing the L2 cache correctly (targeting 95%+ hit rate) means the DB only handles 5% of read traffic — often a 20× reduction in DB load.
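
A compact sketch of the L1 + L2 read path, assuming redis-py and cachetools for the in-process TTL cache; the sizes, TTLs, and load_from_db stand-in are illustrative:

import json

import redis
from cachetools import TTLCache

r = redis.Redis(decode_responses=True)
l1 = TTLCache(maxsize=1_000, ttl=5)          # per-process, sub-microsecond lookups, short TTL

def load_from_db(key: str) -> dict:
    """Stand-in for the real database read (illustrative only)."""
    return {"key": key, "source": "db"}

def get_config(key: str) -> dict:
    # L1: in-process memory, no network hop
    if key in l1:
        return l1[key]
    # L2: shared Redis tier, ~1ms network RTT
    raw = r.get(f"config:{key}")
    if raw is not None:
        value = json.loads(raw)
        l1[key] = value                      # promote into L1 for subsequent calls
        return value
    # Final fallback: the database (L3 CDN / L4 browser sit in front of the service)
    value = load_from_db(key)
    r.set(f"config:{key}", json.dumps(value), ex=300)
    l1[key] = value
    return value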


Memcached vs Redis — When to Use Each

Aspect | Redis | Memcached | Winner
Data structures | 8 types: String, Hash, List, Set, ZSet, HyperLogLog, Stream, Bitmap | Strings only | Redis: significantly more flexible
Persistence | RDB snapshots + AOF (append-only file) | None (pure in-memory) | Redis: survives restarts, AOF for durability
Replication | Master-replica + Redis Sentinel + Redis Cluster | No native replication | Redis: HA built-in
Memory efficiency | Higher overhead per key (~64–100 bytes) | Lower overhead per key (~40 bytes) | Memcached: better for simple key-value at massive scale
Multi-threading | Single-threaded command execution (threaded I/O since 6.0) | Natively multi-threaded | Memcached: better raw throughput on multi-core
Cluster scaling | Redis Cluster (hash slots, automatic resharding) | Client-side consistent hashing | Redis Cluster: more automated
Pub/Sub | Native PUBLISH/SUBSCRIBE + Streams | None | Redis
Lua scripting | EVAL: atomic multi-command scripts | None | Redis: enables complex atomic operations

Use Redis when you need persistence, complex data structures, pub/sub, atomic operations, or Lua scripting.
Use Memcached when you need maximum throughput for the simplest key-value workloads, or it is already deployed.

Distributed Cache with Consistent Hashing

consistent_hashing_cache.py (Python)
import hashlib
import bisect

class ConsistentHashingCache:
    """
    Distributes cache keys across multiple Redis nodes using consistent hashing.
    
    Why not modulo hashing (key % N)?
    → Adding or removing one node remaps almost every key (about N/(N+1) of them) → cache stampede.
    
    Why consistent hashing?
    → Adding/removing one node remaps only K/N keys (K = keys, N = nodes).
    → With virtual nodes (vnodes), load is evenly distributed.
    """
    
    def __init__(self, nodes: list[str], vnodes: int = 150):
        """
        nodes: list of "host:port" strings
        vnodes: virtual nodes per physical node (higher = more even distribution)
        """
        self.ring: dict[int, str] = {}
        self.sorted_keys: list[int] = []
        
        for node in nodes:
            for i in range(vnodes):
                virtual_key = f"{node}:{i}"
                h = self._hash(virtual_key)
                self.ring[h] = node
                self.sorted_keys.append(h)
        
        self.sorted_keys.sort()
    
    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)
    
    def get_node(self, key: str) -> str:
        """Find the responsible node for a key — O(log N) binary search."""
        h = self._hash(key)
        idx = bisect.bisect(self.sorted_keys, h)
        if idx == len(self.sorted_keys):
            idx = 0  # wrap around the ring
        return self.ring[self.sorted_keys[idx]]
    
    def add_node(self, node: str, vnodes: int = 150):
        """Add a new node. Only ~K/N keys are remapped."""
        for i in range(vnodes):
            virtual_key = f"{node}:{i}"
            h = self._hash(virtual_key)
            self.ring[h] = node
            bisect.insort(self.sorted_keys, h)
    
    def remove_node(self, node: str, vnodes: int = 150):
        """Remove a node. Only ~K/N keys are remapped."""
        for i in range(vnodes):
            virtual_key = f"{node}:{i}"
            h = self._hash(virtual_key)
            if h in self.ring:
                del self.ring[h]
                self.sorted_keys.remove(h)
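
A quick usage sketch (the node addresses are placeholders):

cache = ConsistentHashingCache(["10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"])
print(cache.get_node("user:123"))      # e.g. "10.0.0.2:6379"
cache.add_node("10.0.0.4:6379")        # only ~1/4 of keys move to the new node
print(cache.get_node("user:123"))      # usually unchanged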
EXAMPLE

Interview Scenario: Design Caching for Twitter Timeline

Setup: 200M DAU, 700K reads/sec, each timeline is a list of 500 tweet IDs.

L1 — Redis Feed Cache: store per-user feed as a Redis List (LPUSH + LTRIM to 500 entries). 7-day TTL. Cache warming: fan-out service populates cache at tweet write time. Expected hit rate: 90%+ for active users.

L2 — Hot Tweet Cache: separate Redis Hash per tweet storing metadata (like count, retweet count, author info). Updated by counter service via HINCRBY (atomic increment). Hot tweets (viral) get dedicated Redis nodes with higher replication.

Cache miss handling: on Redis feed miss (cold user, expired TTL), Feed Service falls back to Cassandra — fetches recent tweets from followed accounts, sorts by tweet_id (time-ordered Snowflake), materializes cache. This path takes ~150ms vs ~5ms for cache hit.

Celebrity problem: celebrities (>10K followers) are NOT fan-out-on-write cached. Their tweet IDs are stored only in the hot tweet cache and fetched at read time. This prevents a celebrity tweet from causing 100M Redis writes.

Sizing: 200M active users × 500 tweet_ids × 8 bytes = 800GB per region. 3 Redis nodes × 256GB each = 768GB. Use Redis Cluster with 3 shards. Cost ~$15K/month vs $1.2B/year in Cassandra read costs.
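
A quick back-of-the-envelope check of that sizing (pure arithmetic; the node capacity is taken from the scenario above):

# Timeline cache sizing (per region)
users        = 200_000_000     # active users
ids_per_feed = 500             # tweet IDs kept per timeline
bytes_per_id = 8               # 64-bit Snowflake ID

raw_gb = users * ids_per_feed * bytes_per_id / 1e9
print(raw_gb)                  # 800.0 GB of raw IDs

node_gb, shards = 256, 3
print(node_gb * shards)        # 768 GB: just under the raw 800 GB, so real sizing
                               # also needs headroom for Redis per-key overhead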
