
Caching: Strategy, Redis Internals & Distributed Patterns

Caching is the most tested topic in HLD interviews. Master cache write strategies, Redis data structures and their time complexities, distributed cache topology, the cache invalidation problem, and how to design multi-level caching architectures for real systems.


Why Caching Appears in Every HLD Interview

Every large-scale system design question has caching as a critical component. The numbers tell the story:

  • Database: handles 1,000–5,000 QPS per node
  • Redis: handles 100,000+ QPS from a single node

Caching is a 10–100× read throughput multiplier at 1–5% of the cost of more database hardware.

Interviewers don't test whether you'll add a cache — they test that you understand:

  1. Which write strategy is safe for your consistency requirement
  2. Which eviction policy fits your access pattern
  3. How to handle cache failures, stampedes, and invalidation
  4. How to size and architect a distributed cache

Cache Write Strategies — When to Use Each

01

Cache-Aside (Lazy Loading) — Default Choice

Read path: check cache → on miss, read DB → write result to cache → return. Write path: write to DB only → optionally invalidate cache key.

  • ✓ Only caches what's actually requested (no wasted space); tolerates cache crashes
  • ✗ Cache miss penalty = 3 network hops; risk of stale data between DB write and invalidation

Use for: read-heavy workloads (>80% reads) where milliseconds of staleness is acceptable. Most common pattern.
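
A minimal cache-aside sketch with redis-py. The user:{id} key format, the 300-second TTL, and the FAKE_DB stand-in for the real database layer are illustrative assumptions, not part of any specific system.

import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
CACHE_TTL = 300  # seconds; tune per workload

# Stand-in for the real database layer (illustrative only)
FAKE_DB = {42: {"id": 42, "name": "Alice", "plan": "pro"}}

def get_user(user_id: int) -> dict | None:
    """Read path: check cache, fall back to the DB on a miss, then populate."""
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)                   # cache hit
    user = FAKE_DB.get(user_id)                     # cache miss: read the DB
    if user is not None:
        r.set(key, json.dumps(user), ex=CACHE_TTL)  # populate for the next reader
    return user

def update_user(user_id: int, fields: dict) -> None:
    """Write path: write the DB, then invalidate the cached copy."""
    FAKE_DB.setdefault(user_id, {}).update(fields)
    r.delete(f"user:{user_id}")                     # next read repopulates the cache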

02

Write-Through — Consistency First

Every write goes to both cache and DB synchronously before returning success to the client.

  • ✓ Reads after writes always see current data (strong consistency)
  • ✗ Write latency doubles; cache fills with data that may never be read

Use for: payment systems, inventory counts, any data where reads must reflect the latest write immediately.

03

Write-Behind (Write-Back) — Lowest Latency

Write to cache only, return immediately. An async process flushes dirty keys to DB in the background.

  • ✓ Lowest write latency — no DB in the hot path
  • ✗ Risk of data loss if cache node crashes before flush; complex recovery

Use for: counters (likes, view counts), leaderboards, shopping carts where eventual persistence is fine. Never for financial transactions.
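
A minimal write-behind sketch for a like counter, assuming redis-py: increments touch only Redis, a dirty set remembers which counters changed, and a background loop flushes them. The key names and flush interval are assumptions; anything not yet flushed is lost if the Redis node crashes.

import time
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def like_post(post_id: int) -> int:
    """Hot path: atomic increment in Redis only; no DB call."""
    count = r.incr(f"likes:{post_id}")
    r.sadd("likes:dirty", post_id)        # remember which counters need flushing
    return count

def flush_dirty_counters(flush_interval: float = 5.0) -> None:
    """Background worker: periodically persist dirty counters to the DB."""
    while True:
        dirty = r.spop("likes:dirty", 1000) or []
        for post_id in dirty:
            count = int(r.get(f"likes:{post_id}") or 0)
            # persist_to_db(post_id, count)  # illustrative: UPDATE posts SET likes = ...
            print(f"flush post {post_id} -> {count}")
        time.sleep(flush_interval)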

04

Read-Through — Simplified Application Code

The cache sits in front of the DB. On a miss, the cache itself loads from DB — the application never talks to the DB directly.

  • ✓ Simpler application code; cache is always pre-populated after first access
  • ✗ Cold-start: first request for any new key always misses; less control over what gets cached

Use for: content delivery using a managed cache tier (e.g. ElastiCache with read-through).

05

Refresh-Ahead (Speculative Prefetch)

When a cached item approaches its TTL expiry, proactively refresh it before it expires.

  • ✓ Eliminates most cache misses — the item is refreshed before anyone notices it's stale
  • ✗ May refresh items that won't be requested again (wasted compute)

Use for: session data, user preferences, content with predictable expiry and a high hit rate.

[Diagram: Cache-Aside vs Write-Through request flow]

Eviction Policies — Mechanism + Use Case

Policy | Mechanism | Redis Config | Best For | Avoid When
LRU (Least Recently Used) | Evict the item not accessed for the longest time. Redis approximates LRU by sampling keys (maxmemory-samples) rather than tracking exact access order. | maxmemory-policy allkeys-lru | General-purpose web caching, session data, social feeds | Bursty access patterns: a single large scan evicts all recently useful items
LFU (Least Frequently Used) | Evict the item with the lowest access frequency; Redis decays the counter over time. | maxmemory-policy allkeys-lfu | Media caching where popular items stay popular (skewed distributions) | Workloads with many new items: naive LFU starts them at low frequency and can evict them before they prove useful (Redis assigns new keys a small initial counter to soften this)
TTL (Time To Live) | Evict items after a fixed expiration time regardless of access pattern. | SET key value EX seconds | API responses, auth tokens, search results, rate limiting | When freshness doesn't matter: the TTL is overhead with no benefit
FIFO (First In First Out) | Evict the oldest inserted item regardless of access pattern. | Not native in Redis (emulate with sorted sets) | Simple queue-like caches, rotating logs | Most caching scenarios: it ignores access frequency entirely
Random | Evict a random item. | maxmemory-policy allkeys-random | Truly uniform access patterns (rare) | Almost all production caches: it wastes cache hits unnecessarily
noeviction | Return an error on writes when memory is full instead of evicting. | maxmemory-policy noeviction | Data that must not be silently dropped (DB as fallback), persistent Redis queues | High write-rate caches: writes start failing as soon as memory fills
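
For reference, the eviction policy can be set in redis.conf or at runtime. A quick sketch with redis-py; the 2gb cap and sample count are placeholder values:

import redis

r = redis.Redis(host="localhost", port=6379)

# Cap memory and choose an eviction policy (same effect as the redis.conf settings)
r.config_set("maxmemory", "2gb")
r.config_set("maxmemory-policy", "allkeys-lru")
# Trade approximation quality against CPU for the sampled LRU/LFU algorithms
r.config_set("maxmemory-samples", 10)

print(r.config_get("maxmemory-policy"))   # {'maxmemory-policy': 'allkeys-lru'}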

Redis Data Structures — What Senior Engineers Know

Redis is not just a key-value store. It has 8 core data structures, each with specific time complexities and use cases. Choosing the wrong structure is a common interview mistake.

Strings: GET/SET/INCR in O(1). Use for: session tokens, feature flags, simple counts, cached HTML, serialized JSON objects. INCR is atomic — safe for distributed counters without locks.

Hashes: HGET/HSET O(1), HGETALL O(n fields). Use for: user profiles (HSET user:123 name Alice age 30), session objects (avoid serializing entire JSON). Better than storing serialized JSON when you need to update individual fields frequently.

Lists: LPUSH/RPUSH/LPOP/RPOP O(1). Use for: message queues, recent activity feeds (store last N tweet IDs per user), notification queues. LRANGE for pagination: O(S + N) where S = offset. Warning: LINDEX is O(n) — avoid for mid-list access.

Sets: SADD/SREM/SISMEMBER all O(1), SMEMBERS O(n). Use for: unique visitors (cardinality), tag systems, friend/follower relationships. SINTERSTORE for set intersection (mutual friends). Note: for large cardinalities, use HyperLogLog (O(1) space, ~0.8% error) instead of Sets.

Sorted Sets (ZSETs): ZADD O(log n), ZRANK O(log n), ZRANGE O(log n + k). Use for: leaderboards (score = rating), priority queues, timeline feeds (score = timestamp), rate limiting windows. ZADD is the workhorse of real-time ranking systems.

HyperLogLog: PFADD O(1), PFCOUNT O(1). Use for: approximate unique counts with O(1) space (12KB regardless of cardinality). Standard error is about 0.81% with the default Redis configuration. Perfect for daily active users and unique page visitors at scale, where approximate counts suffice.

Streams: XADD O(1), XREAD O(n). Persistent append-only log with consumer groups. Use for: event streaming (alternative to Kafka for moderate throughput), activity feeds, audit logs.

Bitmaps: SETBIT/GETBIT O(1), BITCOUNT O(n/8). Use for: daily active user tracking (1 bit per user per day = 125MB for 1B users), feature flags per user, bloom filter-like structures.
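
A few of these structures in redis-py, as a quick sketch (key names and values are illustrative):

import redis

r = redis.Redis(decode_responses=True)

# Sorted Set leaderboard: ZADD O(log n), top-10 via ZREVRANGE
r.zadd("leaderboard:global", {"alice": 3200, "bob": 2950})
r.zincrby("leaderboard:global", 50, "bob")                 # bob wins a match
top10 = r.zrevrange("leaderboard:global", 0, 9, withscores=True)

# HyperLogLog unique visitors: ~12KB regardless of cardinality
r.pfadd("hll:visitors:2024-06-01", "user:1", "user:2", "user:1")
uniques = r.pfcount("hll:visitors:2024-06-01")             # ~2

# Bitmap daily-active tracking: 1 bit per user id
r.setbit("dau:2024-06-01", 12345, 1)                       # user 12345 was active
dau = r.bitcount("dau:2024-06-01")

print(top10, uniques, dau)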

Redis Data Structure Decision Guide

Use Case | Redis Structure | Key Pattern | Key Operation
Session data | Hash | session:{session_id} | HSET / HGET / HGETALL, plus a TTL on the key
Rate limiting | Sorted Set | ratelimit:{user_id}:{window} | ZADD with timestamp score, ZCOUNT within the window, ZREMRANGEBYSCORE to clean up
Leaderboard (Top-K) | Sorted Set | leaderboard:global | ZADD {score} {user_id}, ZREVRANGE 0 9
Recent activity feed | List | feed:{user_id} | LPUSH + LTRIM to 500, LRANGE for pagination
Unique visitor count (approx) | HyperLogLog | hll:visitors:{date} | PFADD {user_id}, PFCOUNT
Pub/Sub messaging | Pub/Sub or Streams | channel:{topic} | PUBLISH / SUBSCRIBE or XADD / XREAD
Distributed lock | String + NX | lock:{resource} | SET lock:{resource} {uuid} NX EX 30 (atomic set-if-absent with TTL)
Geospatial index | Geo | drivers:active | GEOADD {lng} {lat} {driver_id}, GEORADIUS {pickup_lng} {pickup_lat} 2 km
Bloom filter (membership) | Bitmap / RedisBloom | bloom:{resource} | SETBIT at each hash position; membership = all bits set

Redis Patterns — Rate Limiting & Distributed Lock

redis_patterns.py (Python)
import redis
import time
import uuid
from contextlib import contextmanager

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

# ── Sliding window rate limiter ──────────────────────────────────────────

def is_rate_limited(user_id: str, max_requests: int = 100,
                    window_seconds: int = 60) -> bool:
    """
    Sliding window rate limiter using Sorted Set.
    Score = timestamp (float). Members = unique request IDs.
    O(log n) per request.
    """
    key = f"ratelimit:{user_id}"
    now = time.time()
    window_start = now - window_seconds

    pipe = r.pipeline()
    # Remove requests outside the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count remaining requests in window
    pipe.zcard(key)
    # Add current request (timestamp as both score and member prefix)
    pipe.zadd(key, {f"{now}:{uuid.uuid4().hex[:8]}": now})
    # Set TTL to auto-cleanup
    pipe.expire(key, window_seconds + 1)
    results = pipe.execute()

    current_count = results[1]  # count BEFORE this request
    return current_count >= max_requests

# ── Distributed lock (Redlock-lite) ─────────────────────────────────────

@contextmanager
def distributed_lock(resource: str, ttl_seconds: int = 30, retry_ms: int = 100):
    """
    Acquire a distributed lock using SET NX EX.
    
    Safety: SET NX EX is atomic. Lock expires automatically if holder crashes.
    UUID as value ensures only the lock holder can release it.
    Limitation: single-node only (not true Redlock). Use Redlock for multi-node.
    """
    lock_key = f"lock:{resource}"
    lock_value = str(uuid.uuid4())
    acquired = False

    try:
        # Retry until lock acquired (with backoff in production)
        for _ in range(10):
            # SET key value NX EX seconds — atomic, only sets if key doesn't exist
            acquired = r.set(lock_key, lock_value, nx=True, ex=ttl_seconds)
            if acquired:
                break
            time.sleep(retry_ms / 1000)

        if not acquired:
            raise TimeoutError(f"Could not acquire lock for {resource}")

        yield lock_value

    finally:
        if acquired:
            # Lua script: atomic check-and-delete (prevent releasing another holder's lock)
            release_script = """
            if redis.call('get', KEYS[1]) == ARGV[1] then
                return redis.call('del', KEYS[1])
            else
                return 0
            end
            """
            r.eval(release_script, 1, lock_key, lock_value)
⚠ WARNING

Cache Invalidation — The Hardest Problem in Computer Science

Phil Karlton famously said: 'There are only two hard things in Computer Science: cache invalidation and naming things.'

  1. TTL-based expiry — set a TTL and let items expire. Simple, no coordination needed. Con: may serve stale data for up to TTL seconds. Appropriate when: data freshness lag is acceptable (social profile, product catalog).
  2. Event-driven invalidation — when DB row changes, publish an invalidation event (via Kafka, CDC, or DB triggers). Cache consumers listen and delete/update the key. Con: complex distributed coordination. Risk: if the invalidation message is lost, cache is permanently stale. Appropriate when: data must be fresh within seconds (inventory levels, pricing).
  3. Write-through invalidation — on write, update BOTH DB and cache atomically (or invalidate cache on DB write). Con: write latency doubles. Appropriate when: strong consistency is required.
  4. Versioned cache keys — instead of 'user:123', use 'user:123:v7' where v7 is the current version. On update, increment version. Old keys naturally become orphaned and expire via TTL. Con: need to store and serve the current version number (itself a cache or DB lookup). Appropriate when: cache data is expensive to compute (ML features, aggregates).

The dirty secret: there is no perfect strategy. Every approach trades latency, consistency, and complexity. In interviews, pick one and justify the trade-off given the system's consistency requirements.
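
To make strategy 4 concrete, a minimal versioned-key sketch assuming redis-py; the profile:v{n}:{id} naming, the version counter key, and the one-hour TTL are assumptions, not a standard:

import json
import redis

r = redis.Redis(decode_responses=True)

def _version(user_id: int) -> int:
    """Current cache version for this user (itself a cheap Redis lookup)."""
    return int(r.get(f"user:{user_id}:ver") or 1)

def get_profile(user_id: int) -> dict:
    key = f"profile:v{_version(user_id)}:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    profile = {"id": user_id, "bio": "..."}        # illustrative: expensive compute / DB read
    r.set(key, json.dumps(profile), ex=3600)
    return profile

def update_profile(user_id: int) -> None:
    """On write, bump the version; old keys become orphans and expire via TTL."""
    r.incr(f"user:{user_id}:ver")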

Cache Stampede (Thundering Herd) — Problem & Solutions

The problem: a hot key expires (or a cache node restarts) and thousands of concurrent requests miss at the same instant. All of them fall through to the database with the same expensive query, and the resulting load spike can overwhelm the DB at exactly the moment the cache needs repopulating.

Cache Stampede — 3 Solutions + Trade-offs

01

Mutex Lock (Distributed Lock)

On cache miss, acquire a distributed lock (Redis SETNX). Only the lock holder queries the DB and populates the cache. All other waiters retry after a short delay and get the cache hit. Pros: exactly one DB query per cache miss. Cons: adds latency for all waiters (lock contention), single point of failure if lock holder crashes (use TTL on lock). Best for: expensive DB queries, ML inference results, aggregate computations.

02

Probabilistic Early Expiration (PER)

When a cached item is close to expiry, each request independently and probabilistically refreshes it before it expires. Probability = exp(-β × time_remaining). The item is refreshed early by some requests, so by the time it formally expires, the new value is already in cache. Pros: no lock, no coordination, scales horizontally. Cons: may perform extra refreshes (wasted compute). Best for: moderate-traffic caches where eventual refresh is acceptable. Twitter uses this for timeline caches.
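
A minimal sketch of this idea in the common XFetch-style formulation (recompute early when now - delta * beta * ln(rand) reaches the stored expiry), assuming redis-py; the key layout, BETA value, and stored metadata are assumptions:

import json
import math
import random
import time
from typing import Any, Callable

import redis

r = redis.Redis(decode_responses=True)
BETA = 1.0   # >1 refreshes earlier on average, <1 later

def fetch_with_per(key: str, ttl: int, recompute: Callable[[], Any]) -> Any:
    """Most requests return the cached value; as expiry approaches,
    a request occasionally volunteers to recompute early."""
    raw = r.get(key)
    if raw is not None:
        entry = json.loads(raw)
        rnd = random.random() or 1e-9        # avoid log(0)
        refresh_early = time.time() - entry["delta"] * BETA * math.log(rnd) >= entry["expiry"]
        if not refresh_early:
            return entry["value"]

    start = time.time()
    value = recompute()                      # miss, or early volunteer
    delta = time.time() - start              # how long the recompute took
    entry = {"value": value, "delta": delta, "expiry": time.time() + ttl}
    r.set(key, json.dumps(entry), ex=ttl)
    return value

# e.g. fetch_with_per("timeline:42", ttl=60, recompute=lambda: {"tweets": [1, 2, 3]})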

03

Background Refresh

A background async job refreshes cache keys before they expire. At item insertion, also schedule a refresh at (TTL - buffer) in a queue (Redis Stream or Kafka). Pros: zero extra latency for reads, no lock needed. Cons: requires a separate refresh worker, stale data possible in the buffer window. Best for: large objects that are expensive to compute (user recommendation lists, complex aggregates).

Multi-Level Caching Architecture

Production systems use multiple cache layers, each optimized for a different trade-off:

L1 — Process-level in-memory cache (Guava Cache, Caffeine): sub-microsecond lookups, stores hundreds of hot objects per service instance. Zero network cost. Invalidation is local only (doesn't propagate to other instances). Use for: configuration values, static reference data, hot objects in a single-service context.

L2 — Distributed cache (Redis, Memcached): ~1ms network RTT, stores millions of objects across the cluster. Shared across all service instances. Consistent across instances. Use for: session data, user profiles, cached API responses, ML model outputs.

L3 — CDN edge cache (CloudFront, Cloudflare): ~5-50ms depending on PoP proximity, serves billions of objects globally. Closest to the user. Can't cache dynamic/user-specific content. Use for: static assets (CSS, JS, images), cacheable API responses (product catalogs, public content), video chunks.

L4 — Browser cache: 0ms (no network), stores hundreds of MB per user. Controlled by Cache-Control headers. Use for: static assets with a long max-age, API responses served with stale-while-revalidate.

Key design principle: each layer absorbs what the previous missed. Sizing the L2 cache correctly (targeting 95%+ hit rate) means the DB only handles 5% of read traffic — often a 20× reduction in DB load.
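
A compact sketch of the L1 + L2 read path, assuming redis-py and cachetools for the in-process TTL cache; the sizes, TTLs, and load_from_db stand-in are illustrative:

import json

import redis
from cachetools import TTLCache

r = redis.Redis(decode_responses=True)
l1 = TTLCache(maxsize=1_000, ttl=5)          # per-process, sub-microsecond lookups, short TTL

def load_from_db(key: str) -> dict:
    """Stand-in for the real database read (illustrative only)."""
    return {"key": key, "source": "db"}

def get_config(key: str) -> dict:
    # L1: in-process memory, no network hop
    if key in l1:
        return l1[key]
    # L2: shared Redis tier, ~1ms network RTT
    raw = r.get(f"config:{key}")
    if raw is not None:
        value = json.loads(raw)
        l1[key] = value                      # promote into L1 for subsequent calls
        return value
    # Final fallback: the database (L3 CDN / L4 browser sit in front of the service)
    value = load_from_db(key)
    r.set(f"config:{key}", json.dumps(value), ex=300)
    l1[key] = value
    return value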


Memcached vs Redis — When to Use Each

Aspect | Redis | Memcached | Winner
Data structures | 8 types: String, Hash, List, Set, ZSet, HyperLogLog, Stream, Bitmap | Strings only | Redis: significantly more flexible
Persistence | RDB snapshots + AOF (append-only file) | None (pure in-memory) | Redis: survives restarts, AOF for durability
Replication | Master-replica + Redis Sentinel + Redis Cluster | No native replication | Redis: HA built-in
Memory efficiency | Higher overhead per key (~64–100 bytes) | Lower overhead per key (~40 bytes) | Memcached: better for simple key-value at massive scale
Multi-threading | Single-threaded command execution (threaded I/O since 6.0) | Natively multi-threaded | Memcached: better raw throughput on multi-core
Cluster scaling | Redis Cluster (hash slots, automatic resharding) | Client-side consistent hashing | Redis Cluster: more automated
Pub/Sub | Native PUBLISH/SUBSCRIBE + Streams | None | Redis
Lua scripting | EVAL: atomic multi-command scripts | None | Redis: enables complex atomic operations

Use Redis when you need persistence, complex data structures, pub/sub, atomic operations, or Lua scripting.
Use Memcached when you need maximum throughput for the simplest key-value workloads, or it is already deployed.

Distributed Cache with Consistent Hashing

consistent_hashing_cache.py (Python)
import hashlib
import bisect

class ConsistentHashingCache:
    """
    Distributes cache keys across multiple Redis nodes using consistent hashing.
    
    Why not modulo hashing (key % N)?
    → Adding or removing one node remaps almost every key (about N/(N+1) of them) → cache stampede.
    
    Why consistent hashing?
    → Adding/removing one node remaps only K/N keys (K = keys, N = nodes).
    → With virtual nodes (vnodes), load is evenly distributed.
    """
    
    def __init__(self, nodes: list[str], vnodes: int = 150):
        """
        nodes: list of "host:port" strings
        vnodes: virtual nodes per physical node (higher = more even distribution)
        """
        self.ring: dict[int, str] = {}
        self.sorted_keys: list[int] = []
        
        for node in nodes:
            for i in range(vnodes):
                virtual_key = f"{node}:{i}"
                h = self._hash(virtual_key)
                self.ring[h] = node
                self.sorted_keys.append(h)
        
        self.sorted_keys.sort()
    
    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)
    
    def get_node(self, key: str) -> str:
        """Find the responsible node for a key — O(log N) binary search."""
        h = self._hash(key)
        idx = bisect.bisect(self.sorted_keys, h)
        if idx == len(self.sorted_keys):
            idx = 0  # wrap around the ring
        return self.ring[self.sorted_keys[idx]]
    
    def add_node(self, node: str, vnodes: int = 150):
        """Add a new node. Only ~K/N keys are remapped."""
        for i in range(vnodes):
            virtual_key = f"{node}:{i}"
            h = self._hash(virtual_key)
            self.ring[h] = node
            bisect.insort(self.sorted_keys, h)
    
    def remove_node(self, node: str, vnodes: int = 150):
        """Remove a node. Only ~K/N keys are remapped."""
        for i in range(vnodes):
            virtual_key = f"{node}:{i}"
            h = self._hash(virtual_key)
            if h in self.ring:
                del self.ring[h]
                self.sorted_keys.remove(h)
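
A quick usage sketch (the node addresses are placeholders):

cache = ConsistentHashingCache(["10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"])
print(cache.get_node("user:123"))      # e.g. "10.0.0.2:6379"
cache.add_node("10.0.0.4:6379")        # only ~1/4 of keys move to the new node
print(cache.get_node("user:123"))      # usually unchanged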
EXAMPLE

Interview Scenario: Design Caching for Twitter Timeline

Setup: 200M DAU, 700K reads/sec, each timeline is a list of 500 tweet IDs.

L1 — Redis Feed Cache: store per-user feed as a Redis List (LPUSH + LTRIM to 500 entries). 7-day TTL. Cache warming: fan-out service populates cache at tweet write time. Expected hit rate: 90%+ for active users.

L2 — Hot Tweet Cache: separate Redis Hash per tweet storing metadata (like count, retweet count, author info). Updated by counter service via HINCRBY (atomic increment). Hot tweets (viral) get dedicated Redis nodes with higher replication.

Cache miss handling: on Redis feed miss (cold user, expired TTL), Feed Service falls back to Cassandra — fetches recent tweets from followed accounts, sorts by tweet_id (time-ordered Snowflake), materializes cache. This path takes ~150ms vs ~5ms for cache hit.

Celebrity problem: celebrities (>10K followers) are NOT fan-out-on-write cached. Their tweet IDs are stored only in the hot tweet cache and fetched at read time. This prevents a celebrity tweet from causing 100M Redis writes.

Sizing: 200M active users × 500 tweet_ids × 8 bytes = 800GB per region. 3 Redis nodes × 256GB each = 768GB. Use Redis Cluster with 3 shards. Cost ~$15K/month vs $1.2B/year in Cassandra read costs.
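
A quick back-of-the-envelope check of that sizing (pure arithmetic; the node capacity is taken from the scenario above):

# Timeline cache sizing (per region)
users        = 200_000_000     # active users
ids_per_feed = 500             # tweet IDs kept per timeline
bytes_per_id = 8               # 64-bit Snowflake ID

raw_gb = users * ids_per_feed * bytes_per_id / 1e9
print(raw_gb)                  # 800.0 GB of raw IDs

node_gb, shards = 256, 3
print(node_gb * shards)        # 768 GB: just under the raw 800 GB, so real sizing
                               # also needs headroom for Redis per-key overhead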
