Caching: Strategy, Redis Internals & Distributed Patterns
Caching is the most tested topic in HLD interviews. Master cache write strategies, Redis data structures and their time complexities, distributed cache topology, the cache invalidation problem, and how to design multi-level caching architectures for real systems.
Why Caching Appears in Every HLD Interview
Every large-scale system design question has caching as a critical component. The numbers tell the story:
- Database: handles 1,000–5,000 QPS per node
- Redis: handles 100,000+ QPS from a single node
Caching is a 10–100× read throughput multiplier at 1–5% of the cost of more database hardware.
Interviewers don't test whether you'll add a cache — they test that you understand:
- Which write strategy is safe for your consistency requirement
- Which eviction policy fits your access pattern
- How to handle cache failures, stampedes, and invalidation
- How to size and architect a distributed cache
Cache Write Strategies — When to Use Each
Cache-Aside (Lazy Loading) — Default Choice
Read path: check cache → on miss, read DB → write result to cache → return. Write path: write to DB only → optionally invalidate cache key.
- ✓ Only caches what's actually requested (no wasted space); tolerates cache crashes
- ✗ Cache miss penalty = 3 network hops; risk of stale data between DB write and invalidation
Use for: read-heavy workloads (>80% reads) where milliseconds of staleness is acceptable. Most common pattern.
Write-Through — Consistency First
Every write goes to both cache and DB synchronously before returning success to the client.
- ✓ Reads after writes always see current data (strong consistency)
- ✗ Write latency doubles; cache fills with data that may never be read
Use for: payment systems, inventory counts, any data where reads must reflect the latest write immediately.
Write-Behind (Write-Back) — Lowest Latency
Write to cache only, return immediately. An async process flushes dirty keys to DB in the background.
- ✓ Lowest write latency — no DB in the hot path
- ✗ Risk of data loss if cache node crashes before flush; complex recovery
Use for: counters (likes, view counts), leaderboards, shopping carts where eventual persistence is fine. Never for financial transactions.
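A write-behind counter can be sketched with in-memory stand-ins: the hot path only touches the cache, and a background flusher drains dirty keys to the DB in batches. In production the increment would be a Redis `INCR` and the flusher a separate worker; the dicts here are illustrative.

```python
import threading
from collections import defaultdict

dirty_counts: dict[str, int] = defaultdict(int)  # stands in for Redis counters
db_counts: dict[str, int] = defaultdict(int)     # stands in for the database
lock = threading.Lock()

def record_like(post_id: str) -> None:
    """Hot path: bump the in-cache counter only -- no DB round trip."""
    with lock:
        dirty_counts[post_id] += 1

def flush() -> None:
    """Background job: drain dirty counters into the DB in one batch."""
    with lock:
        batch = dict(dirty_counts)
        dirty_counts.clear()
    for post_id, delta in batch.items():
        db_counts[post_id] += delta  # one UPDATE per key instead of per like
```

The failure mode is visible in the code: any increments sitting in `dirty_counts` when the process dies are lost, which is why this pattern is never used for financial data.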
Read-Through — Simplified Application Code
The cache sits in front of the DB. On a miss, the cache itself loads from DB — the application never talks to the DB directly.
- ✓ Simpler application code; cache is always pre-populated after first access
- ✗ Cold-start: first request for any new key always misses; less control over what gets cached
Use for: content delivery using a managed cache tier (e.g. ElastiCache with read-through).
Refresh-Ahead (Speculative Prefetch)
When a cached item approaches its TTL expiry, proactively refresh it before it expires.
- ✓ Eliminates most cache misses — the item is refreshed before anyone notices it's stale
- ✗ May refresh items that won't be requested again (wasted compute)
Use for: session data, user preferences, content with predictable expiry and a high hit rate.
Eviction Policies — Mechanism + Use Case
| Policy | Mechanism | Redis Config | Best For | Avoid When |
|---|---|---|---|---|
| LRU (Least Recently Used) | Evict the item not accessed for the longest time. Approximated with clock algorithm. | maxmemory-policy allkeys-lru | General-purpose web caching, session data, social feeds | Bursty access patterns — a single scan evicts all recent items |
| LFU (Least Frequently Used) | Evict item with lowest access frequency. Counter decays over time in Redis. | maxmemory-policy allkeys-lfu | Media caching where popular items stay popular (skewed distribution) | New items — they start with frequency=0 and get evicted immediately |
| TTL (Time To Live) | Evict items after a fixed expiration time regardless of access pattern. | SET key value EX seconds | API responses, auth tokens, search results, rate limiting | When freshness doesn't matter — TTL is overhead with no benefit |
| FIFO (First In First Out) | Evict oldest inserted item regardless of access pattern. | Not native in Redis (use sorted sets) | Simple queue-like caches, rotating logs | Most caching scenarios — ignores access frequency entirely |
| Random | Evict a random item. | maxmemory-policy allkeys-random | When access pattern is truly uniform (rare) | Almost all production caches — wastes cache hits unnecessarily |
| noeviction | Return error when memory is full instead of evicting. | maxmemory-policy noeviction | When data loss is unacceptable (use DB as fallback), persistent Redis queues | High write-rate caches — will OOM and error immediately |
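For intuition, exact LRU is a few lines with an ordered map. Note that Redis does not maintain this structure — it approximates LRU by sampling a handful of keys and evicting the least recently used among them — but the eviction behavior it approximates is this:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal exact LRU cache: O(1) get/put, evicts the coldest key."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value) -> None:
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used
```

The "bursty scan" weakness in the table is easy to see here: a sequential scan of cold keys calls `put` repeatedly and pushes every hot key out of the map.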
Redis Data Structures — What Senior Engineers Know
Redis is not just a key-value store. It has 8 core data structures, each with specific time complexities and use cases. Choosing the wrong structure is a common interview mistake.
Strings: GET/SET/INCR in O(1). Use for: session tokens, feature flags, simple counts, cached HTML, serialized JSON objects. INCR is atomic — safe for distributed counters without locks.
Hashes: HGET/HSET O(1), HGETALL O(n fields). Use for: user profiles (HSET user:123 name Alice age 30), session objects (avoid serializing entire JSON). Better than storing serialized JSON when you need to update individual fields frequently.
Lists: LPUSH/RPUSH/LPOP/RPOP O(1). Use for: message queues, recent activity feeds (store last N tweet IDs per user), notification queues. LRANGE for pagination: O(S + N) where S = offset. Warning: LINDEX is O(n) — avoid for mid-list access.
Sets: SADD/SREM/SISMEMBER all O(1), SMEMBERS O(n). Use for: unique visitors (cardinality), tag systems, friend/follower relationships. SINTERSTORE for set intersection (mutual friends). Note: for large cardinalities, use HyperLogLog (O(1) space, ~0.81% error) instead of Sets.
Sorted Sets (ZSETs): ZADD O(log n), ZRANK O(log n), ZRANGE O(log n + k). Use for: leaderboards (score = rating), priority queues, timeline feeds (score = timestamp), rate limiting windows. ZADD is the workhorse of real-time ranking systems.
HyperLogLog: PFADD O(1), PFCOUNT O(1). Use for: approximate unique counts in O(1) space (12KB regardless of cardinality), with ~0.81% standard error at Redis's default configuration. Perfect for daily active users and unique page visitors at scale, where approximate counts suffice.
Streams: XADD O(1), XREAD O(n). Persistent append-only log with consumer groups. Use for: event streaming (alternative to Kafka for moderate throughput), activity feeds, audit logs.
Bitmaps: SETBIT/GETBIT O(1), BITCOUNT O(n/8). Use for: daily active user tracking (1 bit per user per day = 125MB for 1B users), feature flags per user, bloom filter-like structures.
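As a concrete example of the Sorted Set use case, here is a plain-Python mirror of the three operations a leaderboard needs. The dict stands in for a Redis ZSET; each function's comment names the Redis command it imitates (in Redis, `top_k` would be O(log n + k) via the skip list rather than a full sort).

```python
# Plain-Python mirror of the Sorted Set operations behind a leaderboard.
scores: dict[str, float] = {}

def zadd(member: str, score: float) -> None:
    scores[member] = score                      # ZADD leaderboard score member

def zincrby(member: str, delta: float) -> float:
    scores[member] = scores.get(member, 0) + delta
    return scores[member]                       # ZINCRBY leaderboard delta member

def top_k(k: int) -> list[tuple[str, float]]:
    # ZREVRANGE leaderboard 0 k-1 WITHSCORES
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```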
Redis Data Structure Decision Guide
| Use Case | Redis Structure | Key Pattern | Key Operation |
|---|---|---|---|
| Session data | Hash | session:{session_id} | HSET / HGET / HGETALL + TTL |
| Rate limiting | Sorted Set | ratelimit:{user_id}:{window} | ZADD timestamp, ZCOUNT in window, ZREMRANGEBYSCORE to clean |
| Leaderboard (Top-K) | Sorted Set | leaderboard:global | ZADD {score} {user_id}, ZREVRANGE 0 9 |
| Recent activity feed | List | feed:{user_id} | LPUSH + LTRIM to 500, LRANGE for pagination |
| Unique visitor count (approx) | HyperLogLog | hll:visitors:{date} | PFADD {user_id}, PFCOUNT |
| Pub/Sub messaging | Pub/Sub or Streams | channel:{topic} | PUBLISH / SUBSCRIBE or XADD / XREAD |
| Distributed lock | String + NX | lock:{resource} | SET lock:res uuid NX EX 30 (SETNX + expire) |
| Geospatial index | Geo | drivers:active | GEOADD lng lat driver_id (longitude first), GEORADIUS pickup 2km |
| Bloom filter (membership) | Bitmap / RedisBloom | bloom:{resource} | SETBIT at hash positions, check all set |
Redis Patterns — Rate Limiting & Distributed Lock
```python
import redis
import time
import uuid
from contextlib import contextmanager

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

# ── Sliding window rate limiter ──────────────────────────────────────────
def is_rate_limited(user_id: str, max_requests: int = 100,
                    window_seconds: int = 60) -> bool:
    """
    Sliding window rate limiter using a Sorted Set.
    Score = timestamp (float). Members = unique request IDs.
    O(log n) per request.
    """
    key = f"ratelimit:{user_id}"
    now = time.time()
    window_start = now - window_seconds
    pipe = r.pipeline()
    # Remove requests outside the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count remaining requests in window
    pipe.zcard(key)
    # Add current request (timestamp as score, unique suffix as member)
    pipe.zadd(key, {f"{now}:{uuid.uuid4().hex[:8]}": now})
    # Set TTL to auto-cleanup idle keys
    pipe.expire(key, window_seconds + 1)
    results = pipe.execute()
    current_count = results[1]  # count BEFORE this request
    return current_count >= max_requests

# ── Distributed lock (Redlock-lite) ──────────────────────────────────────
@contextmanager
def distributed_lock(resource: str, ttl_seconds: int = 30, retry_ms: int = 100):
    """
    Acquire a distributed lock using SET NX EX.
    Safety: SET NX EX is atomic. Lock expires automatically if holder crashes.
    UUID as value ensures only the lock holder can release it.
    Limitation: single-node only (not true Redlock). Use Redlock for multi-node.
    """
    lock_key = f"lock:{resource}"
    lock_value = str(uuid.uuid4())
    acquired = False
    try:
        # Retry until lock acquired (add backoff/jitter in production)
        for _ in range(10):
            # SET key value NX EX seconds — atomic, only sets if key doesn't exist
            acquired = r.set(lock_key, lock_value, nx=True, ex=ttl_seconds)
            if acquired:
                break
            time.sleep(retry_ms / 1000)
        if not acquired:
            raise TimeoutError(f"Could not acquire lock for {resource}")
        yield lock_value
    finally:
        if acquired:
            # Lua script: atomic check-and-delete (prevents releasing another holder's lock)
            release_script = """
            if redis.call('get', KEYS[1]) == ARGV[1] then
                return redis.call('del', KEYS[1])
            else
                return 0
            end
            """
            r.eval(release_script, 1, lock_key, lock_value)
```
Cache Invalidation — The Hardest Problem in Computer Science
Phil Karlton famously said: 'There are only two hard things in Computer Science: cache invalidation and naming things.'
- TTL-based expiry — set a TTL and let items expire. Simple, no coordination needed. Con: may serve stale data for up to TTL seconds. Appropriate when: data freshness lag is acceptable (social profile, product catalog).
- Event-driven invalidation — when DB row changes, publish an invalidation event (via Kafka, CDC, or DB triggers). Cache consumers listen and delete/update the key. Con: complex distributed coordination. Risk: if the invalidation message is lost, cache is permanently stale. Appropriate when: data must be fresh within seconds (inventory levels, pricing).
- Write-through invalidation — on write, update BOTH DB and cache atomically (or invalidate cache on DB write). Con: write latency doubles. Appropriate when: strong consistency is required.
- Versioned cache keys — instead of 'user:123', use 'user:123:v7' where v7 is the current version. On update, increment version. Old keys naturally become orphaned and expire via TTL. Con: need to store and serve the current version number (itself a cache or DB lookup). Appropriate when: cache data is expensive to compute (ML features, aggregates).
The dirty secret: there is no perfect strategy. Every approach trades latency, consistency, and complexity. In interviews, pick one and justify the trade-off given the system's consistency requirements.
Cache Stampede (Thundering Herd) — 3 Solutions + Trade-offs
When a popular key expires, every concurrent request misses at once and issues the same expensive DB query, potentially overwhelming the database. Three defenses:
Mutex Lock (Distributed Lock)
On cache miss, acquire a distributed lock (Redis SETNX). Only the lock holder queries the DB and populates the cache. All other waiters retry after a short delay and get the cache hit. Pros: exactly one DB query per cache miss. Cons: adds latency for all waiters (lock contention), single point of failure if lock holder crashes (use TTL on lock). Best for: expensive DB queries, ML inference results, aggregate computations.
Probabilistic Early Expiration (PER)
When a cached item is close to expiry, each request independently and probabilistically refreshes it before it expires. Probability = exp(-β × time_remaining). The item is refreshed early by some requests, so by the time it formally expires, the new value is already in cache. Pros: no lock, no coordination, scales horizontally. Cons: may perform extra refreshes (wasted compute). Best for: moderate-traffic caches where eventual refresh is acceptable. Twitter uses this for timeline caches.
Background Refresh
A background async job refreshes cache keys before they expire. At item insertion, also schedule a refresh at (TTL - buffer) in a queue (Redis Stream or Kafka). Pros: zero extra latency for reads, no lock needed. Cons: requires a separate refresh worker, stale data possible in the buffer window. Best for: large objects that are expensive to compute (user recommendation lists, complex aggregates).
Multi-Level Caching Architecture
Production systems use multiple cache layers, each optimized for a different trade-off:
L1 — Process-level in-memory cache (Guava Cache, Caffeine): sub-microsecond lookup (~100ns–1µs), stores hundreds of hot objects per service instance. Zero network cost. Invalidation is local only (doesn't propagate to other instances). Use for: configuration values, static reference data, hot objects in a single-service context.
L2 — Distributed cache (Redis, Memcached): ~1ms network RTT, stores millions of objects across the cluster. Shared across all service instances. Consistent across instances. Use for: session data, user profiles, cached API responses, ML model outputs.
L3 — CDN edge cache (CloudFront, Cloudflare): ~5-50ms depending on PoP proximity, serves billions of objects globally. Closest to the user. Can't cache dynamic/user-specific content. Use for: static assets (CSS, JS, images), cacheable API responses (product catalogs, public content), video chunks.
L4 — Browser cache: 0ms (no network), stores hundreds of MB per user. Controlled by Cache-Control headers. Use for: static assets with long max-age, API responses that can be stale-while-revalidate.
Key design principle: each layer absorbs what the previous missed. Sizing the L2 cache correctly (targeting 95%+ hit rate) means the DB only handles 5% of read traffic — often a 20× reduction in DB load.
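The L1 → L2 → DB read-through chain can be sketched with dicts standing in for each tier (the real L2 would be a Redis call). A short L1 TTL bounds how stale a local copy can get, since L1 invalidation doesn't propagate across instances.

```python
import time

l1: dict[str, tuple[object, float]] = {}   # per-process, tiny, sub-microsecond
l2: dict[str, object] = {}                 # stands in for Redis: shared, ~1ms RTT
db: dict[str, object] = {"cfg:theme": "dark"}

L1_TTL = 5  # short TTL bounds cross-instance staleness of local copies

def get(key: str):
    hit = l1.get(key)
    if hit and hit[1] > time.time():
        return hit[0]                      # L1 hit: no network at all
    if key in l2:
        value = l2[key]                    # L2 hit: one network RTT
    else:
        value = db.get(key)                # both missed: fall through to the DB
        if value is not None:
            l2[key] = value                # populate L2 for all other instances
    if value is not None:
        l1[key] = (value, time.time() + L1_TTL)
    return value
```

Each layer absorbs what the previous missed: after the first call, other instances hit L2, and repeat calls from this instance hit L1.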
Memcached vs Redis — When to Use Each
| Aspect | Redis | Memcached | Winner |
|---|---|---|---|
| Data structures | 8 types: String, Hash, List, Set, ZSet, HLL, Stream, Bitmap | String only | Redis — significantly more flexible |
| Persistence | RDB snapshots + AOF write-ahead log | None (pure in-memory) | Redis — survives restarts, AOF for durability |
| Replication | Master-replica + Redis Sentinel + Redis Cluster | No native replication | Redis — HA built-in |
| Memory efficiency | Higher overhead per key (~64–100 bytes) | Lower overhead per key (~40 bytes) | Memcached — better for simple key-value at massive scale |
| Multi-threading | Single-threaded for commands (6.0+ I/O multi-thread) | Natively multi-threaded | Memcached — better raw throughput on multi-core |
| Cluster scaling | Redis Cluster (hash slots, automatic resharding) | Client-side consistent hashing | Redis Cluster — more automated |
| Pub/Sub | Native PUBLISH/SUBSCRIBE + Streams | None | Redis |
| Lua scripting | EVAL — atomic multi-command scripts | None | Redis — enables complex atomic operations |
| Use Redis when | Need: persistence, complex data structures, pub/sub, atomic ops, Lua scripting | — | — |
| Use Memcached when | Need: maximum throughput, simplest key-value, already deployed | — | — |
Distributed Cache with Consistent Hashing
```python
import hashlib
import bisect

class ConsistentHashingCache:
    """
    Distributes cache keys across multiple Redis nodes using consistent hashing.
    Why not modulo hashing (key % N)?
    → Adding/removing one node remaps nearly all keys ((N-1)/N of them on
      average) → cache stampede.
    Why consistent hashing?
    → Adding/removing one node remaps only ~K/N keys (K = keys, N = nodes).
    → With virtual nodes (vnodes), load is evenly distributed.
    """
    def __init__(self, nodes: list[str], vnodes: int = 150):
        """
        nodes: list of "host:port" strings
        vnodes: virtual nodes per physical node (higher = more even distribution)
        """
        self.ring: dict[int, str] = {}
        self.sorted_keys: list[int] = []
        for node in nodes:
            for i in range(vnodes):
                virtual_key = f"{node}:{i}"
                h = self._hash(virtual_key)
                self.ring[h] = node
                self.sorted_keys.append(h)
        self.sorted_keys.sort()

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key: str) -> str:
        """Find the responsible node for a key — O(log N) binary search."""
        h = self._hash(key)
        idx = bisect.bisect(self.sorted_keys, h)
        if idx == len(self.sorted_keys):
            idx = 0  # wrap around the ring
        return self.ring[self.sorted_keys[idx]]

    def add_node(self, node: str, vnodes: int = 150):
        """Add a new node. Only ~K/N keys are remapped."""
        for i in range(vnodes):
            virtual_key = f"{node}:{i}"
            h = self._hash(virtual_key)
            self.ring[h] = node
            bisect.insort(self.sorted_keys, h)

    def remove_node(self, node: str, vnodes: int = 150):
        """Remove a node. Only ~K/N keys are remapped."""
        for i in range(vnodes):
            virtual_key = f"{node}:{i}"
            h = self._hash(virtual_key)
            if h in self.ring:
                del self.ring[h]
                self.sorted_keys.remove(h)
```
Interview Scenario: Design Caching for Twitter Timeline
Setup: 200M DAU, 700K reads/sec, each timeline is a list of 500 tweet IDs.
L1 — Redis Feed Cache: store per-user feed as a Redis List (LPUSH + LTRIM to 500 entries). 7-day TTL. Cache warming: fan-out service populates cache at tweet write time. Expected hit rate: 90%+ for active users.
L2 — Hot Tweet Cache: separate Redis Hash per tweet storing metadata (like count, retweet count, author info). Updated by counter service via HINCRBY (atomic increment). Hot tweets (viral) get dedicated Redis nodes with higher replication.
Cache miss handling: on Redis feed miss (cold user, expired TTL), Feed Service falls back to Cassandra — fetches recent tweets from followed accounts, sorts by tweet_id (time-ordered Snowflake), materializes cache. This path takes ~150ms vs ~5ms for cache hit.
Celebrity problem: celebrities (>10K followers) are NOT fan-out-on-write cached. Their tweet IDs are stored only in the hot tweet cache and fetched at read time. This prevents a celebrity tweet from causing 100M Redis writes.
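This hybrid read path can be sketched with hypothetical dict stand-ins for the precomputed feed and the hot tweet cache. The merge works because Snowflake IDs are time-ordered: sorting IDs sorts by time.

```python
# Hypothetical stand-ins: a fan-out-on-write feed cache, a hot tweet cache
# for celebrities, and a follow map. All names are illustrative.
precomputed_feed = {"u1": [9003, 9001, 8000]}   # Snowflake IDs, newest first
celebrity_tweets = {"celebA": [9500, 8500]}      # NOT fanned out at write time
follows_celebs = {"u1": ["celebA"]}

def read_timeline(user_id: str, limit: int = 5) -> list[int]:
    """Merge the precomputed feed with celebrity tweets fetched at read time."""
    merged = list(precomputed_feed.get(user_id, []))
    for celeb in follows_celebs.get(user_id, []):
        merged.extend(celebrity_tweets.get(celeb, []))  # read-time pull
    # Snowflake IDs are time-ordered, so descending ID order = newest first
    return sorted(merged, reverse=True)[:limit]
```

The extra read-time merge for a handful of followed celebrities is far cheaper than the 100M cache writes a single viral tweet would otherwise trigger.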
Sizing: 200M active users × 500 tweet IDs × 8 bytes ≈ 800GB per region. Allowing for per-key overhead, plan for ~1TB — e.g. a 4-shard Redis Cluster of 256GB nodes. At typical cloud prices that's on the order of $15K/month, a small fraction of what serving all 700K reads/sec directly from Cassandra would cost.