
CDN: Edge Caching, Push vs Pull, and Invalidation at Global Scale

CDNs cut global latency from hundreds of milliseconds to tens, offload 80-99% of origin bandwidth, and shield origins from bursty traffic. Master push vs pull, cache hierarchy, invalidation, and the failure modes that break CDN-backed systems.

Tags: CDN, Cloudflare, Fastly, CloudFront, Netflix Open Connect, Edge Computing, Cache Invalidation, Anycast, GeoDNS, Varnish, Cache Stampede, Origin Shield

Why Origin-Only Architectures Break at Global Scale

A user in Sydney hitting a single origin in us-east-1 pays ~230ms round-trip latency before the server has even started processing. Add a TLS handshake (2 RTTs), and TTFB is already ~700ms on a cold connection. You cannot fix this with faster servers — the speed of light is the constraint.

There are three distinct problems a CDN solves, and interviewers expect you to name all three:

  1. Latency reduction via proximity — a POP 50km from the user delivers bytes in 5-15ms instead of 100-300ms. This dominates perceived performance for anything with many round-trips — HTML, APIs, video manifests.

  2. Bandwidth offload — if 95% of requests are served from the edge, your origin serves 1/20th the bytes. AWS origin egress is ~$0.09/GB; blended CDN egress is $0.02-0.04/GB. A video service serving 10 PB/month saves ~$500K/month just on egress.

  3. Origin shielding — a flash crowd for a news article can take down an origin that would otherwise be fine. A CDN absorbs the burst at the edge; request coalescing and an origin shield ensure the origin sees 1 request, not 1 million.

The mental model: a CDN is a globally-distributed read-through cache with DNS-based routing. Everything else — push vs pull, invalidation, edge compute — is a variation on that core.
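That mental model is small enough to sketch. Below is a toy in-process illustration of a read-through cache, not any CDN's actual code; the `origin_fetch` callable and the TTL value are made up for the example:

```python
import time

class ReadThroughCache:
    """Toy model of a pull-CDN edge cache: on a miss, fetch from origin
    and cache the body for ttl_seconds; hits never touch origin."""

    def __init__(self, origin_fetch, ttl_seconds):
        self.origin_fetch = origin_fetch      # callable: url -> bytes
        self.ttl = ttl_seconds
        self.store = {}                       # url -> (body, expires_at)

    def get(self, url):
        entry = self.store.get(url)
        if entry and entry[1] > time.monotonic():
            return entry[0], "HIT"
        body = self.origin_fetch(url)         # origin is hit only on a miss
        self.store[url] = (body, time.monotonic() + self.ttl)
        return body, "MISS"

origin_calls = []

def origin(url):
    origin_calls.append(url)                  # count how often origin is hit
    return b"logo-bytes"

edge = ReadThroughCache(origin, ttl_seconds=300)
print(edge.get("/logo.png")[1])               # first request: MISS
print(edge.get("/logo.png")[1])               # second request: HIT
```

Everything a real CDN adds — tiers, purging, coalescing — layers on top of this loop.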

IMPORTANT

What Interviewers Test on CDN

Interviewers probe three things. (1) Do you understand the push vs pull tradeoff and can you pick the right one for video vs a news site? (2) Do you know how invalidation actually works — TTL vs purge vs versioned URLs, and why versioned URLs are the production default? (3) Can you reason about cache hit rate — what breaks it (cookies, query strings, Vary headers) and what protects the origin when it drops (request coalescing, origin shield).

A candidate who says "we'll put a CDN in front" without explaining cache keys or invalidation strategy gets penalized. A candidate who says "versioned URLs via content hash in the filename so invalidation is free" signals production experience.

Clarifying Questions Before Adding a CDN

01

What's the content mix — static, dynamic, or personalized?

Static (images, JS, CSS, video chunks): CDN is a no-brainer, expect 95%+ hit rate.

Dynamic but cacheable (product pages, search results): possible with short TTLs + cache keys excluding user-specific data.

Personalized (user feed, cart): needs Edge Side Includes or edge compute — not traditional CDN caching.

02

What's the invalidation latency requirement?

Seconds (breaking news, stock prices): need explicit purge (~1-5s global) or very short TTL.

Minutes (product pricing): TTL of 60-300s works.

Never-stale is acceptable (library JS): use versioned URLs with 1-year TTL — the best pattern.

03

What's the traffic pattern — predictable or bursty?

Predictable large files (movie releases, game patches): push CDN — pre-stage content to edges before demand.

Long-tail small objects (website assets): pull CDN — cache on first request, evict when cold.

04

What are the geographic hotspots?

Global uniform: any tier-1 CDN works (Cloudflare, Akamai, CloudFront).

Regional (e.g., 80% India + SEA): pick a CDN with dense POPs there; Cloudflare has 300+ POPs, CloudFront 450+, Fastly fewer but larger.

ISP-level (e.g., streaming): consider a custom CDN with ISP peering (Netflix Open Connect, Google GGC).

05

What's the budget for misses?

Every miss costs origin egress plus origin compute. At 10K QPS with 95% hit rate, origin sees 500 QPS. If hit rate drops to 80% (bad cache key design), origin sees 2000 QPS — 4x the load. Quantify this upfront.
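The arithmetic above is worth having at your fingertips — origin load is linear in the miss rate, so small hit-rate drops multiply origin traffic:

```python
def origin_qps(total_qps, hit_rate):
    """Origin absorbs only the misses."""
    return total_qps * (1 - hit_rate)

for hit_rate in (0.95, 0.90, 0.80):
    print(f"hit rate {hit_rate:.0%}: origin sees {origin_qps(10_000, hit_rate):,.0f} QPS")
```

Going from 95% to 80% hit rate quadruples origin load, exactly as the numbers above show.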

Push vs Pull CDN — The Core Distinction

Pull CDN (default for web): the CDN caches content lazily on first request. User in Tokyo requests /logo.png, Tokyo POP has a miss, fetches from origin, caches for TTL seconds, serves. Every subsequent Tokyo user gets a cache hit. Originated by Akamai; standard for Cloudflare, CloudFront, Fastly.

  • Use when: content is small, numerous, and access patterns are unpredictable (web assets, images, API responses).
  • Advantage: you upload once to origin; CDN propagates on demand. Zero pre-provisioning.
  • Downside: cold caches on the long tail — first user per POP pays origin latency. Gets worse with many POPs (300 POPs equals 300 cold misses per new asset).

Push CDN (default for large files): you explicitly upload content to the CDN's storage, which replicates to edge POPs before any user request. The CDN becomes the origin.

  • Use when: content is large, access patterns are predictable, and you can't afford cold misses (video chunks, game patches, software downloads, firmware updates).
  • Advantage: no origin fetch ever — edge is pre-warmed. Origin bandwidth drops to near-zero.
  • Downside: you pay storage at every edge (expensive for infrequently accessed content); you must orchestrate the upload pipeline.

Production reality: Netflix uses push (Open Connect) because a 4K movie is 20 GB and a single cold miss would cost the ISP 20 GB of transit. Instagram uses pull because there are billions of images with a long tail. Most websites use pull. A common hybrid: push for the top 1% of hot objects (predictable) and pull for the long tail.

Push vs Pull CDN — Decision Matrix

| Dimension | Push CDN | Pull CDN |
|---|---|---|
| Best for | Large files, predictable traffic, video, software downloads | Small files, unpredictable traffic, web assets, APIs |
| First-request latency | Pre-warmed, ~5-15ms | Cold miss adds origin RTT and propagation |
| Storage cost | High — replicated at every POP | Low — only cached when accessed |
| Origin bandwidth | Near-zero after initial push | Proportional to (1 − hit rate) |
| Invalidation | Re-upload or explicit delete | TTL expiry or purge API |
| Operational model | You orchestrate uploads | CDN fetches on demand |
| Real examples | Netflix Open Connect, Steam, Apple iOS updates | Cloudflare, CloudFront, Akamai for web |
| Recommended default | Predictable, large, hot content (>100 MB, >10K req/day) | Everything else — the default |

CDN Request Flow — DNS, Edge, Regional, Origin Shield

[Diagram: request flow — client DNS resolution → nearest edge POP → regional cache → origin shield → origin]

Cache Hierarchy — Why Origin Shield Exists

Naive CDN: user → edge POP → origin. Works for one POP. With 300 POPs and a viral tweet, you get 300 simultaneous misses on origin — a thundering herd. Origin meltdown.

The production solution is a multi-tier hierarchy:

  • Edge POP (closest to user, ~10ms) — small, fast cache, often RAM + SSD. Handles the vast majority of requests. Hit rate ~80% typical.
  • Regional cache (one per continent, ~30ms) — larger disk cache, aggregates misses from dozens of edge POPs. Cumulative hit rate ~95%.
  • Origin shield (a single region, ~80ms) — the last line of defense. All regional misses funnel through it. Its job is request coalescing: if 1000 regional caches all miss on the same URL simultaneously, the origin shield sends exactly one request to origin and fans out the response. Cumulative hit rate ~99%.
  • Origin — your actual server. Sees ~1% of user traffic.

Concrete numbers: at 1M requests/sec with this hierarchy, origin sees ~10K requests/sec. Without origin shield, a cache miss burst could drive origin to 100K+ QPS — instant outage. Cloudflare calls this "Argo Tiered Cache"; Fastly calls it "Shielding"; CloudFront calls it "Origin Shield" (enabled per-distribution, ~$0.0075/10K requests extra).
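Those cumulative numbers fall out of simple multiplication — each tier forwards only its misses. A sketch, using the illustrative per-tier hit rates from the text:

```python
def tier_traffic(total_qps, hit_rates):
    """QPS reaching each tier, where every tier absorbs its hit fraction
    of the traffic that reaches it; the final element is origin QPS."""
    reached = [float(total_qps)]
    for hr in hit_rates:
        reached.append(reached[-1] * (1 - hr))
    return reached

# Edge absorbs 80%, regional 75% of the remainder, shield 80% of that —
# cumulatively ~99%, so origin sees ~1% of user traffic.
edge, regional, shield, origin = tier_traffic(1_000_000, [0.80, 0.75, 0.80])
print(f"edge {edge:,.0f} -> regional {regional:,.0f} -> shield {shield:,.0f} -> origin {origin:,.0f}")
```

At 1M QPS this lands on ~10K QPS at origin, matching the figures above.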

When to skip origin shield: if your origin is already behind its own strong cache (e.g., Redis + CDN, or an in-memory origin cache), the extra hop is pure latency. Rule of thumb: enable it when origin is a database-backed app, skip when origin is itself a CDN-grade cache.

Cache Invalidation — The Hardest Problem

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton. The quote is old; the problem is unchanged.

You have three strategies. Pick the right one or you will ship stale content.

1. TTL-based expiry (passive): set Cache-Control: max-age=300; the edge evicts after 300s. Simple, no API call needed.

  • Problem: content can be stale for up to TTL seconds. For a news site this means 5 minutes of wrong headlines after you fix a typo.
  • Use for: content tolerant of staleness (blog posts, product descriptions).

2. Explicit purge (active): call the CDN's purge API (PURGE /article/123). The CDN broadcasts the invalidation to all POPs.

  • Propagation time: ~200ms (Fastly, via bitmap-based purge) to ~30-60s (CloudFront, historically slow). Cloudflare is near-instant for Enterprise.
  • Cost: ~$0.005 per purge on CloudFront; free on most others up to a quota.
  • Problem: broadcasting to 300 POPs is O(N) — expensive at scale; systems that purge on every update (e.g., inventory counters) melt the control plane.
  • Use for: low-frequency, high-urgency invalidations (breaking news edit, security bulletin).

3. Versioned URLs (the production default): change the URL when content changes. /app.a3f7c9.js becomes /app.d81e2b.js on new deploy. The old version can have Cache-Control: public, max-age=31536000, immutable (1 year).

  • Invalidation cost: zero. The new URL has no cached copies anywhere; the old URL is never requested again by updated HTML.
  • Works because the URL itself encodes the version. Content-addressable.
  • Use for: JS/CSS/image assets, font files — anything emitted by a build pipeline.
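The content-hash step that build tools perform can be sketched in a few lines. Truncating sha256 to 8 hex characters is an arbitrary choice for this illustration; real bundlers differ in hash function and length:

```python
import hashlib
from pathlib import Path

def versioned_name(filename, content):
    """app.js -> app.<first 8 hex chars of sha256(content)>.js"""
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = Path(filename)
    return f"{p.stem}.{digest}{p.suffix}"

v1 = versioned_name("app.js", b"console.log('v1')")
v2 = versioned_name("app.js", b"console.log('v2')")
print(v1, v2)   # different content -> different URL, so no purge is ever needed
```

Because the name is derived from the bytes, unchanged files keep their URL (and their year-long cache entries) across deploys.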

Combine them: HTML gets short TTL (60s) or purge-on-publish; assets referenced from HTML get versioned URLs with 1-year TTL. This is how every modern web framework (Next.js, Vite, webpack) ships. HTML changes trigger asset URL changes transitively — no purge needed.

Invalidation Strategy Comparison

| Strategy | Latency to global | Cost | Best for | Gotcha |
|---|---|---|---|---|
| TTL expiry (passive) | Up to TTL | Free | Staleness-tolerant content | Content is stale for up to TTL seconds |
| Soft purge / stale-while-revalidate | Near-instant at edge | Free | Frequently updated cached HTML | Requires stale-while-revalidate header support |
| Explicit purge API | ~200ms to 60s | Per-purge fees, API rate limits | Breaking news, security fixes | Broadcast cost is O(POPs) — don't purge on every write |
| Surrogate keys / cache tags | ~200ms | Free on most CDNs | Bulk invalidation by tag (e.g., user-123) | Well supported only by Fastly, Cloudflare Enterprise, Varnish |
| Versioned URLs (content hash) | Instant (new URL) | Zero | Static assets from build pipeline | Referencing HTML must change too |

Cache Keys — Why Cookies and Query Strings Destroy Hit Rate

The cache key is the tuple the CDN uses to decide "is this a hit?" By default it's (method, host, path). You can add headers, cookies, and query strings — and every addition fragments the cache.

Example disaster: you include Cookie in the cache key. Every user has a unique session cookie. Cache hit rate drops from 95% to 0%. Your origin is now serving 20x the traffic. You are paged at 3am.

Rules of thumb:

  • Strip cookies on asset paths (/static/*). The browser sends cookies; the CDN ignores them for caching. CloudFront, Cloudflare, Fastly all support this as a config flag.
  • Normalize query strings: sort and whitelist only the params that actually change the response. ?utm_source=x should not fragment the cache; ?page=2 should.
  • Use Vary headers sparingly. Vary: Accept-Encoding is fine (2 variants: gzip, br). Vary: User-Agent is a disaster (thousands of variants, one per browser version).
  • Cache by device class, not UA string: classify UA into mobile/desktop/tablet at edge, vary on that derived header. Cloudflare Workers, Lambda@Edge, Fastly VCL can all compute this.

The math: hit rate vs cache key cardinality. If your cache key has 1000 variants per URL (1 per user), working set grows 1000x. For fixed cache size, eviction churn means effective hit rate collapses. Keep cache keys narrow — include only what actually determines the response.
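The normalize-and-whitelist rule can be sketched directly. The whitelist below is an assumption for illustration — a real config is per-route — and everything outside it (utm_*, fbclid, ...) is dropped so it cannot fragment the cache:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Params that actually change the response (hypothetical whitelist)
MEANINGFUL = {"q", "page", "sort"}

def cache_key(url):
    """Normalized cache key: host + path + sorted, whitelisted query params."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in MEANINGFUL)
    return f"{parts.netloc}{parts.path}?{urlencode(kept)}"

a = cache_key("https://shop.example/list?utm_source=news&page=2")
b = cache_key("https://shop.example/list?page=2&utm_campaign=spring")
print(a == b)   # True — tracking params no longer split the cache
```

Sorting matters as much as whitelisting: `?a=1&b=2` and `?b=2&a=1` should be one cache entry, not two.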

Dynamic Content at the Edge — Workers, Lambda@Edge, Fastly Compute

The old rule: "CDNs cache static content, origin serves dynamic." This is now outdated. Edge compute platforms run your code in a POP, milliseconds from the user.

The three contenders:

  • Cloudflare Workers — V8 isolates (not containers, not VMs). Cold start: <1ms. 50ms CPU limit on free plan, 30s on paid. JS/TS/WASM. Runs in 300+ POPs. Pricing: $5/10M requests.
  • AWS Lambda@Edge — full Lambda, runs at CloudFront edge. Cold start: 100-500ms (same runtime pool as regional Lambda). Node.js, Python. Heavier but more compatible with AWS ecosystem.
  • Fastly Compute@Edge — WASM-based. Cold start: ~50 microseconds. Rust, Go, JS, AssemblyScript. Highest raw perf, smallest ecosystem.

When to use edge compute:

  • Auth at edge — validate JWTs without an origin hop. Saves ~100ms per request.
  • A/B test routing — inject a cookie, route to variant without origin round-trip.
  • Edge Side Includes — assemble cached fragments plus personalized bits into a full page at edge.
  • Origin request rewriting — transform requests before they hit origin (e.g., adding signed URLs for S3).
  • Geographic compliance — strip PII or redirect based on user country.

When NOT to use:

  • Database-heavy workloads — your DB is still in one region; edge compute adds latency back when it calls home.
  • Heavy compute (ML inference on large models) — edge runtimes have tight CPU/memory limits.
  • Complex state — edge KV stores have eventual consistency and write latency in seconds.

Cold start reality: V8 isolates on Cloudflare have <1ms cold starts because the JS runtime is already warm — only your code loads. Lambda@Edge is slower because it boots a fresh Node.js process. Pick based on your latency budget.

Personalization Behind a CDN — Edge Side Includes and Fragment Caching

The hardest interview follow-up: "How do you CDN a page with user-specific content?" The naive answer (disable caching) is wrong and expensive. The real answers:

Pattern 1 — Split the page:

  • GET /products/123 → fully cacheable, same for all users. 1-year TTL with versioned URL.
  • GET /api/me/cart-count → private, no-cache, ~10ms origin call.
  • Browser renders the shell from cache, fetches personalized fragments via JS. Time-to-first-byte is near-zero.

Pattern 2 — Edge Side Includes (ESI):

  • Cache the HTML template with placeholder tags like <esi:include src="/fragments/cart-count"/>.
  • Edge server (Varnish, Akamai, Fastly) sees the ESI tag, fetches the fragment (itself cached per-user with short TTL), splices it in, returns assembled HTML.
  • Works without client-side JS. Legacy but still used by large e-commerce (Zalando, IKEA).
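The splice step is mechanically simple. A toy sketch — the fragment URL is hypothetical, and regex-based substitution stands in for a real ESI processor, which is a full parser implementing much more of the spec:

```python
import re

# Hypothetical fragment cache keyed by src URL (illustration only)
FRAGMENTS = {"/fragments/cart-count": '<span id="cart-count">3</span>'}

ESI_TAG = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')

def assemble(template):
    """Replace each ESI include tag with its cached fragment, the way an
    ESI-capable edge (Varnish, Fastly, Akamai) does before returning the page."""
    return ESI_TAG.sub(lambda m: FRAGMENTS.get(m.group(1), ""), template)

shell = '<html><body>Cart: <esi:include src="/fragments/cart-count"/></body></html>'
print(assemble(shell))
```

The shell and each fragment carry independent TTLs, which is the whole point: the expensive page template stays cached while small fragments churn.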

Pattern 3 — Edge compute (modern):

  • Cloudflare Worker fetches the cached shell, calls origin for the user-specific bits, and assembles the response.
  • More flexible than ESI (arbitrary code); higher latency than pure cache (adds worker execution time).

Pattern 4 — Signed fragments with short TTL:

  • Cache per-user fragments at edge with 10s TTL and user ID in cache key.
  • Works for low-cardinality personalization (e.g., A/B test variants) but destroys hit rate for true per-user data.

Recommendation: Pattern 1 (split page plus client-side fetch) is the default for modern web apps. ESI for legacy or no-JS requirements. Edge compute when you need server-side assembly with low latency.

Production CDNs — Who Uses What and Why

Netflix Open Connect — the canonical custom CDN. Netflix ships physical servers ("OCAs" — Open Connect Appliances) to ISPs; the ISP hosts them for free because it cuts their transit cost. Content is pushed nightly during off-peak hours. Result: 95%+ of Netflix traffic is served directly from inside the ISP's network, ~5ms from the user. This is only viable because Netflix has concentrated, predictable content (a movie gets millions of views).

Cloudflare — one of the largest POP networks (300+ cities), Anycast routing, V8 Workers for edge compute. Best for web/API. Free tier is generous. DDoS mitigation is industry-leading. Downside: less control than self-hosted.

Fastly — Varnish-based with VCL config language. Fewer POPs (~80) but larger and better-peered. Fastest purge in the industry (~150ms global). Used by Reddit, Shopify, The New York Times. Compute@Edge is WASM-based and very fast. Downside: VCL has a learning curve; smaller POP count hurts in some regions.

AWS CloudFront — integrated with the AWS ecosystem (S3, Lambda, ACM). Large POP footprint. Good for AWS-native stacks. Historically slow purges (now under 60s). Cheaper than Cloudflare for high-volume AWS egress because of private networking.

Akamai — the original CDN (1998). Largest by POP count but expensive and legacy tooling. Still dominant in enterprise (banking, government). Specialized products (Bot Manager, Kona) are strong. Not the default for greenfield.

Typical egress pricing: $0.085/GB at AWS origin becomes $0.02-0.04/GB at CDN. At 1 PB/month, that's ~$50K saved. For video at petabyte scale, direct ISP peering (Netflix, YouTube via Google GGC) cuts cost further — effectively zero per-GB marginal cost.

Cache-Control Headers — Production-Ready Recipes

cache-control-recipes.yaml
# Versioned static assets (JS, CSS, images with content hash in filename)
# Safe because URL changes when content changes
path: /static/*.{js,css,png,woff2}
headers:
  Cache-Control: "public, max-age=31536000, immutable"
  # 1 year, never revalidate; browser treats as permanently cached

# HTML pages - short TTL with stale-while-revalidate
path: /
headers:
  Cache-Control: "public, max-age=60, stale-while-revalidate=600"
  # Fresh for 60s; serve stale while refetching in background for 10 min

# API responses - conditional caching
path: /api/public/*
headers:
  Cache-Control: "public, max-age=30, s-maxage=300"
  # Browser caches 30s; CDN caches 5 min (s-maxage overrides for shared caches)

# User-specific responses - never cache at CDN
path: /api/me/*
headers:
  Cache-Control: "private, no-cache, must-revalidate"
  # private = browser only; no-cache = always revalidate with origin

# Sensitive data - never store anywhere
path: /api/auth/*
headers:
  Cache-Control: "no-store"
  Pragma: "no-cache"
  # no-store is strictest; nothing cached even in browser

Failure Modes — What Breaks and How to Defend

Cache poisoning via host header injection: an attacker sends Host: evil.com with malicious content; the CDN caches it under your domain's key. Users see the attacker's content. Defense: validate Host at edge; use an X-Forwarded-Host whitelist; never reflect Host into the response body.

Cache stampede on TTL expiry: a hot key expires at t=0. Before the first request repopulates (say 50ms), 1000 concurrent requests all miss and all hit origin. Origin overload. Defenses:

  • Request coalescing (default in Varnish, Cloudflare, Fastly): edge sees N identical in-flight requests, sends 1 to origin, fans out the response.
  • Probabilistic early refresh (PER): with probability exp(-beta * remaining_ttl), one request refreshes before expiry. Smooths the load; no hard cliff.
  • Longer TTL plus explicit purge: set a 1-hour TTL, purge on update. No expiry means no stampede.
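The probabilistic-early-refresh rule is a one-liner. This uses the text's simplified formula — production systems often use the related XFetch rule, which also weights by recompute cost — and the beta value is illustrative:

```python
import math
import random

def should_refresh_early(remaining_ttl, beta=1.0):
    """Refresh with probability exp(-beta * remaining_ttl): ~0 early in the
    TTL, rising to 1 at expiry, so refreshes spread out instead of piling
    up at the expiry cliff."""
    return random.random() < math.exp(-beta * remaining_ttl)

random.seed(1)
for remaining in (10.0, 2.0, 0.5, 0.0):
    early = sum(should_refresh_early(remaining) for _ in range(10_000))
    print(f"{remaining:>4}s left: {early / 10_000:.1%} of requests refresh early")
```

Tune beta to trade origin load (higher beta, fewer early refreshes) against stampede risk at the expiry boundary.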

POP outage: a POP goes down (fiber cut, BGP flap). Defense: Anycast IPs automatically shift traffic to the next-nearest POP within seconds; DNS-routed CDNs fall back via GeoDNS with a short (e.g., 60s) DNS TTL. Multi-CDN for the paranoid: run Cloudflare + Fastly with DNS failover between them (adds complexity; only do this if you need 99.99%+).

Origin meltdown from cache miss burst: a deploy invalidates the cache; 10M hot keys all need refetching. Origin can't handle the burst. Defenses:

  • Origin shield: coalesces the burst to ~10K unique requests instead of 10M.
  • Gradual rollout: don't invalidate everything at once. Use versioned URLs so new content coexists with old.
  • Origin rate limiting: set a max concurrent-request limit from CDN to origin; excess requests get 503s (acceptable for a 30s deploy) rather than taking origin down.

Cookie leak into cache: a misconfigured rule causes responses with Set-Cookie to be cached and served to other users. User A's session cookie ends up in user B's browser. Defense: a strict rule that responses with Set-Cookie are marked private and never cached at CDN. All major CDNs enforce this by default; don't override it.

TIP

Quick Mental Model for Interviews

When asked "should we add a CDN here?", answer in three beats:

  1. What's the content? Static + global → yes, trivially. Dynamic + personalized → nuanced (ESI or edge compute). Internal-only → no.
  2. What's the invalidation story? Versioned URLs for assets; TTL + purge for HTML; no-store for user data. Specify which layer gets which strategy.
  3. What's the origin protection? Origin shield + request coalescing. Quote a number: "at 1M QPS with 99% cumulative hit rate, origin sees 10K QPS — within its headroom."

A candidate who can hit those three beats with specific numbers and named systems (Cloudflare, Fastly, Netflix Open Connect) demonstrates senior-level fluency.

Evaluation — How to Measure CDN Effectiveness

Cache hit rate — the north star metric. Target: >90% for static, >75% for HTML, >50% for API. Below those thresholds, revisit cache keys.

Origin offload ratio — (bytes served at edge) / (total bytes). Target: >95% at scale. Tracks bandwidth cost savings.

P99 TTFB by geography — measure from real-user monitoring (RUM), not synthetic. If Tokyo P99 is 400ms while US is 40ms, your POP coverage in APAC is insufficient.

Purge propagation time — how fast a purge reaches all POPs. Measure by scripting: purge a URL, then request it from every POP and check whether the stale object is still served. Production SLA: 99% of POPs updated within 5s.

Origin error rate — 5xx from origin. A spike indicates a cache miss burst or bad purge. Set an alert at 10x baseline.

Cost per GB delivered — total CDN spend divided by total bytes delivered. Track month-over-month. If rising, investigate hit rate and egress contract tiers.

Tools: Cloudflare Analytics, Fastly Real-Time Stats, CloudWatch for CloudFront, or third-party RUM (Datadog RUM, SpeedCurve).
