Backend Engineer Interview Prep Path
A 9-week structured roadmap for Backend Engineer interview preparation — DSA to LLD to HLD to PROD-ENG. Covers how each area is weighted at IC3 through IC6, why understanding 50 patterns beats grinding 500 LeetCode problems, and the operational maturity signals that separate L5 candidates from L6 at Google, Meta, Amazon, and Stripe.
Why This Path Is Ordered DSA → LLD → HLD → PROD-ENG
Backend engineering interviews have four distinct pillars, and most candidates underinvest in at least two of them — typically because they spend all their time on the pillar they're most anxious about (usually DSA) and skip the ones they've never been explicitly tested on (usually PROD-ENG and LLD).
The ordering is intentional:
- DSA first because it's table stakes — you cannot skip it. Unlike the other pillars, a failed DSA round is rarely recoverable at most FAANG companies. A brilliant HLD answer cannot save a loop where you failed to implement BFS correctly. DSA is also the slowest pillar to improve from scratch, so it gets the most time.
- LLD second because it requires a shift in thinking from algorithmic to object-oriented design. LLD tests whether you can model real-world systems as classes and interfaces — a fundamentally different skill than graph traversal. Mid-level loops (L4/L5) fail candidates here more often than at any other stage, because candidates with strong algorithmic backgrounds underestimate OOP design.
- HLD third because high-level system design requires LLD intuition as a prerequisite. Candidates who haven't thought about class responsibilities and interface contracts struggle to reason about service boundaries and API contracts in HLD. The reasoning patterns are the same — just at a larger scale.
- PROD-ENG last because it requires production experience (or the ability to simulate it) and is primarily tested at L5+. Covering PROD-ENG without the HLD foundation would be like learning oncall procedures without understanding what the system does. It's also the most differentiating pillar at senior levels — almost every L6 failure traces to weak operational maturity, not weak DSA.
Total time: 9 weeks. Candidates with strong DSA backgrounds can compress Stage 1 to 1–2 weeks and invest the difference in HLD and PROD-ENG.
How Each Pillar Is Weighted by Level — IC3 Through IC6
Level calibration is the most underappreciated dimension of interview prep. The same four pillars are tested across all levels, but the weight and depth required shift dramatically:
L3/E3 (New Grad / Junior): ~70% DSA, ~30% LLD. HLD is light or absent. PROD-ENG is not tested. Expectation: clean implementation of medium-difficulty algorithms. OOP basics in LLD. No expectation of system-scale thinking.
L4/E4 (Software Engineer II): ~50% DSA, ~30% LLD, ~20% HLD. PROD-ENG still largely absent. Expectation: medium-hard DSA proficiency, LLD for 2–3 SOLID patterns applied correctly, HLD at the "whiteboard block diagram" level with basic component justification.
L5/E5 (Senior Software Engineer): ~30% DSA, ~30% LLD, ~30% HLD, ~10% PROD-ENG. DSA shifts from pattern recall to problem decomposition — you're expected to recognize and adapt, not just execute templates. HLD now requires tradeoff justification ("I'd use Kafka over RabbitMQ because..."). PROD-ENG appears as questions about oncall practices, rollout strategy, and monitoring.
L6/E6 (Staff Software Engineer): ~15% DSA, ~20% LLD, ~40% HLD, ~25% PROD-ENG. DSA is de-emphasized — it's still tested, but failure on a medium DSA problem rarely kills an L6 loop. HLD shifts to ambiguous open-ended systems. PROD-ENG is a primary signal: SLOs, error budgets, postmortem quality, capacity planning, and cross-team system ownership are all in scope. Behavioral depth (influencing without authority, driving org-level decisions) is heavily weighted.
The 4-Stage Backend Engineer Prep Path
Stage 1 — Data Structures & Algorithms (Weeks 1–3)
Arrays/strings, hashmaps/sets, trees, graphs, two-pointer/sliding window, binary search, dynamic programming, and heap/priority queue. The non-obvious insight: grinding 500 LeetCode problems produces diminishing returns after problem 150. What matters is understanding 50 patterns deeply — sliding window template, binary search on answer (not just sorted arrays), BFS/DFS state machine, and DP state transition formulation. Most L5+ loops ask medium-to-hard LeetCode, and hard problems are almost always a medium pattern applied to an unfamiliar domain. Pattern recognition beats problem memorization. Key drills: implement BFS/DFS from scratch without notes, write sliding window for at least 5 different problem types, derive DP recurrence relations out loud.
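The "implement BFS from scratch without notes" drill can be sketched as follows — a minimal Python BFS over an adjacency-list graph for the unweighted shortest-path framing mentioned above:

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Shortest path length in an unweighted graph; -1 if unreachable.

    `graph` is an adjacency list: {node: [neighbors]}. The visited set is
    updated at enqueue time, which is the standard cycle-safety guard.
    """
    if start == goal:
        return 0
    visited = {start}
    queue = deque([(start, 0)])  # (node, distance from start)
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor == goal:
                return dist + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return -1
```

Being able to write this cold — including the enqueue-time visited check that prevents infinite loops on cyclic graphs — is the baseline the drill is asking for.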
Stage 2 — Low-Level Design (Weeks 4–5)
Class design, SOLID principles (Single Responsibility, Open-Closed, Liskov Substitution, Interface Segregation, Dependency Inversion), and design patterns: Observer (event systems), Strategy (interchangeable algorithms), Factory (object creation), Decorator (adding behavior), and Command (undo/redo). Concurrency: locks, thread-safe collections (ConcurrentHashMap), producer-consumer with bounded queues, and deadlock avoidance. The critical gap: ML/data candidates who apply to backend roles almost always underinvest in LLD. They can implement quicksort but cannot design a class hierarchy for a parking lot system. LLD questions ask you to model a real system as code — practice modeling a ride-share trip, a task scheduler, a cache, and a shopping cart from scratch.
Stage 3 — High-Level Design (Weeks 6–8)
Scalability primitives: horizontal vs vertical scaling, stateless vs stateful services. Data storage: SQL vs NoSQL selection (SQL for transactions/joins/ACID; NoSQL for horizontal scale/flexible schema/high write throughput). Caching: Redis for session/hot data, CDN for static assets, cache-aside vs write-through vs write-behind. Message queues: Kafka for durable event streaming, SQS for reliable async work distribution. CAP theorem: consistency vs availability tradeoff (choose based on the business invariant — a bank cannot tolerate inconsistency; a social feed can). Rate limiting: token bucket (smooth bursts), leaky bucket (strict rate), sliding window log (precise but memory-intensive). Load balancing: L4 vs L7, sticky sessions, consistent hashing. The key HLD insight: interviewers are not testing technology trivia. They are testing whether you can justify tradeoffs. 'I'd use Kafka' is not an answer. 'I'd use Kafka because the consumer needs independent replay of events and we can tolerate up to 5s of delivery lag' is an answer.
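As one concrete instance of the rate-limiting variants above, here is a minimal token-bucket sketch in Python. The class and parameter names are illustrative (not from any specific library); the injectable `clock` exists only to make the refill logic testable:

```python
import time

class TokenBucket:
    """Token bucket: allows bursts up to `capacity`, refills at `rate` tokens/sec."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)  # start full: an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The `capacity` knob is what makes this "smooth bursts": a client can spend its saved-up tokens at once, unlike a leaky bucket's strict fixed rate.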
Stage 4 — Production Engineering (Week 9)
SLOs and error budgets: define the target reliability (e.g., 99.9% availability = 43.8 min/month allowed downtime), track it, and use the budget to gate feature velocity. Monitoring: the four golden signals (latency, traffic, errors, saturation) plus USE (utilization, saturation, errors) for resources and RED (rate, errors, duration) for services. Deployment strategies: canary deploys (route 1–5% of traffic, watch signals for 10–30 minutes before promoting), blue-green (parallel environments, instant switch), feature flags (decouple deploy from release). Postmortems: blameless, timeline-focused, root cause with contributing factors, action items with owners and dates. Capacity planning: know your traffic patterns, identify the saturation point for each resource, plan headroom for 2× peak. These are the signals FAANG uses to distinguish L5 from L6 — operational fluency is how senior engineers demonstrate scope beyond feature delivery.
9-Week Time Allocation and Priority by Stage
| Stage | Topic | Duration | Focus Areas | Common Failure Mode |
|---|---|---|---|---|
| Stage 1 | DSA | Weeks 1–3 | Pattern mastery: sliding window, BFS/DFS, binary search on answer, DP state transitions | Grinding volume without pattern understanding; cannot adapt known problems to new variants |
| Stage 2 | LLD | Weeks 4–5 | SOLID principles, Observer/Strategy/Factory patterns, concurrency primitives | Underinvesting as a non-OOP background; failing LLD at mid-level loops despite strong DSA |
| Stage 3 | HLD | Weeks 6–8 | CAP theorem, caching (Redis), queues (Kafka), SQL vs NoSQL selection, rate limiting, load balancing | Listing technologies without justifying tradeoffs; 'I'd use Kafka' with no explanation |
| Stage 4 | PROD-ENG | Week 9 | SLOs/error budgets, four golden signals, canary deploys, postmortems, capacity planning | Skipping this stage entirely — it's the primary L5→L6 differentiator |
DSA: Pattern Mastery Over Problem Volume
The single most common mistake in backend interview prep is grinding LeetCode problems without a pattern framework. After approximately 150 problems, each additional problem has diminishing marginal return if you're not categorizing the pattern and understanding why it applies.
The five patterns that cover ~70% of FAANG medium-hard DSA problems:
- Sliding window: Two-pointer technique for subarray/substring problems. Template: expand right pointer, maintain invariant, contract left when invariant violated. Works for: longest substring without repeating characters, minimum window substring, maximum sum subarray of size k. The key: identify what invariant the window must maintain.
- Binary search on answer: Don't just binary search sorted arrays — binary search on the answer space when the problem asks "find minimum X such that condition holds." Template: lo = min_possible, hi = max_possible, while lo < hi: mid = (lo+hi)//2, if condition(mid): hi = mid else lo = mid+1. Works for: Koko eating bananas, ship packages in D days, find peak element.
- BFS/DFS state machine: For graph/tree problems, define state explicitly — not just the node, but the entire (node, visited_set, depth, accumulated_cost) tuple depending on the problem. Common bug: not including enough state in the visited set, causing infinite loops. BFS for shortest path (unweighted). DFS + memoization for counting paths.
- DP state transition: The only way to approach DP is to define: (a) what does `dp[i]` represent in plain English, (b) what is the base case, (c) what is the recurrence. If you cannot answer these three questions for a DP problem, you cannot solve it. Common mistake: trying to "see the DP" without explicitly defining state.
- Heap / top-k pattern: Priority queue for any "find k largest/smallest/most frequent" problem. Two-heap trick for running median (max-heap for left half, min-heap for right half, sizes maintained at ±1).
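Two of these templates, sketched minimally in Python — the problem choices (longest substring without repeating characters, Koko eating bananas) are the examples named in the list above:

```python
def longest_unique_substring(s):
    """Sliding window: expand right; contract left when the invariant
    (no repeated character inside the window) is violated."""
    last_seen = {}
    left = best = 0
    for right, ch in enumerate(s):
        if ch in last_seen and last_seen[ch] >= left:
            left = last_seen[ch] + 1  # jump past the previous occurrence
        last_seen[ch] = right
        best = max(best, right - left + 1)
    return best

def min_eating_speed(piles, hours):
    """Binary search on answer: smallest speed k such that all piles are
    finished within `hours`, eating ceil(pile / k) hours per pile."""
    def finishes(k):
        return sum(-(-p // k) for p in piles) <= hours  # -(-p//k) is ceil(p/k)

    lo, hi = 1, max(piles)
    while lo < hi:
        mid = (lo + hi) // 2
        if finishes(mid):
            hi = mid       # mid works; try a smaller answer
        else:
            lo = mid + 1   # mid is too slow; go bigger
    return lo
```

Note how `min_eating_speed` never searches the array itself — it searches the answer space [1, max(piles)], exactly the shift the "binary search on answer" pattern describes.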
LLD: The Mid-Level Trap That Ends Loops Early
LLD is where most mid-level backend loops fail — not because it's the hardest pillar, but because candidates underestimate it. Engineers from ML, data, or research backgrounds assume LLD is "just OOP" and skip it. Then they fail to design a thread-safe LRU cache or a task scheduler in 45 minutes.
What LLD interviews look like: You're given a system description ("design a parking lot management system," "design a rate limiter class," "design a pub-sub notification system") and asked to design and implement the class hierarchy. The interviewer expects: class definitions with clear responsibilities, interfaces and abstractions, concrete implementations, and 1–2 design patterns applied appropriately.
SOLID applied concretely:
- Single Responsibility: `PaymentProcessor` should not also handle email notifications. Separate them.
- Open-Closed: Adding a new payment method should not require modifying `PaymentProcessor`. Use an interface `PaymentGateway` implemented by `StripeGateway`, `PaypalGateway`, etc.
- Dependency Inversion: `CheckoutService` should depend on `PaymentGateway` (interface), not `StripeGateway` (concrete class). Inject the concrete implementation.
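A minimal Python sketch of the Open-Closed and Dependency Inversion points, using the `PaymentGateway` names from the text (the `charge` method and its signature are illustrative assumptions):

```python
from abc import ABC, abstractmethod

class PaymentGateway(ABC):
    """Abstraction that both CheckoutService and concrete gateways depend on (DIP)."""

    @abstractmethod
    def charge(self, amount_cents: int) -> bool: ...

class StripeGateway(PaymentGateway):
    def charge(self, amount_cents: int) -> bool:
        # A real implementation would call Stripe's API here.
        return True

class PaypalGateway(PaymentGateway):
    def charge(self, amount_cents: int) -> bool:
        return True

class CheckoutService:
    """Depends on the interface, not a concrete gateway. Adding a new payment
    method means adding a new PaymentGateway subclass — no change here (OCP)."""

    def __init__(self, gateway: PaymentGateway):
        self.gateway = gateway

    def checkout(self, amount_cents: int) -> bool:
        return self.gateway.charge(amount_cents)
```

The injection in `__init__` is the whole point: tests can pass a fake gateway, and swapping Stripe for PayPal touches zero lines of `CheckoutService`.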
Concurrency — the most common LLD fail point: If the question involves shared state (a cache, a queue, a counter), interviewers expect you to reason about thread safety. Key tools: synchronized blocks, ReentrantLock for more control, ConcurrentHashMap for hash map thread safety, BlockingQueue for producer-consumer patterns. The critical moment: when a candidate adds synchronized to a method and the interviewer asks "what's the performance implication?" — knowing that synchronized creates contention and when to use finer-grained locking is the L5 signal.
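The thread-safe LRU cache scenario can be sketched in Python — the text's Java primitives map roughly onto `threading.Lock` and `collections.OrderedDict`. This version uses one coarse-grained lock, which is exactly the "what's the performance implication?" tradeoff an interviewer would probe:

```python
import threading
from collections import OrderedDict

class LRUCache:
    """Thread-safe LRU cache. OrderedDict tracks recency; a single lock
    guards all shared state (coarse-grained, so all readers contend)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            if key not in self.data:
                return None
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]

    def put(self, key, value):
        with self.lock:
            if key in self.data:
                self.data.move_to_end(key)
            self.data[key] = value
            if len(self.data) > self.capacity:
                self.data.popitem(last=False)  # evict least recently used
```

The L5-signal answer to the contention question: reads here serialize behind one lock; finer-grained options (sharding the key space across several locks, or a read/write lock when reads dominate) trade complexity for throughput.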
Backend Engineer vs SRE vs Platform Engineer — Interview Scope Differences
| Dimension | Backend Engineer | Site Reliability Engineer (SRE) | Platform / Infrastructure Engineer |
|---|---|---|---|
| DSA weight | High — 3 rounds at L4+ | Moderate — 1–2 rounds | Moderate — 1–2 rounds, infrastructure-flavored |
| LLD weight | High — full design round | Low — not primary signal | Moderate — infrastructure component design |
| HLD weight | High — scalable service design | High — reliability-focused (SLOs, failure modes) | Very high — distributed infrastructure design (K8s, service mesh) |
| PROD-ENG weight | Moderate at L5, high at L6 | Very high — this is the primary signal | High — deployment, monitoring, capacity |
| Coding style | Business logic, APIs, data modeling | Automation scripts, runbook tools, config management | Infrastructure-as-code, control planes, operators |
| System design framing | Scalability and feature correctness | Reliability, failure modes, SLOs, blast radius | Platform abstractions, multi-tenant isolation, efficiency |
| Biggest mistake | Skipping PROD-ENG at L5+ loops | Treating it like a backend SWE interview with extra ops questions | Underestimating the depth of distributed systems expected |
HLD: Justifying Tradeoffs Is the Entire Skill
High-level design interviews are not knowledge tests. They are judgment tests dressed as knowledge tests. Interviewers ask "how would you design Twitter's feed system" not because there's a correct answer, but because they want to see how you navigate tradeoffs under ambiguity.
The tradeoff framing for every HLD component:
SQL vs NoSQL selection: Use SQL when: you need ACID transactions (financial systems, inventory), your data has a well-defined relational schema, you need complex JOIN queries, you're at <100M rows. Use NoSQL when: you need horizontal write scaling (>10K writes/sec that can't be sharded easily on SQL), your schema is highly variable, you need sub-millisecond key-value lookups (Redis), you're storing large binary blobs (S3/GCS rather than DB). The wrong answer: "I'd use NoSQL because it's more scalable." NoSQL is horizontally scalable for specific access patterns — it trades JOIN capability and transactional guarantees to get there.
Caching strategy: Cache-aside: application reads from cache; on miss, reads from DB and writes to cache. Cache is always consistent after misses but has cold-start latency. Best for read-heavy, infrequently updated data. Write-through: writes go to cache and DB synchronously. Cache is always fresh but write latency increases. Best for write-heavy data that's immediately read back. Write-behind (write-back): writes go to cache only, async flushed to DB. Lowest write latency, highest risk of data loss on cache failure. Only for non-critical data (e.g., view counts).
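A minimal cache-aside sketch in Python, with plain dicts standing in for Redis and the database. The invalidate-on-write behavior shown here is one common variant (delete the stale entry rather than update it in place), not the only one:

```python
class CacheAside:
    """Cache-aside: the application owns the cache logic, not the store."""

    def __init__(self, cache, db):
        self.cache = cache  # e.g. a dict standing in for Redis
        self.db = db        # e.g. a dict standing in for the primary database

    def read(self, key):
        value = self.cache.get(key)
        if value is not None:
            return value                # cache hit
        value = self.db.get(key)        # cache miss: go to the source of truth
        if value is not None:
            self.cache[key] = value     # populate for subsequent reads
        return value

    def write(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)       # invalidate; next read repopulates
```

This makes the cold-start cost from the paragraph above concrete: the first `read` after any `write` pays the DB round trip, which is the price cache-aside accepts for keeping the cache consistent after misses.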
Kafka vs SQS: Kafka when: consumers need independent replay of events, you have multiple consumer groups with different processing rates, you need event sourcing or audit log. Kafka retains messages for configurable duration. SQS when: you need simple at-least-once delivery with minimal operational overhead, consumers don't need replay, you're on AWS and want managed scaling. SQS deletes messages after acknowledgment.
The non-obvious insight about HLD justification: The interviewer does not care if you choose Kafka or SQS. They care whether you can explain why the choice matters in the context of the requirements. A candidate who says "I'd use Kafka because we need independent consumer replay and a 7-day audit log for compliance" passes. A candidate who says "I'd use Kafka because it's better for high-throughput" without connecting to the stated requirements does not.
The Gaps That Cause Backend Candidates to Fail at FAANG
Gap 1 — Pattern recognition vs problem recognition: Most candidates who grind 200+ LeetCode problems can solve problems they've seen before, but freeze on slight variations. The fix: after solving any problem, write down the pattern name and the three signals in the problem statement that indicate this pattern. Build a pattern taxonomy, not a problem library.
Gap 2 — LLD underinvestment by non-OOP backgrounds: Data scientists, ML engineers, and recent research-to-SWE transitions consistently fail LLD rounds because they've spent years in Python scripting (functions + classes for ML), not production OOP (interfaces, inheritance hierarchies, design patterns). Budget the full two weeks even if OOP "feels familiar."
Gap 3 — HLD without tradeoff justification: Listing technologies — "I'd have a load balancer, Redis, Kafka, and a database" — scores low. Every component must be justified: what problem does it solve, why this technology over the obvious alternative, and what failure mode does it introduce? Candidates who cannot answer "why Kafka over a simple database queue?" for their own architecture should not include Kafka.
Gap 4 — Skipping PROD-ENG entirely: Most backend candidates stop at HLD and assume PROD-ENG is SRE territory. At L5, PROD-ENG questions appear in behavioral rounds: "How do you approach releasing a high-risk change?" At L6, PROD-ENG is a dedicated round. Skipping it is the primary reason strong L5 candidates fail L6 loops. Budget one week minimum.
Gap 5 — Coding in the wrong language for the company: Some companies (especially in Go/Java shops) care whether you can reason about language-specific concurrency primitives. Know whether your target company's interview is language-agnostic or expects specific idioms. Meta and Google are generally language-agnostic; some Amazon teams expect Java.
PROD-ENG: The Primary L5→L6 Differentiator
Production engineering knowledge is the most consistently underestimated pillar in backend interview prep. Candidates assume it's "soft skills" or "SRE territory." It is neither. PROD-ENG in backend interviews is about demonstrating that you have internalized the consequences of shipping software at scale — not just building it.
SLOs and error budgets — the vocabulary every L5+ candidate must speak:
An SLO (Service Level Objective) defines the target reliability for a service, expressed as a ratio over a time window. Example: 99.9% of HTTP requests succeed over a rolling 30-day window. The error budget is the complement: 0.1% of requests can fail, which is 43.2 minutes of downtime equivalent per 30-day window (about 43.8 minutes over an average calendar month).
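The arithmetic is worth being able to do on a whiteboard; as a sketch:

```python
def error_budget_minutes(slo: float, window_days: float = 30) -> float:
    """Allowed downtime-equivalent minutes for an availability SLO over a window.

    e.g. a 99.9% SLO over 30 days leaves (1 - 0.999) * 30 * 24 * 60 minutes.
    """
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes

# 99.9% over 30 days  -> 43.2 minutes
# 99.9% over an average calendar month (~30.44 days) -> ~43.8 minutes
# 99.99% over 30 days -> 4.32 minutes
```

The jump from three nines to four nines shrinking the budget tenfold is the point interviewers expect you to reason about: each extra nine is an order of magnitude less room for risky changes.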
The interview question is never "define SLO." It's: "How do you decide whether to ship a risky change given your current error budget?" The answer: if you've consumed 80% of your error budget in the first two weeks of the month, you do not ship risky changes. The error budget governance process — not the definition — is what interviewers are testing.
The four golden signals (from the Google SRE Book): Latency, Traffic, Errors, Saturation. Every monitoring alert should be anchored to at least one of these. Candidates who answer "what metrics would you monitor for your service?" with "CPU and memory" are not thinking at the right level. CPU and memory are saturation signals, but they're lagging indicators. Latency p99 and error rate are leading indicators of user impact.
Canary deploy mechanics: A canary routes 1–5% of production traffic to the new version. You then watch the four golden signals — specifically: does p99 latency increase? Does error rate diverge from baseline? Canary duration should be long enough to capture all traffic patterns (weekday vs weekend, peak vs off-peak). Promoting too early because "the first 5 minutes look fine" is a common mistake that leads to weekend incidents. Typical canary duration: 30 minutes to 24 hours depending on risk level.
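A canary promotion gate over the golden signals might look like this sketch — the thresholds, metric names, and dict shape are all illustrative assumptions, not a real deployment tool's API:

```python
def canary_healthy(baseline, canary, max_p99_ratio=1.1, max_error_delta=0.001):
    """Decide whether a canary may be promoted, by comparing its golden
    signals against the baseline fleet over the same observation window.

    Promote only if p99 latency has not regressed beyond `max_p99_ratio`
    and the error rate has not diverged beyond `max_error_delta` (absolute).
    """
    if canary["p99_latency_ms"] > baseline["p99_latency_ms"] * max_p99_ratio:
        return False  # latency regression
    if canary["error_rate"] - baseline["error_rate"] > max_error_delta:
        return False  # error-rate divergence
    return True
```

Note that the comparison is against the baseline fleet measured over the same window, not against an absolute threshold — traffic-driven shifts (peak vs off-peak) would otherwise trigger false promotions or false rollbacks.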
Postmortem quality as an L6 signal: The difference between an L5 and L6 postmortem is scope. L5: "the service crashed because of a bad config change." L6: "the service crashed because of a bad config change, which propagated because our config validation CI step only checks syntax, not semantic correctness; and it cascaded because our circuit breaker thresholds were not calibrated to handle the resulting 10× latency increase. Three action items with owners and a 2-week deadline."
Interview Readiness Signals: How to Know You're Prepared at Each Level
You are ready for an L4 backend loop when you can:
- Implement BFS and DFS from scratch without notes, including cycle detection
- Write a thread-safe LRU cache in 30 minutes with correct locking semantics
- Draw a basic three-tier architecture (client → API → DB) and explain how to add a cache layer
- State the difference between SQL and NoSQL with one concrete tradeoff each
You are ready for an L5 backend loop when you can:
- Recognize the sliding window, binary search on answer, and BFS state machine patterns on sight and adapt them to unfamiliar problem statements
- Design a class hierarchy for a real-world system applying at least two design patterns with justification
- Design a rate limiter, URL shortener, or feed system with explicit justification for every technology choice
- Articulate what an SLO is, what an error budget is, and how you'd use one to gate a risky deploy
You are ready for an L6 backend loop when you can:
- Navigate an ambiguous HLD question by asking the right clarifying questions before designing (scale, consistency requirements, latency SLO, read/write ratio)
- Write a full postmortem with timeline, root cause, contributing factors, and action items with owners
- Explain a capacity planning process: traffic projection, resource saturation model, headroom calculation
- Describe how you've influenced a cross-team technical decision and what the measurable outcome was