
Code Review Excellence: The Craft That Most Engineers Never Learn

A concrete methodology for giving and receiving code review that improves code quality, accelerates team growth, and does not create adversarial dynamics. Covers the review hierarchy (correctness before style), how to write feedback that gets implemented, the approve-vs-block decision framework, reviewing for future maintainers, and the anti-patterns — LGTM culture, nitpick spirals, ego-driven rejections — that poison engineering team velocity.


Why Code Review Is the Highest-Leverage Engineering Skill Nobody Teaches

Every engineer learns to write code. Almost no one is taught to review it.

This matters because at most companies above 30 engineers, code review is where the majority of software quality decisions are made. More bugs are caught in review than in testing. More architectural debt is introduced through approved PRs than through any other mechanism. The quality of code review directly determines the quality of the system — and the growth rate of the engineers on the team.

Despite this, most code review is learned by osmosis: junior engineers mimic what senior engineers do, senior engineers mimic what they saw earlier in their careers, and the anti-patterns propagate through generations. Teams develop cargo-cult review practices — adding comments because commenting is what reviewers do, not because the comments improve the code.

The two failure modes that are most expensive:

LGTM culture: Reviews that rubber-stamp changes to maintain social harmony or avoid conflict. The cost is not felt immediately — it accumulates in the form of bugs that were not caught, abstractions that were not challenged, tests that were not written. By the time the cost becomes visible, it is impossible to trace it to the reviews that created it.

Nitpick spiral: Reviews that spend 80% of attention on formatting, naming, and style while missing correctness issues, missing edge cases, or missing the fact that the design is wrong. This is the most common anti-pattern among engineers who are technically strong but not yet thinking about review as a leverage activity.

The mental model that fixes both: code review is a teaching opportunity disguised as a quality gate. The goal is not to block bad code — it is to make the author a better engineer and the codebase a healthier system.

TIP

What Interviewers Are Testing

L4/Mid signal: Can describe what they look for in a review (correctness, tests, naming). Knows that style comments are less important than logic. Has opinions about PR size and frequency.

L5/Senior signal: Has a review hierarchy (correctness first, design second, tests third, style last). Knows how to give feedback that is actionable and non-adversarial. Can articulate the difference between a blocking comment and a non-blocking suggestion. Understands reviewing for the future maintainer, not just the current change.

Staff signal: Frames code review as a system, not an activity. Measures review latency, LGTM rate, and comment resolution patterns. Identifies team-level anti-patterns (LGTM culture, nitpick spirals) and proposes process changes. Uses review as deliberate mentorship. Knows when to take a design discussion out of the PR and into a design doc instead.

The Review Hierarchy — In This Order, No Exceptions


Layer 1: Correctness (always first)

Does the code do what it claims to do? This means: reading the PR description and understanding the intent, then verifying that the implementation matches. Correctness issues are always blocking. Specific checks: (1) Happy path — does the code work for the stated use case? (2) Edge cases — what happens at boundaries (empty input, nil/null, zero, max values, concurrent access)? (3) Error handling — what happens when a dependency fails? Are errors surfaced to the caller or swallowed silently? (4) Failure modes — can this code fail in a way that causes data loss, incorrect user-visible state, or a security vulnerability? Do not proceed to later layers until you are confident the code is correct.
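A minimal sketch of the kind of error-handling issue a Layer 1 pass should catch (PaymentGateway and ChargeFailed are hypothetical names, not from the text above):

```ruby
# Before: the rescue swallows failures. Callers get nil and cannot
# distinguish a declined charge from a successful one.
def charge_customer(customer, amount)
  PaymentGateway.charge(customer.token, amount)
rescue PaymentGateway::Error
  nil
end

# After: the failure is surfaced so the caller decides what happens next.
def charge_customer(customer, amount)
  PaymentGateway.charge(customer.token, amount)
rescue PaymentGateway::Error => e
  raise ChargeFailed, "charge for customer #{customer.id} failed: #{e.message}"
end
```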


Layer 2: Design (before you look at tests)

Is this the right abstraction? Does this code belong here? Design issues should be raised early because they may require significant rework — catching a design problem after the code is polished is expensive for both reviewer and author. Specific checks: (1) Does this function/class have a single clear responsibility, or is it doing multiple things that should be separated? (2) Are the boundaries between this code and the rest of the system correct? (3) Is there a simpler way to accomplish the same goal? (4) Does this change expose implementation details that will make the abstraction harder to evolve?
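One way check (4) shows up in practice, sketched with hypothetical Db and User classes: a method that returns a raw storage row forces every caller to depend on the table schema.

```ruby
# Design smell: callers receive a raw row hash, so renaming a column
# breaks every call site in the codebase.
def find_user(id)
  Db.query("SELECT * FROM users WHERE id = ?", id).first
end

# Tighter boundary: callers depend on a domain object, and the schema
# can evolve behind this one method.
def find_user(id)
  row = Db.query("SELECT * FROM users WHERE id = ?", id).first
  row && User.new(name: row["name"], email: row["email"])
end
```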


Layer 3: Tests (blocking if missing for non-trivial logic)

Are the tests meaningful? The two questions that reveal whether tests are real: (a) If you introduced a bug in this code path right now, would one of these tests catch it? (b) What does this test actually verify? If the answer to (a) is no, or if the test is asserting implementation details rather than behavior, the tests are not covering the intended behavior. Tests are blocking for: any code path with business logic, error handling, edge cases, or behavior that the PR description explicitly says should be tested. Tests are non-blocking suggestions for: utility code with obvious behavior, trivial transformations, or code that is covered by integration tests.
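A sketch of the difference using Minitest and an invented Checkout class; the first test passes even when the discount math is broken, the second does not:

```ruby
require "minitest/autorun"

# Hypothetical class under review.
class Checkout
  DISCOUNT_THRESHOLD = 100

  def initialize(prices)
    @prices = prices
  end

  def total
    sum = @prices.sum
    sum > DISCOUNT_THRESHOLD ? sum * 0.9 : sum
  end
end

class CheckoutTest < Minitest::Test
  # Weak: asserts an implementation detail; it catches no behavioral bug.
  def test_responds_to_total
    assert_respond_to Checkout.new([200]), :total
  end

  # Meaningful: pins the behavior; a regression in the discount fails here.
  def test_applies_ten_percent_discount_above_threshold
    assert_in_delta 180.0, Checkout.new([200]).total
  end
end
```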


Layer 4: Maintainability and Naming (mostly non-blocking)

Is this code easy to understand six months from now by an engineer who has never seen it? Checks: (1) Do names accurately describe what they contain/do? (2) Are there complex logic blocks that would benefit from a clarifying comment explaining why, not what? (3) Is there duplication that should be extracted? (4) Are magic numbers or strings named? Most maintainability issues are non-blocking suggestions — they improve the code but do not prevent it from shipping. The exception: naming that is actively misleading (a function called validateUser that also creates a session) should be blocking.
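The misleading-name exception, sketched in Ruby (Session and AuthError are illustrative):

```ruby
# Blocking: "validate" implies a read-only check, but this method also
# creates a session. A maintainer who calls it twice to double-check a
# user would silently open two sessions.
def validate_user(user)
  raise AuthError unless user.active?
  Session.create(user: user)
end

# The honest name makes the side effect part of the contract.
def authenticate_and_start_session(user)
  raise AuthError unless user.active?
  Session.create(user: user)
end
```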


Layer 5: Style (never blocking if a linter exists)

Formatting, indentation, line length, import ordering. If the project has a linter or formatter, style issues should be caught automatically and should never appear in a code review. If no linter exists, proposing one is the right action item — not leaving style comments on every PR. Style comments that are not caught by a linter should be suggestions, clearly marked as non-blocking.
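For a Ruby project, a minimal RuboCop configuration is enough to take style off the review table entirely; the cop choices below are illustrative, not prescriptive:

```yaml
# .rubocop.yml -- run `bundle exec rubocop` in CI so style issues are
# caught before a human ever looks at the diff.
AllCops:
  TargetRubyVersion: 3.2
  NewCops: enable

Layout/LineLength:
  Max: 100

Style/StringLiterals:
  EnforcedStyle: double_quotes
```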

Review Decision Framework — Approve, Approve with Comments, or Block

[Diagram: the approve / approve-with-comments / block decision flow]

Writing Feedback That Gets Implemented — Not Argued About

The quality of your review comments determines whether authors fix issues or defend them. The same technical insight can be written two ways: one that creates collaboration, one that creates defensiveness.

The four-part comment structure that works:

(1) Observation: What you see. Not "this is wrong" — "this function modifies the database in a loop."

(2) Problem: Why it matters. "At 10,000 records, this will make 10,000 individual database writes, each incurring network round-trip latency."

(3) Suggestion: A concrete alternative. "Consider batching writes using INSERT INTO ... VALUES (...), (...), (...) — most ORMs support batch insert natively."

(4) Label (optional but powerful): Signal whether this is blocking or a suggestion. [blocking], [suggestion], [nit], [question] prefixes remove ambiguity about whether the author needs to address the comment before merging.

Examples:

❌ "This is slow." (Observation only, no problem, no suggestion, no label — creates defensiveness)

❌ "You should use a batch insert here." (Suggestion without observation or problem — author doesn't understand why)

✅ "[blocking] This loop makes N database writes sequentially. At production scale (~50K records/day), this will take ~25 seconds per job and risk connection pool exhaustion. Use Model.insert_all(records) instead — it batches into a single query."

On asking questions vs. making statements: When you are uncertain whether something is a problem, frame it as a question: "Is there a reason this is not using the connection pool? I'd expect this to create a new connection per request." This invites explanation before requiring change. Sometimes the author knows something you do not.

On complimenting: If the author did something well — a clean abstraction, a well-named function, a particularly thoughtful error message — say so. Brief, specific compliments build trust that makes critical comments land better. "Nice use of the null object pattern here — this makes the nil check explicit."

Reviewing for the Future Maintainer — Not for Today's Author

The most important reader of the code you are reviewing is not the author. It is the engineer who will debug this code at 2am six months from now, having never seen it before.

This reframe changes what you look for:

Names that only make sense with context. The author knows what tmp, data, or result refers to because they just wrote the code. The future maintainer does not. If a variable requires reading the surrounding 20 lines to understand, it needs a better name.

Comments that explain the what, not the why. Most engineers comment to explain what the code does. The future maintainer can read the code — they need to understand why it does it this way. "Sort by creation date" is useless. "Sort by creation date because the UI expects chronological order and the DB returns results in insertion order, which diverges after backfills" is valuable.

Magic numbers and strings. A function that returns early if count > 100 leaves the future maintainer wondering: what is special about 100? Is this a product requirement? A performance limit? A database constraint? Constants should be named: MAX_BATCH_SIZE = 100 with a comment explaining the source.

Implicit preconditions. If a function will crash or produce incorrect results when called with certain inputs, and the function signature does not make this clear, add documentation. "Assumes user_id is already validated" is not obvious from def process_payment(user_id, amount).

The test for whether the code passes the future maintainer check: Ask — "if I had never seen this codebase before and I needed to modify this function, what would I need to know that is not obvious from reading it?" Every answer to that question is a comment, a better name, or an extracted constant waiting to be added.
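A small sketch combining the magic-number and precondition checks (the limit, its source, and the names are invented for illustration):

```ruby
# Before: what is special about 100? A product rule? A performance limit?
def enqueue(jobs, queue)
  return if jobs.count > 100
  jobs.each { |job| queue.push(job) }
end

# After: the constant is named and sourced, and the precondition is
# written down where the future maintainer will actually see it.

# Queue backend rejects batches larger than 100 entries (vendor API limit).
MAX_BATCH_SIZE = 100

# Assumes each job has already been validated; do not pass raw user input.
def enqueue(jobs, queue)
  return if jobs.count > MAX_BATCH_SIZE
  jobs.each { |job| queue.push(job) }
end
```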

⚠ WARNING

Anti-Patterns That Damage Teams — And How to Recognize Them in Yourself

LGTM culture. Approving changes to be nice, to avoid conflict, or because "it's their code." Every LGTM on a correctness issue is a bug deferred to production. The test: if you would not feel comfortable being the on-call engineer for this code at 2am, do not approve it.

Nitpick spiral. Leaving 15 comments on formatting, naming, and style while missing the fact that the error handling is wrong. The test: count your blocking comments vs. non-blocking suggestions. If you have zero blocking issues on a non-trivial PR, you either reviewed a perfect PR or you did not read it carefully enough. If 90% of your comments are style, you are optimizing the wrong layer.

Ego-driven blocking. Blocking a PR because you would have implemented it differently, not because the implementation is wrong. "I wouldn't have done it this way" is not a reason to request changes unless your way is demonstrably better (fewer bugs, better performance, more maintainable) and you can explain exactly why.

The serial reviewer. Leaving 10 comments, waiting for the author to address them, then leaving 10 more. Good reviews happen in one pass, not in a serial negotiation. If you know you will have follow-up concerns, raise them upfront or schedule a synchronous discussion.

Reviewing without running the code. For non-trivial changes, pull the branch and run it. Edge cases that are obvious when you run the code are often invisible when reading it. This is especially true for UI changes, database migrations, and changes to concurrency logic.

Comment Labels — Making Intent Explicit

| Label | Meaning | Author Action Required | Example |
| --- | --- | --- | --- |
| [blocking] | Must be addressed before merge; correctness, design, or security issue | Yes — implement the fix or provide a compelling counter-argument | [blocking] This function swallows the error and returns nil; the caller has no way to know the operation failed |
| [suggestion] | Would improve the code; author's call whether to address it now or in a follow-up | Discretionary — address now or create a follow-up ticket | [suggestion] This could be simplified using Array.flat_map instead of map + flatten |
| [nit] | Minor style or naming improvement; explicitly non-blocking | No — author can address or ignore | [nit] `result` could be named `validated_user` here for clarity |
| [question] | Reviewer is uncertain whether there's an issue; asks for clarification | Yes — explain the reasoning; reviewer decides if it's still a concern after explanation | [question] Is there a reason this doesn't use the existing UserValidator class? |
| [praise] | Something done well; explicit positive signal | No — just acknowledgment | [praise] Clean separation of the parsing and validation logic here — easy to test each independently |
| [future] | Something that should be addressed but not in this PR | No — but creates awareness of tech debt | [future] When we add multi-tenancy, this query will need a tenant_id filter — worth flagging for the migration doc |

PR Size, Turnaround Time, and the Reviewer's Obligation

Code review is a two-sided obligation. Authors have a responsibility to make PRs reviewable; reviewers have a responsibility to review promptly.

PR size: The empirical data on code review effectiveness (from studies at Microsoft Research and SmartBear) shows a strong inverse relationship between PR size and review quality. Reviewers can effectively review ~400 lines of diff; above that, the defect detection rate drops significantly. The target: one PR, one conceptual change, ~200-400 lines of diff (excluding generated code and tests).

Large PRs are often a symptom of not decomposing the work before starting. The correct response to a 1,500-line PR is not to review it anyway — it is to send it back with a request to split it and a suggestion for how to do so.

Turnaround time: Code review latency is one of the highest-leverage variables in engineering velocity. At Google, the internal target is a first response within one business day. PRs that wait 3+ days for a first review kill momentum, cause authors to context-switch to other work, and create merge conflicts. If you cannot review a PR within one business day, acknowledge it and say when you can.

The reviewer's obligation for complex PRs: Read the PR description before looking at the diff. If there is no description, ask for one. A reviewer who reads the diff without understanding the intent will leave comments that miss the point. The description should answer: what does this change do, why is it needed, and what should the reviewer focus on?

Level Differentiation: Code Review Behavior by Engineering Level

| Level | Review Focus | Comment Style | Approve/Block Calibration | What They Miss |
| --- | --- | --- | --- | --- |
| L3 / Junior | Style and formatting; obvious bugs | Observations without explanations; no label for blocking vs. suggestion | Either LGTM everything or block on minor style issues | Design problems, edge cases, reviewing for future maintainability, the teaching opportunity |
| L4 / Mid | Correctness and tests; starting to see design issues | More specific; sometimes explains the problem; inconsistent labeling | Better calibration but still misses systemic design issues | Reviewing as mentorship; the anti-patterns in their own review behavior; PR size norms |
| L5 / Senior | Full hierarchy: correctness, design, tests, maintainability, style in order; explicitly non-blocking for style | Four-part structure: observation + problem + suggestion + label; asks questions before blocking | Accurate block/approve decisions; uses "approve with comments" appropriately | Team-level patterns; measuring review effectiveness; using review to shape team code norms |
| Staff | Reviews the design before the code exists (design doc reviews); uses code review to enforce team standards and mentor deliberately | Comments are teaching moments; explicitly connects code patterns to system-level consequences | Blocks on architecture issues; approves imperfect code that ships the right thing; knows when to take discussion outside the PR | This is the target — systematic, educational, and measured |
