Skip to main content

Preview — Pro guide

You are seeing a portion of this guide. Sign in and upgrade to unlock the full article, quizzes, and interview answers.

SQL & Databases·Intermediate

Window Functions for Analytics: ROW_NUMBER, RANK, LAG/LEAD, and Running Totals

Window functions are the single most-tested SQL topic in data and ML engineer interviews. This guide covers the full OVER() clause mechanics, every ranking function and when each is wrong, LAG/LEAD for time-series analysis, running aggregates, and the ROWS vs RANGE frame clause distinction that trips up even experienced engineers.

40 min read 3 sections 1 interview questions
Window FunctionsROW_NUMBERRANKDENSE_RANKLAGLEADRunning TotalNTILEPARTITION BYOVER ClauseFrame ClauseAnalytics SQL

Why Window Functions Are the Interview Differentiator

Window functions compute a value for each row using a set of related rows — without collapsing the result into a single row like GROUP BY does. Every row in the input produces a row in the output, but each row gets access to aggregated information about its "window."

This distinction is the core concept: GROUP BY aggregates rows away; window functions annotate rows without removing them.

In interviews at Meta, Airbnb, Stripe, and Google, window functions appear in ~70% of hard SQL questions. The gaps-and-islands problem (session detection), cohort retention, running totals, top-N per group, and year-over-year comparisons all require window functions. A candidate who reaches for subqueries or self-joins for these patterns signals they're solving 2015 SQL problems.

IMPORTANT

What the Interviewer Is Actually Testing

The interview is not testing whether you know OVER() syntax. It's testing three things:

  1. Pattern recognition: Can you identify that this problem requires a window function instead of a GROUP BY or self-join?

  2. Frame clause precision: Do you know the difference between ROWS BETWEEN and RANGE BETWEEN? (RANGE with ties produces surprising results — see the frame clause section.)

  3. NULL handling in time-series: LAG returns NULL for the first row in each partition. Do you handle it? Most interview failures happen here — the query looks right but returns NULL for first-occurrence events.

Narrate your reasoning: "I'll use LAG here to get the previous event timestamp per user, and I need to handle the NULL case for the first event — I'll use COALESCE or a CASE statement."

IMPORTANT

Premium content locked

This guide is premium content. Upgrade to Pro to unlock the full guide, quizzes, and interview Q&A.