Preview — Pro guide
You are seeing a portion of this guide. Sign in and upgrade to unlock the full article, quizzes, and interview answers.
Window Functions for Analytics: ROW_NUMBER, RANK, LAG/LEAD, and Running Totals
Window functions are the single most-tested SQL topic in data and ML engineer interviews. This guide covers the full OVER() clause mechanics, every ranking function and when each is wrong, LAG/LEAD for time-series analysis, running aggregates, and the ROWS vs RANGE frame clause distinction that trips up even experienced engineers.
Why Window Functions Are the Interview Differentiator
Window functions compute a value for each row using a set of related rows — without collapsing the result into a single row like GROUP BY does. Every row in the input produces a row in the output, but each row gets access to aggregated information about its "window."
This distinction is the core concept: GROUP BY aggregates rows away; window functions annotate rows without removing them.
In interviews at Meta, Airbnb, Stripe, and Google, window functions appear in ~70% of hard SQL questions. The gaps-and-islands problem (session detection), cohort retention, running totals, top-N per group, and year-over-year comparisons all require window functions. A candidate who reaches for subqueries or self-joins for these patterns signals they're solving 2015 SQL problems.
What the Interviewer Is Actually Testing
The interview is not testing whether you know OVER() syntax. It's testing three things:
-
Pattern recognition: Can you identify that this problem requires a window function instead of a GROUP BY or self-join?
-
Frame clause precision: Do you know the difference between ROWS BETWEEN and RANGE BETWEEN? (RANGE with ties produces surprising results — see the frame clause section.)
-
NULL handling in time-series: LAG returns NULL for the first row in each partition. Do you handle it? Most interview failures happen here — the query looks right but returns NULL for first-occurrence events.
Narrate your reasoning: "I'll use LAG here to get the previous event timestamp per user, and I need to handle the NULL case for the first event — I'll use COALESCE or a CASE statement."