Cohort & Retention Analysis: D1/D7/D30 Curves, Churn Interpretation, and Retention SQL
Retention is the backbone metric for subscription, consumer, and marketplace products — yet most candidates confuse rolling retention with fixed cohort retention, mishandle right-censoring, and cannot write the window-function SQL. This guide covers cohort tables, classic vs unbounded retention, N-day windows, and how Meta-style teams separate false churn (definition drift, logging changes, acquisition mix) from real product-driven churn.
Why Retention Interviews Fail Candidates
Retention looks simple — "what % of signups are active on day 7?" — until you operationalize which signups, what counts as active, calendar day vs 24-hour windows, and right censoring for recently acquired users.
Interviewers probe three skills: (1) metric definition that resists gaming, (2) cohort construction (fixed signup week vs rolling), (3) SQL that joins first activity to subsequent activity without double-counting sessions as users.
The hidden trap is that retention is a denominator discipline problem. If your denominator includes users who have not had enough elapsed time to possibly return on day 30, your D30 "drop" is a math artifact, not product truth. Strong candidates say this unprompted and either exclude young cohorts or mark them censored. Another common mistake is mixing user grain and event grain in the same query; a single power user can inflate session counts and mask broad churn among casual users.
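The censoring artifact can be made concrete in a few lines of plain Python. This is a sketch with made-up cohort numbers and a hypothetical `d_n_retention` helper: one mature cohort with true D30 of 20%, plus one cohort only five days old that cannot possibly have a day-30 return yet.

```python
from datetime import date, timedelta

def d_n_retention(cohorts, n, today):
    """Naive vs censored D-N over {cohort_date: (size, returners_on_day_n)}.

    Cohorts younger than n days have had no chance to return on day n,
    so the censored version drops them from the denominator entirely.
    """
    naive_num = naive_den = cens_num = cens_den = 0
    for cohort_date, (size, returners) in cohorts.items():
        naive_num += returners
        naive_den += size
        if cohort_date <= today - timedelta(days=n):  # old enough to observe day n
            cens_num += returners
            cens_den += size
    return naive_num / naive_den, cens_num / cens_den

today = date(2024, 6, 30)
cohorts = {
    date(2024, 5, 1): (1000, 200),  # mature cohort: true D30 = 20%
    date(2024, 6, 25): (1000, 0),   # 5 days old: day 30 has not happened yet
}
naive, censored = d_n_retention(cohorts, 30, today)
print(round(naive, 2), round(censored, 2))  # naive halves to 0.1; censored stays 0.2
```

The naive D30 "drops" to 10% purely because the young cohort sits in the denominator with zero possible returners — exactly the math artifact interviewers want you to call out.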
What Differentiates Strong Answers
Mid-level: Computes D7 as sessions on day 7 divided by signups — an event-grain numerator over a user-grain denominator, so multi-session users inflate it, and timezone boundary choices silently shift who lands on "day 7."
Senior: User-level binary retention (did user have ≥1 qualifying event on day N since first eligible timestamp), handles partial cohorts (exclude last 7 days from D7 denominator or mark censored).
Staff: Connects retention movement to product surface area changes, acquisition mix shift, and definition drift (new app version logging events differently).
Clarifying Questions Before Building a Retention Curve
Anchor event
Signup, first purchase, or activation milestone — retention means nothing without an anchor.
Return definition
Any login, meaningful action, or revenue event — drives absolute retention levels.
Calendar vs rolling N-day
Day 7 midnight-to-midnight vs 168 hours post-anchor — pick one consistently.
Partial cohort policy
Exclude incomplete horizons or plot with confidence ribbons.
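The calendar-vs-rolling choice changes individual classifications, not just the aggregate. A minimal illustration (timestamps and function names are hypothetical): a user who signs up late in the evening and returns just after midnight a week later is retained under the calendar definition but not under the 24-hour-bucket one.

```python
from datetime import datetime, timedelta

DAY = timedelta(hours=24)

def calendar_d7(anchor, ret):
    """Retained if the return falls on the 7th calendar day after the anchor date."""
    return (ret.date() - anchor.date()).days == 7

def rolling_d7(anchor, ret):
    """Retained if the return falls in the 7th 24h bucket post-anchor: [168h, 192h)."""
    return 7 * DAY <= ret - anchor < 8 * DAY

anchor = datetime(2024, 6, 1, 23, 0)  # late-evening signup
ret = datetime(2024, 6, 8, 0, 30)     # returns just after midnight a week later
print(calendar_d7(anchor, ret), rolling_d7(anchor, ret))  # True False
```

The elapsed time here is only 145.5 hours, so the rolling definition puts the return in the day-6 bucket. Neither answer is wrong; reporting them interchangeably is.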
Classic N-Day Retention vs Bracketed Windows
Classic N-day: User is retained on day N if they performed the return event on the Nth calendar day (or Nth 24h bucket) after anchor. Common in investor decks — easy to compare across companies.
Bracketed (D0–D7): Active at least once in the first week — higher numbers, different semantics. Always label axes explicitly.
Unbounded (rolling) retention: Share of cohort active per week/month — smooths noise, good for mature product monitoring.
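Strict and bracketed definitions can diverge sharply on the same cohort. A toy comparison at user grain (synthetic day offsets, hypothetical function names; day 0 is excluded from the window because everyone is trivially active at their anchor):

```python
def strict_dn(active_days, n):
    """Retained if active on exactly day n after anchor (day 0 = anchor day)."""
    return n in active_days

def windowed_dn(active_days, n):
    """Retained if active at least once on any of days 1..n."""
    return any(1 <= d <= n for d in active_days)

# active_days = set of day offsets on which each user fired a return event
cohort = {
    "u1": {0, 1, 7},  # habitual user: retained under both definitions
    "u2": {0, 2},     # early dabbler: windowed yes, strict-D7 no
    "u3": {0},        # anchor-day only: retained under neither
}

strict = sum(strict_dn(days, 7) for days in cohort.values()) / len(cohort)
window = sum(windowed_dn(days, 7) for days in cohort.values()) / len(cohort)
print(f"strict D7 = {strict:.2f}, windowed D1-D7 = {window:.2f}")
```

On this toy cohort the windowed number is double the strict one (0.67 vs 0.33) with identical underlying behavior — the slide-footnote problem in miniature.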
In interviews, the best habit is to tie each definition to a decision owner. PMs running onboarding experiments usually care about strict D1/D7 because it captures habit formation; growth teams often prefer bracketed windows to understand "any engagement" after campaign spend; finance and long-range planning rely on rolling monthly retention because it maps better to recurring revenue curves.
Never compare numbers from different definitions in the same slide without an explicit footnote. Many post-launch debates are not about product quality at all — they are about one team reporting strict D7 and another reporting D0–D7 windowed retention.
Cohort Table to Curve — Data Flow
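The flow is a per-cohort divide: build a cohort table of active-user counts per (cohort, day offset), then divide each count by cohort size to get one retention curve per cohort. A minimal sketch with synthetic counts and hypothetical names:

```python
# Cohort table: rows = signup cohort, columns = users active at each day offset.
cohort_table = {
    "2024-W01": {"size": 500, "active": {1: 250, 7: 120, 30: 60}},
    "2024-W02": {"size": 400, "active": {1: 210, 7: 100, 30: 48}},
}

def retention_curve(table, offsets=(1, 7, 30)):
    """Divide each cohort's day-offset actives by cohort size -> curve per cohort."""
    return {
        cohort: [row["active"].get(d, 0) / row["size"] for d in offsets]
        for cohort, row in table.items()
    }

curves = retention_curve(cohort_table)
print(curves["2024-W01"])  # [0.5, 0.24, 0.12]
```

Stacking these per-cohort lists by day offset gives the familiar D1/D7/D30 curve chart; missing offsets default to zero here only because the sketch has no censoring metadata.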
N-Day Retention SQL — User-Level Pattern
-- Anchor = first signup day per user; return = any app_open on the 7th calendar day after it.
-- Cohorts younger than 7 days are excluded so the denominator is never right-censored.
WITH anchor AS (
    SELECT user_id, MIN(DATE_TRUNC('day', ts)) AS cohort_date
    FROM events
    WHERE event = 'signup'
    GROUP BY 1
),
activity AS (
    SELECT DISTINCT user_id, DATE_TRUNC('day', ts) AS d
    FROM events
    WHERE event = 'app_open'
)
SELECT
    a.cohort_date,
    COUNT(DISTINCT a.user_id) AS cohort_size,
    COUNT(DISTINCT CASE
        WHEN EXISTS (
            SELECT 1 FROM activity x
            WHERE x.user_id = a.user_id
              AND x.d = a.cohort_date + INTERVAL '7 day'
        ) THEN a.user_id
    END) * 1.0 / COUNT(DISTINCT a.user_id) AS d7_retention
FROM anchor a
WHERE a.cohort_date <= CURRENT_DATE - INTERVAL '7 day'  -- censoring guard
GROUP BY 1
ORDER BY 1;
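The same user-level pattern can be hand-audited end to end with Python's built-in sqlite3 — a quick version of the single-cohort sanity check. The query below is a SQLite translation of the pattern above (SQLite has no DATE_TRUNC or INTERVAL, so it uses `date()` and its `'+7 day'` modifier); table contents are made up, and the censoring guard is dropped because the toy data is all historical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events (user_id TEXT, event TEXT, ts TEXT);
INSERT INTO events VALUES
  ('u1', 'signup',   '2024-01-01 09:00'),
  ('u1', 'app_open', '2024-01-08 21:00'),  -- back on day 7 -> retained
  ('u2', 'signup',   '2024-01-01 10:00'),
  ('u2', 'app_open', '2024-01-03 10:00');  -- day 2 only -> not strict-D7
""")
row = con.execute("""
WITH anchor AS (
  SELECT user_id, MIN(date(ts)) AS cohort_date
  FROM events WHERE event = 'signup' GROUP BY 1
),
activity AS (
  SELECT DISTINCT user_id, date(ts) AS d
  FROM events WHERE event = 'app_open'
)
SELECT a.cohort_date,
       COUNT(DISTINCT a.user_id) AS cohort_size,
       SUM(CASE WHEN EXISTS (
             SELECT 1 FROM activity x
             WHERE x.user_id = a.user_id
               AND x.d = date(a.cohort_date, '+7 day')
           ) THEN 1 ELSE 0 END) * 1.0
         / COUNT(DISTINCT a.user_id) AS d7_retention
FROM anchor a
GROUP BY 1
""").fetchone()
print(row)  # ('2024-01-01', 2, 0.5)
```

Two users, one retained: 50% D7, verifiable by eye — which is exactly the point of auditing one small cohort before trusting the full curve.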
Retention Definitions — Pick One and Document
| Definition | Numerator | Main caveat | Best for |
|---|---|---|---|
| Strict N-day | Active on exactly day N | Partial recent cohorts distort it | Investor comparables |
| N-day window (D0–DN) | Active at least once in days 0–N | Inflates vs strict; label explicitly | Growth marketing |
| Weekly rolling | Cohort WAU / cohort size | Smooths over sudden regressions | Mature B2B SaaS monitoring |
| Revenue-weighted | Spend on day N | Sparse for low-frequency products | Marketplaces |
| Recommended default | Binary user-level flag on day N, with censoring rules | Needs a documented censoring policy | Interviews + production |
False Churn and Definition Drift
Silent push opt-out: DAU drops — actually fewer notifications, not less love.
SDK version: New release stops firing screen_view — retention collapses while true engagement unchanged.
Acquisition mix: Paid installs replace organic — headline D7 drops without product regression.
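The acquisition-mix effect is pure arithmetic: per-channel retention can hold perfectly flat while blended D7 falls. A sketch with made-up shares and rates (organic vs paid):

```python
def blended_d7(segments):
    """Weighted average of per-channel D7 by cohort share."""
    return sum(share * d7 for share, d7 in segments)

# (cohort share, channel D7) -- per-channel retention identical in both periods
before = [(0.7, 0.40), (0.3, 0.15)]  # 70% organic @ 40% D7, 30% paid @ 15% D7
after = [(0.4, 0.40), (0.6, 0.15)]   # paid push shifts the mix; product unchanged
print(round(blended_d7(before), 3), round(blended_d7(after), 3))  # 0.325 0.25
```

Blended D7 drops by 7.5 points with zero product regression, so the first diagnostic move is always to segment the curve by channel before debating what shipped.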
Interview Closing
"I'd define anchor and return events with PM, pick strict N-day retention with censoring for incomplete cohorts, then write a window-function or self-join SQL that operates at user grain — and I'd sanity-check with a single-cohort hand audit before trusting the curve."