Product Analytics · Intermediate

Cohort & Retention Analysis: D1/D7/D30 Curves, Churn Interpretation, and Retention SQL

Retention is the backbone metric for subscription, consumer, and marketplace products — yet most candidates confuse rolling retention with fixed cohort retention, mis-handle right-censoring, and cannot write the window-function SQL. This guide covers cohort tables, classic vs unbounded retention, N-day windows, and how Meta-style teams diagnose false churn from product changes vs definition drift.

Cohort Analysis · Retention Curve · D1 Retention · D7 Retention · D30 Retention · Churn · Window Functions · Product Analytics · SQL Analytics · Right Censoring · Stickiness · WAU/MAU

Why Retention Interviews Fail Candidates

Retention looks simple — "what % of signups are active on day 7?" — until you operationalize which signups, what counts as active, calendar day vs 24-hour windows, and right censoring for recently acquired users.

Interviewers probe three skills: (1) metric definition that resists gaming, (2) cohort construction (fixed signup week vs rolling), (3) SQL that joins first activity to subsequent activity without double-counting sessions as users.

The hidden trap is that retention is a denominator discipline problem. If your denominator includes users who have not had enough elapsed time to possibly return on day 30, your D30 "drop" is a math artifact, not product truth. Strong candidates say this unprompted and either exclude young cohorts or mark them censored. Another common mistake is mixing user grain and event grain in the same query; a single power user can inflate session counts and mask broad churn among casual users.
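The censoring rule above can be made concrete. A minimal sketch, assuming a toy mapping of cohort dates to signup counts (the function name and inputs are hypothetical, not from any library): cohorts younger than the D-N horizon are set aside as censored rather than counted as churned.

```python
from datetime import date, timedelta

def split_censored(cohorts, horizon_days, as_of):
    """Split cohorts into those old enough for a full day-N observation
    window and those still censored (too young to possibly return on day N)."""
    cutoff = as_of - timedelta(days=horizon_days)
    eligible = {d: n for d, n in cohorts.items() if d <= cutoff}
    censored = {d: n for d, n in cohorts.items() if d > cutoff}
    return eligible, censored

# Toy data: cohort_date -> signup count
cohorts = {
    date(2024, 5, 1): 1000,
    date(2024, 5, 20): 800,
    date(2024, 5, 28): 900,
}

# As of June 5, only the May 1 cohort has lived a full 30 days;
# the May 20 and May 28 cohorts stay out of the D30 denominator.
eligible, censored = split_censored(cohorts, horizon_days=30, as_of=date(2024, 6, 5))
```

Including the two young cohorts would show a D30 "drop" that is purely a math artifact of their zero possible returns.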

IMPORTANT

What Differentiates Strong Answers

Mid-level: Computes D7 as sessions on day 7 / signups — often wrong if users can return multiple times or if timezone boundaries shift.

Senior: User-level binary retention (did user have ≥1 qualifying event on day N since first eligible timestamp), handles partial cohorts (exclude last 7 days from D7 denominator or mark censored).

Staff: Connects retention movement to product surface area changes, acquisition mix shift, and definition drift (new app version logging events differently).

Clarifying Questions Before Building a Retention Curve

01

Anchor event

Signup, first purchase, or activation milestone — retention means nothing without an anchor.

02

Return definition

Any login, meaningful action, or revenue event — drives absolute retention levels.

03

Calendar vs rolling N-day

Day 7 midnight-to-midnight vs 168 hours post-anchor — pick one consistently.

04

Partial cohort policy

Exclude incomplete horizons or plot with confidence ribbons.
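Question 03 matters more than it looks: the same user can count as retained under one clock and churned under the other. A minimal sketch, assuming the rolling variant buckets elapsed time into 24-hour windows (so day 7 is 168–192 hours post-anchor; the function names are illustrative):

```python
from datetime import datetime, timedelta

def retained_calendar_d7(anchor, returns):
    """Strict calendar D7: any return whose date is exactly
    7 calendar days after the anchor's date."""
    target = anchor.date() + timedelta(days=7)
    return any(r.date() == target for r in returns)

def retained_rolling_d7(anchor, returns):
    """Rolling D7: any return in the 7th 24-hour bucket after the
    anchor timestamp, i.e. 168h <= elapsed < 192h."""
    return any(
        timedelta(hours=168) <= r - anchor < timedelta(hours=192)
        for r in returns
    )

anchor = datetime(2024, 5, 1, 23, 30)   # signup just before midnight
ret = [datetime(2024, 5, 8, 0, 15)]     # return 15 minutes into May 8

# Calendar clock: May 8 is day 7, so retained.
# Rolling clock: only ~144.75 hours elapsed, so not yet in the day-7 bucket.
```

A cohort full of late-evening signups will systematically score higher on the calendar definition — which is why the article says to pick one and apply it consistently.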

Classic N-Day Retention vs Bracketed Windows

Classic N-day: User is retained on day N if they performed the return event on the Nth calendar day (or Nth 24h bucket) after anchor. Common in investor decks — easy to compare across companies.

Bracketed (D0–D7): Active at least once in the first week — higher numbers, different semantic. Always label axes explicitly.

Unbounded (rolling) retention: Share of cohort active per week/month — smooths noise, good for mature product monitoring.

In interviews, the best habit is to tie each definition to a decision owner. PMs running onboarding experiments usually care about strict D1/D7 because it captures habit formation; growth teams often prefer bracketed windows to understand "any engagement" after campaign spend; finance and long-range planning rely on rolling monthly retention because it maps better to recurring revenue curves.

Never compare numbers from different definitions in the same slide without an explicit footnote. Many post-launch debates are not about product quality at all — they are about one team reporting strict D7 and another reporting D0–D7 windowed retention.
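The gap between definitions is easy to demonstrate on toy data. A minimal sketch, assuming each user's activity is recorded as day offsets since their anchor (the data here is invented for illustration):

```python
# Toy event log: user_id -> set of day offsets (days since anchor) with activity.
# Day 0 is the anchor day itself.
activity = {
    1: {0, 1, 7},        # returned on day 7 -> strict D7 retained
    2: {0, 3},           # returned only on day 3
    3: {0},              # never returned
    4: {0, 2, 5, 6},     # several returns, none on day 7
}

cohort_size = len(activity)

# Strict N-day: active exactly on day 7.
strict_d7 = sum(1 for days in activity.values() if 7 in days) / cohort_size

# Bracketed D0-D7 window: active at least once on days 1 through 7.
bracketed_w1 = sum(
    1 for days in activity.values() if any(1 <= d <= 7 for d in days)
) / cohort_size
```

Same four users, same week: strict D7 is 25% while the bracketed window reads 75%. Put both on one slide without labels and you have manufactured a debate.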

Cohort Table to Curve — Data Flow


N-Day Retention SQL — User-Level Pattern

retention_n_day.sql
-- Anchor = first signup per user; return = any app_open on calendar day 7
WITH anchor AS (
  SELECT user_id, MIN(DATE_TRUNC('day', ts)) AS cohort_date
  FROM events
  WHERE event = 'signup'
  GROUP BY 1
),
activity AS (
  SELECT user_id, DATE_TRUNC('day', ts) AS d
  FROM events
  WHERE event = 'app_open'
)
SELECT
  a.cohort_date,
  COUNT(DISTINCT a.user_id) AS cohort_size,
  COUNT(DISTINCT CASE
    WHEN EXISTS (
      SELECT 1 FROM activity x
      WHERE x.user_id = a.user_id
        AND x.d = a.cohort_date + INTERVAL '7 day'
    ) THEN a.user_id END) * 1.0 / COUNT(DISTINCT a.user_id) AS d7_retention
FROM anchor a
-- Censor cohorts too young to have a full day-7 observation window
WHERE a.cohort_date <= CURRENT_DATE - INTERVAL '7 day'
GROUP BY 1;
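The same user-grain pattern can be sanity-checked end to end in an in-memory database. A minimal sketch using Python's sqlite3 with invented rows — note SQLite's dialect differs from the Postgres-style snippet above (DATE(ts) instead of DATE_TRUNC, DATE(d, '+7 day') instead of INTERVAL arithmetic), and the tiny fixture omits the censoring filter:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events (user_id INT, event TEXT, ts TEXT);
INSERT INTO events VALUES
  (1, 'signup',   '2024-05-01 10:00:00'),
  (1, 'app_open', '2024-05-08 09:00:00'),  -- exactly day 7: retained
  (2, 'signup',   '2024-05-01 11:00:00'),
  (2, 'app_open', '2024-05-03 12:00:00'),  -- day 2 only: not strict D7
  (3, 'signup',   '2024-05-01 12:00:00');  -- never returned
""")

row = con.execute("""
WITH anchor AS (
  SELECT user_id, MIN(DATE(ts)) AS cohort_date
  FROM events WHERE event = 'signup' GROUP BY 1
),
activity AS (
  SELECT user_id, DATE(ts) AS d FROM events WHERE event = 'app_open'
)
SELECT
  a.cohort_date,
  COUNT(DISTINCT a.user_id) AS cohort_size,
  COUNT(DISTINCT CASE WHEN EXISTS (
    SELECT 1 FROM activity x
    WHERE x.user_id = a.user_id AND x.d = DATE(a.cohort_date, '+7 day')
  ) THEN a.user_id END) * 1.0 / COUNT(DISTINCT a.user_id) AS d7_retention
FROM anchor a GROUP BY 1
""").fetchone()

# One cohort of 3 users, of whom only user 1 returned on day 7.
```

This is exactly the "single-cohort hand audit" the closing statement recommends: three users you can count by eye, checked against the query's output.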

Retention Definitions — Pick One and Document

| Definition | Numerator | Denominator risk | Best for |
| --- | --- | --- | --- |
| Strict N-day | Active exactly on day N | Partial recent cohorts | Investor comparables |
| N-day window | Active days 1 through N | Inflates vs strict | Growth marketing |
| Weekly rolling | WAU of cohort / cohort size | Mature monitoring | B2B SaaS |
| Revenue-weighted | Spend on day N | Sparse for low frequency | Marketplaces |
| Recommended default | Binary user on day N + censoring rules | Clear SQL | Interview + prod |
⚠ WARNING

False Churn and Definition Drift

Silent push opt-out: DAU drops — actually fewer notifications, not less love.

SDK version: New release stops firing screen_view — retention collapses while true engagement unchanged.

Acquisition mix: Paid installs replace organic — headline D7 drops without product regression.
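The acquisition-mix trap is a Simpson's-paradox effect worth quantifying. A minimal sketch with invented numbers: per-channel D7 is perfectly flat, yet headline D7 falls ten points purely because paid installs grow from 20% to 60% of signups.

```python
# Hypothetical weekly cohorts: (signups, d7_retention) per channel.
# Per-channel retention is identical before and after; only the mix shifts.
before = {"organic": (8000, 0.40), "paid": (2000, 0.15)}
after  = {"organic": (4000, 0.40), "paid": (6000, 0.15)}

def blended_d7(mix):
    """Signup-weighted average of per-channel D7 retention."""
    total = sum(n for n, _ in mix.values())
    return sum(n * r for n, r in mix.values()) / total

# Blended D7 drops from 0.35 to 0.25 with zero per-channel change.
```

The defensive habit: always plot retention segmented by acquisition channel next to the blended curve before attributing a drop to the product.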

TIP

Interview Closing

"I'd define anchor and return events with PM, pick strict N-day retention with censoring for incomplete cohorts, then write a window-function or self-join SQL that operates at user grain — and I'd sanity-check with a single-cohort hand audit before trusting the curve."
