SQL, Analytics & Scenarios
The fluency layer tested across data, PM, and analyst rounds: SQL joins and window functions, product metric design, funnel analysis, and open-ended scenario frameworks for staff-level judgment questions.
guides
SQL Foundations for Data & ML Interviews: JOINs, Aggregations, and Window Functions
Master the SQL patterns that appear in every data and ML engineer interview — JOINs, GROUP BY, CTEs, and NULL semantics. This guide covers what most tutorials skip: why a WHERE filter on the right table silently turns a LEFT JOIN into an INNER JOIN, when HAVING beats WHERE, and how NULL poisons aggregates in ways that hide bugs for months.
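A minimal sketch of that LEFT JOIN trap, using hypothetical users and orders tables (Postgres-flavored; names and columns are illustrative):

```sql
-- Hypothetical tables: users(user_id), orders(order_id, user_id, status).
-- Trap: the WHERE predicate on the right table discards users with no orders,
-- because their o.status is NULL and NULL = 'completed' is never true.
SELECT u.user_id, COUNT(o.order_id) AS completed_orders
FROM users u
LEFT JOIN orders o ON o.user_id = u.user_id
WHERE o.status = 'completed'              -- silently behaves like an INNER JOIN
GROUP BY u.user_id;

-- Fix: keep the predicate in the join condition so unmatched users survive
-- with completed_orders = 0 (COUNT over a NULL-only column returns 0).
SELECT u.user_id, COUNT(o.order_id) AS completed_orders
FROM users u
LEFT JOIN orders o ON o.user_id = u.user_id
                  AND o.status = 'completed'
GROUP BY u.user_id;
```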
Data Quality Monitoring: Schema Drift, Null Rates, Freshness SLAs, and Anomaly Detection for Analytics
Bad data causes bad decisions faster than good models fix them. This guide covers Great Expectations-style contracts, dbt tests, freshness SLAs on Airflow/Dagster, null and uniqueness monitors, schema evolution in Avro/Protobuf, volume anomaly detection (STL, EWMA), and how Airbnb-class analytics engineering triages P0 pipeline breaks vs metric definition drift.
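As a rough sketch of what a null-rate and freshness monitor can look like in plain SQL (Postgres-flavored; the events table and its columns are assumptions, and most teams would wire this into dbt tests or Great Expectations rather than run it ad hoc):

```sql
-- Hypothetical table: events(event_id, user_id, event_ts, country).
-- Null-rate monitor: alert when the share of NULL country values jumps.
SELECT
  CAST(event_ts AS DATE)                                AS event_date,
  AVG(CASE WHEN country IS NULL THEN 1.0 ELSE 0.0 END)  AS null_rate_country,
  COUNT(*)                                              AS row_count
FROM events
WHERE event_ts >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY CAST(event_ts AS DATE)
ORDER BY event_date;

-- Freshness check: minutes since the newest row landed, compared to the SLA.
SELECT
  MAX(event_ts)                                     AS latest_event,
  EXTRACT(EPOCH FROM (NOW() - MAX(event_ts))) / 60  AS minutes_stale
FROM events;
```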
User Segmentation & Behavioral Analytics: RFM, Clustering, Personas, and Production Guardrails
Segmentation powers targeting, pricing, and product prioritization — but k-means on raw features without scaling, leakage from future data, and unstable personas kill trust. This guide covers RFM, hierarchical clustering vs k-means, behavioral sequence features, evaluation metrics (silhouette with caveats), and how LinkedIn-scale teams ship segments with drift monitoring.
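A minimal RFM scoring sketch with NTILE, assuming a hypothetical orders table (the 365-day window and 1-5 scoring are illustrative choices, not the guide's prescription):

```sql
-- Hypothetical table: orders(user_id, order_ts, amount).
-- Score each user 1-5 on recency, frequency, and monetary value.
WITH per_user AS (
  SELECT
    user_id,
    MAX(order_ts) AS last_order_ts,
    COUNT(*)      AS order_count,
    SUM(amount)   AS total_spend
  FROM orders
  WHERE order_ts >= CURRENT_DATE - INTERVAL '365 days'
  GROUP BY user_id
)
SELECT
  user_id,
  NTILE(5) OVER (ORDER BY last_order_ts) AS recency_score,    -- 5 = most recent
  NTILE(5) OVER (ORDER BY order_count)   AS frequency_score,  -- 5 = most orders
  NTILE(5) OVER (ORDER BY total_spend)   AS monetary_score    -- 5 = highest spend
FROM per_user;
```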
Product Analytics for Interviews: Metric Design, Root Cause Analysis, and Scenario Frameworks
The complete framework for product analytics interview questions — DAU drops, metric trade-offs, experimentation critique, and business case analysis. Covers the metric hierarchy (north star / guardrails / diagnostics), the 5-step root cause investigation process, common scenario traps, and how to structure your answer in under 3 minutes.
SQL for Data & ML Interviews: JOINs, Window Functions, and Query Optimization
Everything you need to solve SQL interview problems at data scientist, ML engineer, and data analyst roles. Covers JOINs and aggregations, window functions (ROW_NUMBER, LAG/LEAD, running totals), CTEs, NULL traps, and the query optimization patterns that separate strong from weak SQL answers.
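Two of those patterns in a compact sketch (table and column names are hypothetical): deduplicating to the latest row per key with ROW_NUMBER, and day-over-day deltas with LAG:

```sql
-- Hypothetical table: logins(user_id, login_ts, device).
-- Latest login per user: ROW_NUMBER() restarts at 1 within each user_id.
WITH ranked AS (
  SELECT
    user_id, login_ts, device,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_ts DESC) AS rn
  FROM logins
)
SELECT user_id, login_ts, device
FROM ranked
WHERE rn = 1;

-- Day-over-day change in daily logins: LAG runs over the aggregated result.
SELECT
  CAST(login_ts AS DATE)                                           AS login_date,
  COUNT(*)                                                         AS logins,
  COUNT(*) - LAG(COUNT(*)) OVER (ORDER BY CAST(login_ts AS DATE))  AS delta_vs_prev_day
FROM logins
GROUP BY CAST(login_ts AS DATE);
```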
Cohort & Retention Analysis: D1/D7/D30 Curves, Churn Interpretation, and Retention SQL
Retention is the backbone metric for subscription, consumer, and marketplace products — yet most candidates confuse rolling retention with fixed cohort retention, mishandle right-censoring, and cannot write the window-function SQL. This guide covers cohort tables, classic vs unbounded retention, N-day windows, and how Meta-style teams diagnose false churn from product changes vs definition drift.
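A fixed-cohort retention sketch under assumed users and activity tables (Postgres-style date arithmetic; note the right-censoring caveat in the comments):

```sql
-- Hypothetical tables: users(user_id, signup_date), activity(user_id, activity_date).
-- Fixed-cohort retention: share of each signup cohort active exactly N days later.
SELECT
  u.signup_date                              AS cohort_day,
  COUNT(DISTINCT u.user_id)                  AS cohort_size,
  COUNT(DISTINCT CASE WHEN a.activity_date = u.signup_date + 1 THEN u.user_id END)
    * 1.0 / COUNT(DISTINCT u.user_id)        AS d1_retention,
  COUNT(DISTINCT CASE WHEN a.activity_date = u.signup_date + 7 THEN u.user_id END)
    * 1.0 / COUNT(DISTINCT u.user_id)        AS d7_retention
FROM users u
LEFT JOIN activity a ON a.user_id = u.user_id
-- Right-censoring: cohorts younger than 7 days cannot have a real D7 number yet,
-- so exclude them (or flag them as censored) before publishing the curve.
WHERE u.signup_date <= CURRENT_DATE - 7
GROUP BY u.signup_date
ORDER BY cohort_day;
```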
Funnel Analysis: Conversion Optimization, Drop-off Attribution, and Funnel SQL
Funnel analysis measures how users progress through a sequence of steps — signup → activation → first purchase → retention. It is the most frequently tested analytics framework in product data science interviews at Meta, Airbnb, Stripe, and Lyft. This guide covers funnel construction, drop-off rate calculation, multi-touch attribution, and the SQL patterns for session-level and user-level funnels.
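A user-level funnel sketch with conditional aggregation, assuming a single hypothetical events table and requiring the steps to occur in time order:

```sql
-- Hypothetical table: events(user_id, event_name, event_ts).
-- User-level funnel: signup -> activation -> first_purchase, ordered in time.
WITH steps AS (
  SELECT
    user_id,
    MIN(CASE WHEN event_name = 'signup'         THEN event_ts END) AS signup_ts,
    MIN(CASE WHEN event_name = 'activation'     THEN event_ts END) AS activation_ts,
    MIN(CASE WHEN event_name = 'first_purchase' THEN event_ts END) AS purchase_ts
  FROM events
  GROUP BY user_id
)
SELECT
  COUNT(*)                                               AS signed_up,
  COUNT(CASE WHEN activation_ts > signup_ts THEN 1 END)  AS activated,
  COUNT(CASE WHEN purchase_ts > activation_ts
              AND activation_ts > signup_ts THEN 1 END)  AS purchased
FROM steps
WHERE signup_ts IS NOT NULL;
-- Drop-off per step = 1 - (next step count / previous step count).
```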
How to Approach Data & Product Scenario Questions
Scenario questions — DAU drops, model degradation, A/B test decisions, incident triage — are the dominant interview format for senior data scientists, product analysts, and ML engineers. This guide gives you the DIAGNOSE framework: a structured 5-step approach that transforms vague 'what would you do?' prompts into clear, systematic answers that impress interviewers at Meta, Google, Airbnb, and Stripe.
Scenario Walkthrough: Why Is DAU Dropping?
End-to-end DAU triage the way strong DS/PM orgs run it: lock the metric definition, validate instrumentation before narrative, decompose the composite into new vs returning and session depth, segment to localize a sharp break, and ship actions with impact sizing and explicit uncertainty. Includes worked timing, SQL-shaped diagnostics, and the Simpson's-paradox / mix-shift traps that sink mid-level answers.
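One of those decomposition steps as a sketch (table names hypothetical): split DAU into new vs returning so a drop can be localized before anyone reaches for a narrative:

```sql
-- Hypothetical tables: activity(user_id, activity_date), users(user_id, signup_date).
-- Decompose DAU into new vs returning; segment further (platform, country,
-- app version) once you know which component broke and when.
SELECT
  a.activity_date,
  COUNT(DISTINCT a.user_id)                        AS dau,
  COUNT(DISTINCT CASE WHEN u.signup_date = a.activity_date
                      THEN a.user_id END)          AS new_users,
  COUNT(DISTINCT CASE WHEN u.signup_date < a.activity_date
                      THEN a.user_id END)          AS returning_users
FROM activity a
JOIN users u ON u.user_id = a.user_id
GROUP BY a.activity_date
ORDER BY a.activity_date;
```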
Scenario Walkthrough: Engagement vs Revenue — Guardrails & Horizon
The highest-signal PM/DS tradeoff: a surface, ranking, or growth lever lifts a leading engagement input while threatening RPM, ARPU, chargebacks, or long-horizon retention. Learn to express a *constrained* objective, pre-register guardrails, separate short-window novelty from LTV, and run readouts the way strong experimentation orgs do—not as a single p-value on one chart.
Scenario Walkthrough: Marketplace Supply–Demand Imbalance — Liquidity First
Interview-prep walkthrough for two-sided marketplaces: liquidity before GMV, cell-level supply and demand, search-to-fill, zero results, and dual-sided pain (riders and drivers can disagree in the same city). Covers levers, incentives, repositioning, throttles, and why network effects break naive A/B designs.
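A cell-level liquidity read as a sketch, with hypothetical searches and fills tables standing in for whatever the marketplace actually logs:

```sql
-- Hypothetical tables: searches(search_id, city, searched_at), fills(search_id, filled_at).
-- Per-city liquidity: search-to-fill rate; the cities at the bottom are where
-- supply levers (incentives, repositioning) matter most.
SELECT
  s.city,
  COUNT(*)                            AS searches,
  COUNT(f.search_id)                  AS fills,
  COUNT(f.search_id) * 1.0 / COUNT(*) AS search_to_fill_rate
FROM searches s
LEFT JOIN fills f ON f.search_id = s.search_id
GROUP BY s.city
ORDER BY search_to_fill_rate;
```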
Scenario Walkthrough: Post-Launch — Was This Feature a Success?
Structured readout for 'we shipped' moments: pre-registered success criteria vs ad hoc happy charts, leading vs lagging metrics, holdout and cannibalization, seasonality, and the governance of when to call a win. Mirrors how strong DS orgs run launch reviews (Meta, Google, Amazon) so you sound like you have owned a launch, not like you cherry-picked a green dashboard tile.
SQL Query Optimization: Indexes, Query Plans, and Performance at Scale
Production SQL is not about writing queries that work — it's about queries that scale. This guide covers B-tree and hash index mechanics, reading EXPLAIN ANALYZE output, index selectivity and covering indexes, when indexes hurt write throughput, correlated subquery elimination, and partitioning strategies used in analytics warehouses at scale.
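A before/after sketch in Postgres syntax (table, index name, and row count are assumptions): read the plan first, then add an index shaped to the filter:

```sql
-- Hypothetical table: orders(order_id, user_id, created_at, status), ~100M rows.
-- Step 1: read the plan. A sequential scan here means no index covers the filter.
EXPLAIN ANALYZE
SELECT order_id, created_at
FROM orders
WHERE user_id = 42 AND status = 'completed';

-- Step 2: composite index on the equality columns; INCLUDE carries the selected
-- columns so the query can be answered with an index-only scan (Postgres 11+).
CREATE INDEX idx_orders_user_status
  ON orders (user_id, status) INCLUDE (order_id, created_at);
```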
Window Functions for Analytics: ROW_NUMBER, RANK, LAG/LEAD, and Running Totals
Window functions are the single most-tested SQL topic in data and ML engineer interviews. This guide covers the full OVER() clause mechanics, every ranking function and when each is wrong, LAG/LEAD for time-series analysis, running aggregates, and the ROWS vs RANGE frame clause distinction that trips up even experienced engineers.
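The ROWS vs RANGE distinction in one sketch (hypothetical daily_sales table): the two running totals diverge as soon as the ORDER BY column has duplicates:

```sql
-- Hypothetical table: daily_sales(sale_date, amount), possibly several rows per date.
SELECT
  sale_date,
  amount,
  SUM(amount) OVER (ORDER BY sale_date
                    ROWS  BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_rows,
  SUM(amount) OVER (ORDER BY sale_date
                    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_range
FROM daily_sales
ORDER BY sale_date;
-- ROWS stops at the current physical row; RANGE pulls in every peer row sharing
-- the current sale_date, so running_range jumps by the whole day's total at once.
```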
SQL Joins & Aggregations: INNER, LEFT, GROUP BY, and Aggregation Pitfalls
Joins and GROUP BY form the backbone of every SQL interview. This guide covers the four join types with precise NULL semantics, GROUP BY with HAVING, the COUNT(*) vs COUNT(col) distinction that most candidates get wrong, multi-table aggregation patterns, and the self-join technique for hierarchical and comparison queries asked at Meta, Stripe, and Airbnb.
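Both the COUNT distinction and the self-join pattern in a short sketch, on a hypothetical employees table:

```sql
-- Hypothetical table: employees(emp_id, manager_id, salary); top-level
-- employees have manager_id = NULL.
SELECT
  COUNT(*)                   AS all_rows,      -- every row, NULLs included
  COUNT(manager_id)          AS with_manager,  -- skips rows where manager_id IS NULL
  COUNT(DISTINCT manager_id) AS managers       -- distinct non-NULL values only
FROM employees;

-- Self-join: employees who earn more than their own manager.
SELECT e.emp_id
FROM employees e
JOIN employees m ON m.emp_id = e.manager_id
WHERE e.salary > m.salary;
```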
Subqueries and CTEs: WITH Clauses, Correlated Subqueries, and Recursive Patterns
Subqueries and CTEs are the SQL constructs that separate mid-level analysts from senior engineers. This guide covers scalar subqueries, correlated subqueries and their performance implications, the WITH (CTE) clause for readable multi-step logic, and recursive CTEs for hierarchical data — all tested in data rounds at Meta, Stripe, Google, and Airbnb.
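A recursive CTE sketch over the same kind of hypothetical employees hierarchy:

```sql
-- Hypothetical table: employees(emp_id, manager_id, name).
WITH RECURSIVE org AS (
  SELECT emp_id, manager_id, name, 1 AS depth
  FROM employees
  WHERE manager_id IS NULL                 -- anchor: the top of the hierarchy
  UNION ALL
  SELECT e.emp_id, e.manager_id, e.name, o.depth + 1
  FROM employees e
  JOIN org o ON e.manager_id = o.emp_id    -- recursive step: direct reports
)
SELECT emp_id, name, depth
FROM org
ORDER BY depth, emp_id;
```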
Attribution Modeling: Last-Touch, Multi-Touch, Shapley, MMM, and Incrementality
Marketing and growth interviews ask how you credit conversions to channels — last-click is biased, multi-touch rules are arbitrary, and MMM is slow. This guide covers Markov attribution, Shapley-Owen values, geo holdouts for ground-truth incrementality, and how Uber-style teams pair fast digital attribution with slower marketing mix models.
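A sketch of why single-touch rules are arbitrary, using a hypothetical touchpoints table: last-touch and first-touch credit the same conversions to different channels:

```sql
-- Hypothetical table: touchpoints(user_id, channel, touch_ts, converted_ts),
-- where converted_ts is NULL for users who never converted.
WITH ranked AS (
  SELECT
    user_id, channel,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY touch_ts DESC) AS rn_last,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY touch_ts ASC)  AS rn_first
  FROM touchpoints
  WHERE touch_ts <= converted_ts           -- only touches before the conversion
)
SELECT
  channel,
  COUNT(CASE WHEN rn_last  = 1 THEN 1 END) AS last_touch_conversions,
  COUNT(CASE WHEN rn_first = 1 THEN 1 END) AS first_touch_conversions
FROM ranked
GROUP BY channel;
-- The two columns disagree whenever journeys span channels, which is the point.
```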
Scenario Walkthrough: The A/B Test Went Wrong — SRM, Peeking, and Interference
When the experiment is lying but the slide deck is green. Walk the failure modes that dominate production: sample ratio mismatch, peeking and early stopping, novelty and learning effects, network interference, wrong randomization unit, and thin-event metrics read too early. Teaches the remediation playbook: invalidate, debug assignment, or redesign — and how to say that without panicking the room. Grounded in standard OCE practice, chi-square SRM, and the Kohavi, Tang, and Xu body of work on trustworthy experiments.
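The SRM check itself is small enough to sketch in SQL (the assignments table and 50/50 design are assumptions; 10.83 is the 1-df chi-square critical value at p < 0.001, the conventional SRM alarm threshold):

```sql
-- Hypothetical table: assignments(user_id, variant), intended 50/50 split.
WITH counts AS (
  SELECT
    COUNT(CASE WHEN variant = 'control'   THEN 1 END) AS n_control,
    COUNT(CASE WHEN variant = 'treatment' THEN 1 END) AS n_treatment
  FROM assignments
)
SELECT
  n_control,
  n_treatment,
  -- One-degree-of-freedom chi-square against the expected 50/50 allocation.
  POWER(n_control   - (n_control + n_treatment) / 2.0, 2) / ((n_control + n_treatment) / 2.0)
+ POWER(n_treatment - (n_control + n_treatment) / 2.0, 2) / ((n_control + n_treatment) / 2.0)
    AS chi_square_stat   -- > 10.83: debug assignment before reading any metric
FROM counts;
```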
Scenario Walkthrough: Recommendation Model CTR Dropped 15% Overnight
A structured incident walkthrough for sudden ML metric degradation. Learn how to separate feature drift, label drift, training-serving skew, serving regressions, and upstream schema failures with fast falsification checks and executive-safe communication.
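One fast falsification check as a sketch (the serving_features log and its columns are hypothetical): compare the incident day's feature distribution against the prior day before touching the model:

```sql
-- Hypothetical log of features as scored: serving_features(scored_at, ctr_7d, item_age_days).
-- A jump in null rate or mean on one feature points at an upstream or feature
-- pipeline failure rather than the model itself.
SELECT
  CAST(scored_at AS DATE)                              AS score_date,
  AVG(CASE WHEN ctr_7d IS NULL THEN 1.0 ELSE 0.0 END)  AS ctr_7d_null_rate,
  AVG(ctr_7d)                                          AS ctr_7d_mean,
  AVG(item_age_days)                                   AS item_age_mean
FROM serving_features
WHERE scored_at >= CURRENT_DATE - INTERVAL '2 days'
GROUP BY CAST(scored_at AS DATE)
ORDER BY score_date;
```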
Scenario Walkthrough: Payment Service Returning 500s in Production
A step-by-step incident response walkthrough for severe production outages. Covers triage order, dependency isolation, rollback decisions, cascading failure containment, and stakeholder communication under time pressure.
Scenario Walkthrough: Trust & Safety Escalation — Abuse Signals & Response
When bad actors spike, reports flood, or a model misfires, strong answers triage threat class, validate measurement, use velocity and graph signals for coordination, respect human review capacity, and govern precision versus harm — not generic retrain-the-model talk. Maps signals to tiered actions, appeals, and guard metrics the way integrity orgs actually run operations.
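A velocity-signal sketch over a hypothetical reports table; the 24-hour window and reporter threshold are illustrative, not policy:

```sql
-- Hypothetical table: reports(report_id, reported_user_id, reporter_id, created_at).
-- Accounts accumulating reports from many distinct reporters in 24h are candidates
-- for tiered action and human review, not automatic enforcement.
SELECT
  reported_user_id,
  COUNT(*)                    AS reports_24h,
  COUNT(DISTINCT reporter_id) AS distinct_reporters_24h
FROM reports
WHERE created_at >= NOW() - INTERVAL '24 hours'
GROUP BY reported_user_id
HAVING COUNT(DISTINCT reporter_id) >= 5
ORDER BY reports_24h DESC;
```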
SQL Indexes and Query Performance: B-Tree, Composite, and Covering Indexes
Index selection is what separates a working SQL query from one that runs in production. This guide covers B-tree index mechanics, composite index column ordering, covering indexes for index-only scans, when indexes hurt writes, and how to read EXPLAIN plans — skills tested at Google, Stripe, and Meta for senior data and backend roles.
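Column ordering in one sketch (table, index, and dates are hypothetical): equality predicates lead, the range predicate goes last:

```sql
-- Hypothetical table: events(user_id, event_type, event_ts, payload).
CREATE INDEX idx_events_user_type_ts
  ON events (user_id, event_type, event_ts);

-- Served efficiently: both equality columns match the index prefix,
-- and the range on event_ts uses the final column.
SELECT event_ts, payload
FROM events
WHERE user_id = 42
  AND event_type = 'purchase'
  AND event_ts >= DATE '2024-01-01';

-- Not served efficiently by the same index: filtering on event_type alone
-- skips the leading user_id column, so the index prefix cannot be used.
SELECT COUNT(*) FROM events WHERE event_type = 'purchase';
```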