
SQL, Analytics & Scenarios

The fluency layer tested across data, PM, and analyst rounds: SQL joins and window functions, product metrics frameworks, funnel analysis, and structured approaches to the open-ended scenario questions that probe staff-level judgment.

22 guides
SQL & Databases · 35 min

SQL Foundations for Data & ML Interviews: JOINs, Aggregations, and Window Functions

Master the SQL patterns that appear in every data and ML engineer interview — JOINs, GROUP BY, CTEs, and NULL semantics. This guide covers what most tutorials skip: why a LEFT JOIN can silently drop rows when you filter the right table in WHERE, when HAVING beats WHERE, and how NULL poisons aggregates in ways that hide bugs for months.

SQL · JOINs · GROUP BY · Aggregations +8
Beginner
7 questions
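The NULL-and-aggregates trap this guide names can be shown in a few lines of runnable sqlite3 (table and values are invented for illustration):

```python
import sqlite3

# Invented orders table: one NULL amount is enough to skew every aggregate.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 10.0), (2, None), (3, 30.0)])

# COUNT(*) counts rows; COUNT(amount) skips NULLs; AVG also ignores NULLs,
# so the "average order" is computed over 2 rows, not 3.
row = con.execute(
    "SELECT COUNT(*), COUNT(amount), AVG(amount) FROM orders"
).fetchone()
print(row)  # (3, 2, 20.0)
```

Interviewers probe exactly this gap: a report built on `COUNT(amount)` quietly disagrees with one built on `COUNT(*)` as soon as NULLs enter the column.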
Product Analytics · 35 min

Data Quality Monitoring: Schema Drift, Null Rates, Freshness SLAs, and Anomaly Detection for Analytics

Bad data causes bad decisions faster than good models fix them. This guide covers Great Expectations-style contracts, dbt tests, freshness SLAs on Airflow/Dagster, null and uniqueness monitors, schema evolution in Avro/Protobuf, volume anomaly detection (STL, EWMA), and how Airbnb-class analytics engineering triages P0 pipeline breaks vs metric definition drift.

Data Quality · Schema Drift · Great Expectations · dbt Tests +8
Intermediate
6 questions
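As a toy illustration of the null-rate monitors described above, a hand-rolled check might look like this (the function name, thresholds, and alert shape are assumptions for the sketch, not a Great Expectations or dbt API):

```python
# Minimal null-rate monitor sketch; real frameworks add scheduling, alert
# routing, and severity tiers on top of a check like this.
def null_rate_alerts(rows, thresholds):
    """rows: list of dicts; thresholds: {column: max allowed null fraction}."""
    n = len(rows)
    alerts = {}
    for col, max_rate in thresholds.items():
        nulls = sum(1 for r in rows if r.get(col) is None)
        rate = nulls / n if n else 1.0  # treat an empty batch as fully null
        if rate > max_rate:
            alerts[col] = round(rate, 3)
    return alerts

batch = [{"user_id": 1, "country": "US"},
         {"user_id": 2, "country": None},
         {"user_id": None, "country": None}]
print(null_rate_alerts(batch, {"user_id": 0.05, "country": 0.5}))
# {'user_id': 0.333, 'country': 0.667}
```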
Product Analytics · 36 min

User Segmentation & Behavioral Analytics: RFM, Clustering, Personas, and Production Guardrails

Segmentation powers targeting, pricing, and product prioritization — but k-means on raw features without scaling, leakage from future data, and unstable personas kill trust. This guide covers RFM, hierarchical clustering vs k-means, behavioral sequence features, evaluation metrics (silhouette with caveats), and how LinkedIn-scale teams ship segments with drift monitoring.

User Segmentation · RFM Model · K-Means · Hierarchical Clustering +8
Intermediate
1 question
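A toy sketch of the RFM scoring idea from this guide, using rank-based terciles on invented customers (real teams tune the bins and usually cluster on scaled features rather than hand-scored ranks):

```python
# Invented toy customers; the 1..3 tercile scores are an illustrative convention.
customers = {
    "a": {"recency_days": 3,  "frequency": 12, "monetary": 540.0},
    "b": {"recency_days": 40, "frequency": 2,  "monetary": 35.0},
    "c": {"recency_days": 10, "frequency": 6,  "monetary": 210.0},
}

def score(values, low_is_good=False):
    """Map each distinct value to a 1..3 score by rank; 3 = best tercile."""
    ranked = sorted(values, reverse=low_is_good)  # worst value first
    return {v: 1 + ranked.index(v) * 3 // len(ranked) for v in values}

# Recency: lower is better; frequency and monetary: higher is better.
r = score([c["recency_days"] for c in customers.values()], low_is_good=True)
f = score([c["frequency"] for c in customers.values()])
m = score([c["monetary"] for c in customers.values()])
rfm = {name: (r[c["recency_days"]], f[c["frequency"]], m[c["monetary"]])
       for name, c in customers.items()}
print(rfm)  # {'a': (3, 3, 3), 'b': (1, 1, 1), 'c': (2, 2, 2)}
```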
Product Analytics · 40 min

Product Analytics for Interviews: Metric Design, Root Cause Analysis, and Scenario Frameworks

The complete framework for product analytics interview questions — DAU drops, metric trade-offs, experimentation critique, and business case analysis. Covers the metric hierarchy (north star / guardrails / diagnostics), the 5-step root cause investigation process, common scenario traps, and how to structure your answer in under 3 minutes.

Product Analytics · Metric Design · Root Cause Analysis · DAU +7
Intermediate
1 question
Product Analytics · 45 min

SQL for Data & ML Interviews: JOINs, Window Functions, and Query Optimization

Everything you need to solve SQL interview problems for data scientist, ML engineer, and data analyst roles. Covers JOINs and aggregations, window functions (ROW_NUMBER, LAG/LEAD, running totals), CTEs, NULL traps, and the query optimization patterns that separate strong from weak SQL answers.

SQL · Window Functions · JOINs · CTEs +8
Intermediate
1 question
Product Analytics · 40 min

Cohort & Retention Analysis: D1/D7/D30 Curves, Churn Interpretation, and Retention SQL

Retention is the backbone metric for subscription, consumer, and marketplace products — yet most candidates confuse rolling retention with fixed cohort retention, mishandle right-censoring, and cannot write the window-function SQL. This guide covers cohort tables, classic vs unbounded retention, N-day windows, and how Meta-style teams diagnose false churn from product changes vs definition drift.

Cohort Analysis · Retention Curve · D1 Retention · D7 Retention +8
Intermediate
6 questions
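The fixed-cohort retention this guide distinguishes from rolling retention can be sketched in runnable SQL via sqlite3 (schema, user IDs, and dates are invented):

```python
import sqlite3

# Fixed-cohort D1 retention: cohort a user on their first active day,
# then check for any activity exactly one day later.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id INT, d DATE)")
con.executemany("INSERT INTO events VALUES (?, ?)", [
    (1, "2024-01-01"), (1, "2024-01-02"),  # retained on D1
    (2, "2024-01-01"),                     # churned after D0
    (3, "2024-01-02"), (3, "2024-01-03"),  # second cohort, retained
])
sql = """
WITH firsts AS (
  SELECT user_id, MIN(d) AS cohort_day FROM events GROUP BY user_id
)
SELECT f.cohort_day,
       COUNT(*) AS cohort_size,
       SUM(EXISTS (SELECT 1 FROM events e
                   WHERE e.user_id = f.user_id
                     AND e.d = DATE(f.cohort_day, '+1 day'))) AS d1_retained
FROM firsts f
GROUP BY f.cohort_day
ORDER BY f.cohort_day
"""
for row in con.execute(sql):
    print(row)
# ('2024-01-01', 2, 1)
# ('2024-01-02', 1, 1)
```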
Product Analytics · 33 min

Funnel Analysis: Conversion Optimization, Drop-off Attribution, and Funnel SQL

Funnel analysis measures how users progress through a sequence of steps — signup → activation → first purchase → retention. It is the most frequently tested analytics framework in product data science interviews at Meta, Airbnb, Stripe, and Lyft. This guide covers funnel construction, drop-off rate calculation, multi-touch attribution, and the SQL patterns for session-level and user-level funnels.

Funnel Analysis · Conversion Rate · Drop-off Rate · User Acquisition +8
Intermediate
6 questions
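A user-level funnel of the kind described above, sketched with sqlite3 (step names, the ordering rule, and the data are illustrative; sessionization is deliberately omitted):

```python
import sqlite3

# A user counts for a step only if it happened at or after the previous step,
# so out-of-order events get no funnel credit.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id INT, step TEXT, ts INT)")
con.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (1, "signup", 1), (1, "activate", 2), (1, "purchase", 3),
    (2, "signup", 1), (2, "activate", 5),
    (3, "signup", 2),
    (4, "purchase", 1), (4, "signup", 9),  # purchase before signup: no credit
])
sql = """
WITH per_user AS (
  SELECT user_id,
         MIN(CASE WHEN step='signup'   THEN ts END) AS s,
         MIN(CASE WHEN step='activate' THEN ts END) AS a,
         MIN(CASE WHEN step='purchase' THEN ts END) AS p
  FROM events GROUP BY user_id
)
SELECT COUNT(s)                                  AS signed_up,
       SUM(a IS NOT NULL AND a >= s)             AS activated,
       SUM(p IS NOT NULL AND a >= s AND p >= a)  AS purchased
FROM per_user
"""
print(con.execute(sql).fetchone())  # (4, 2, 1)
```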
Scenarios · 32 min

How to Approach Data & Product Scenario Questions

Scenario questions — DAU drops, model degradation, A/B test decisions, incident triage — are the dominant interview format for senior data scientists, product analysts, and ML engineers. This guide gives you the DIAGNOSE framework: a structured 5-step approach that transforms vague 'what would you do?' prompts into clear, systematic answers that impress interviewers at Meta, Google, Airbnb, and Stripe.

Scenario Questions · Product Analytics · DAU Drop · Metric Investigation +8
Intermediate
6 questions
Scenarios · 45 min

Scenario Walkthrough: Why Is DAU Dropping?

End-to-end DAU triage the way strong DS/PM orgs run it: lock the metric definition, validate instrumentation before narrative, decompose the composite into new vs returning and session depth, segment to localize a sharp break, and ship actions with impact sizing and explicit uncertainty. Includes worked timing, SQL-shaped diagnostics, and the Simpson / mix-shift traps that sink mid-level answers.

DAU · WAU · Metric Root Cause · Segmentation +11
Intermediate
9 questions
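One piece of the decomposition step above — splitting DAU into new vs returning — can be sketched in plain Python (the function name, field names, and data are invented for the sketch):

```python
from datetime import date

# "new" here means first-ever activity fell on that day; anything else is
# returning. Real pipelines decompose further (resurrected, session depth).
def decompose_dau(daily_active, first_seen):
    """daily_active: {date: set(user_id)}; first_seen: {user_id: first date}."""
    out = {}
    for d, users in sorted(daily_active.items()):
        new = {u for u in users if first_seen[u] == d}
        out[d] = {"dau": len(users), "new": len(new),
                  "returning": len(users) - len(new)}
    return out

first_seen = {"u1": date(2024, 1, 1), "u2": date(2024, 1, 1),
              "u3": date(2024, 1, 2)}
daily = {date(2024, 1, 1): {"u1", "u2"},
         date(2024, 1, 2): {"u1", "u3"}}
print(decompose_dau(daily, first_seen))
```

Because DAU is a composite, a flat topline can hide a new-user drop offset by a returning-user spike — which is exactly why the decomposition comes before the narrative.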
Scenarios · 45 min

Scenario Walkthrough: Engagement vs Revenue — Guardrails & Horizon

The highest-signal PM/DS tradeoff: a surface, ranking, or growth lever lifts a leading engagement input while threatening RPM, ARPU, chargebacks, or long-horizon retention. Learn to express a *constrained* objective, pre-register guardrails, separate short-window novelty from LTV, and run readouts the way strong experimentation orgs do—not as a single p-value on one chart.

North Star Metric · Guardrails · OEC · ARPU +11
Intermediate
8 questions
Scenarios · 45 min

Scenario Walkthrough: Marketplace Supply–Demand Imbalance — Liquidity First

Interview-prep walkthrough for two-sided marketplaces: liquidity before GMV, cell-level supply and demand, search-to-fill, zero results, and dual-sided pain (riders and drivers can disagree in the same city). Covers levers, incentives, repositioning, throttles, and why network effects break naive A/B designs.

Marketplace · Two-Sided Platform · Liquidity · Supply Demand Balance +10
Intermediate
1 question
Scenarios · 45 min

Scenario Walkthrough: Post-Launch — Was This Feature a Success?

Structured readout for 'we shipped' moments: pre-registered success criteria vs ad hoc happy charts, leading vs lagging metrics, holdout and cannibalization, seasonality, and the governance of when to call a win. Mirrors how strong DS orgs run launch reviews (Meta, Google, Amazon) so you sound like you have owned a launch, not like you cherry-picked a green dashboard tile.

Post-Launch Review · Launch Readout · Success Metrics · Holdout Group +10
Intermediate
1 question
SQL & Databases · 35 min

SQL Query Optimization: Indexes, Query Plans, and Performance at Scale

Production SQL is not about writing queries that work — it's about queries that scale. This guide covers B-tree and hash index mechanics, reading EXPLAIN ANALYZE output, index selectivity and covering indexes, when indexes hurt write throughput, correlated subquery elimination, and partitioning strategies used in analytics warehouses at scale.

SQL Optimization · Indexes · B-tree Index · Query Plan +8
Intermediate
1 question
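A quick sqlite3 sketch of the before/after index check this guide builds toward; EXPLAIN output differs by engine and version, so treat the exact plan strings as illustrative rather than portable:

```python
import sqlite3

# Compare the query plan for the same predicate before and after adding
# an index on the filtered column.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, country TEXT)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows end with a human-readable detail string.
    return " ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT * FROM users WHERE email = 'a@b.com'"
before = plan(q)  # full table scan: no index on email yet
con.execute("CREATE INDEX idx_users_email ON users(email)")
after = plan(q)   # now a search using the new index
print(before)
print(after)
```

The same habit — read the plan, change one thing, read it again — is what the EXPLAIN ANALYZE material in the guide trains on Postgres-class engines.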
SQL & Databases · 40 min

Window Functions for Analytics: ROW_NUMBER, RANK, LAG/LEAD, and Running Totals

Window functions are the single most-tested SQL topic in data and ML engineer interviews. This guide covers the full OVER() clause mechanics, every ranking function and when each is wrong, LAG/LEAD for time-series analysis, running aggregates, and the ROWS vs RANGE frame clause distinction that trips up even experienced engineers.

Window Functions · ROW_NUMBER · RANK · DENSE_RANK +8
Intermediate
1 question
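The ROWS vs RANGE distinction called out above shows up on tied ORDER BY values: a RANGE frame includes all peer rows, so ties share one running total (data invented; run via sqlite3):

```python
import sqlite3

# Note the duplicate ORDER BY value d=2 -- that tie is the whole demo.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (d INT, amount INT)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [(1, 10), (2, 20), (2, 5), (3, 1)])
sql = """
SELECT d, amount,
       SUM(amount) OVER (ORDER BY d
         ROWS  BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum,
       SUM(amount) OVER (ORDER BY d
         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum
FROM t
ORDER BY d, amount
"""
# range_sum is 10, 35, 35, 36: both d=2 rows see the whole tie group.
# rows_sum differs between the two d=2 rows, and which one comes "first"
# depends on physical row order -- the classic interview gotcha.
for row in con.execute(sql):
    print(row)
```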
SQL & Databases · 35 min

SQL Joins & Aggregations: INNER, LEFT, GROUP BY, and Aggregation Pitfalls

Joins and GROUP BY form the backbone of every SQL interview. This guide covers the four join types with precise NULL semantics, GROUP BY with HAVING, the COUNT(*) vs COUNT(col) distinction that fails most candidates, multi-table aggregation patterns, and the self-join technique for hierarchical and comparison queries asked at Meta, Stripe, and Airbnb.

SQL Joins · INNER JOIN · LEFT JOIN · GROUP BY +8
Intermediate
7 questions
SQL & Databases · 38 min

Subqueries and CTEs: WITH Clauses, Correlated Subqueries, and Recursive Patterns

Subqueries and CTEs are the SQL constructs that separate mid-level analysts from senior engineers. This guide covers scalar subqueries, correlated subqueries and their performance implications, the WITH (CTE) clause for readable multi-step logic, and recursive CTEs for hierarchical data — all tested in the data rounds at Meta, Stripe, Google, and Airbnb.

Subqueries · CTE · WITH Clause · Correlated Subquery +8
Intermediate
1 question
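A minimal recursive CTE of the kind this guide covers, walking an invented employee hierarchy in sqlite3:

```python
import sqlite3

# Anchor: the root (no manager). Recursive step: join children to the
# rows found so far, incrementing depth.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (id INT, name TEXT, manager_id INT)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    (1, "ceo", None), (2, "vp", 1), (3, "ic", 2), (4, "ic2", 2),
])
sql = """
WITH RECURSIVE chain(id, name, depth) AS (
  SELECT id, name, 0 FROM emp WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, c.depth + 1
  FROM emp e JOIN chain c ON e.manager_id = c.id
)
SELECT name, depth FROM chain ORDER BY depth, name
"""
print(con.execute(sql).fetchall())
# [('ceo', 0), ('vp', 1), ('ic', 2), ('ic2', 2)]
```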
Product Analytics · 38 min

Attribution Modeling: Last-Touch, Multi-Touch, Shapley, MMM, and Incrementality

Marketing and growth interviews ask how you credit conversions to channels — last-click is biased, multi-touch rules are arbitrary, and MMM is slow. This guide covers Markov attribution, Shapley-Owen values, geo holdouts for ground-truth incrementality, and how Uber-style teams pair fast digital attribution with slower marketing mix models.

Attribution · Multi-Touch Attribution · Shapley Value · Marketing Mix Model +8
Advanced
1 question
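The arbitrary rule-based splits this guide critiques can be made concrete in a few lines (channel names invented; these are the baselines, not Shapley or Markov attribution):

```python
# Two rule-based credit models for a single conversion path.
def last_touch(path):
    """All credit to the final channel before conversion."""
    return {path[-1]: 1.0}

def linear(path):
    """Equal credit per touch; repeated channels accumulate."""
    w = 1.0 / len(path)
    credit = {}
    for ch in path:
        credit[ch] = credit.get(ch, 0.0) + w
    return credit

path = ["search", "email", "social", "email"]
print(last_touch(path))  # {'email': 1.0}
print(linear(path))      # {'search': 0.25, 'email': 0.5, 'social': 0.25}
```

The gap between these two answers on the same path is the interview talking point: neither split is measured, which is why geo holdouts exist for ground truth.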
Scenarios · 45 min

Scenario Walkthrough: The A/B Test Went Wrong — SRM, Peeking, and Interference

When the experiment is lying but the slide deck is green. Walk the failure modes that dominate production: sample ratio mismatch, peeking and early stopping, novelty and learning effects, network interference, wrong randomization unit, and thin-event metrics read too early. Teaches the remediation playbook: invalidate, debug assignment, or redesign — and how to say that without panicking the room. Grounded in standard OCE practice, chi-square SRM, and the Kohavi, Tang, and Xu body of work on trustworthy experiments.

A/B Testing · Sample Ratio Mismatch · SRM · Peeking Problem +10
Advanced
1 question
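The chi-square SRM check mentioned above, sketched in plain Python (3.841 is the df=1 critical value at alpha = 0.05; the planned 50/50 split and the sample counts are assumptions for the sketch):

```python
# Chi-square goodness of fit of observed arm counts vs the planned split.
def srm_chi_square(n_control, n_treatment, expected_ratio=0.5):
    n = n_control + n_treatment
    exp_c = n * expected_ratio
    exp_t = n * (1 - expected_ratio)
    chi2 = ((n_control - exp_c) ** 2 / exp_c
            + (n_treatment - exp_t) ** 2 / exp_t)
    # True => likely SRM: debug assignment before reading any metric.
    return chi2, chi2 > 3.841

print(srm_chi_square(5000, 5100))  # small imbalance, not flagged by itself
print(srm_chi_square(5000, 5400))  # flagged: investigate the split
```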
Scenarios · 40 min

Scenario Walkthrough: Recommendation Model CTR Dropped 15% Overnight

A structured incident walkthrough for sudden ML metric degradation. Learn how to separate feature drift, label drift, training-serving skew, serving regressions, and upstream schema failures with fast falsification checks and executive-safe communication.

ML Incident · Model Degradation · CTR Drop · Feature Drift +5
Advanced
1 question
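One of the fast falsification checks described above — comparing a feature's training vs serving distribution with a population stability index — sketched on invented bin fractions (the 0.2 alert threshold is a common convention, not a universal rule):

```python
import math

# PSI over pre-binned fractions; eps guards against empty bins.
def psi(expected_fracs, actual_fracs, eps=1e-6):
    return sum((a - e) * math.log((a + eps) / (e + eps))
               for e, a in zip(expected_fracs, actual_fracs))

train = [0.25, 0.25, 0.25, 0.25]          # training-time distribution
serve_ok = [0.24, 0.26, 0.25, 0.25]       # serving, roughly unchanged
serve_drifted = [0.05, 0.15, 0.30, 0.50]  # serving, mass shifted right
print(round(psi(train, serve_ok), 4))
print(round(psi(train, serve_drifted), 4))  # well above 0.2: suspect drift
```

A check like this per feature lets you falsify "feature drift" in minutes, before touching retraining or serving code.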
Scenarios · 35 min

Scenario Walkthrough: Payment Service Returning 500s in Production

A step-by-step incident response walkthrough for severe production outages. Covers triage order, dependency isolation, rollback decisions, cascading failure containment, and stakeholder communication under time pressure.

System Outage · Incident Response · Payment Systems · Cascading Failure +5
Advanced
1 question
Scenarios · 45 min

Scenario Walkthrough: Trust & Safety Escalation — Abuse Signals & Response

When bad actors spike, reports flood, or a model misfires, strong answers triage threat class, validate measurement, use velocity and graph signals for coordination, respect human review capacity, and govern precision versus harm — not generic retrain-the-model talk. Maps signals to tiered actions, appeals, and guard metrics the way integrity orgs actually run operations.

Trust and Safety · Integrity · Abuse Detection · Velocity Rules +10
Advanced
1 question
SQL & Databases · 35 min

SQL Indexes and Query Performance: B-Tree, Composite, and Covering Indexes

Index selection separates a query that merely works from one that holds up in production. This guide covers B-tree index mechanics, composite index column ordering, covering indexes for index-only scans, when indexes hurt writes, and how to read EXPLAIN plans — skills tested at Google, Stripe, and Meta for senior data and backend roles.

SQL Indexes · B-Tree Index · Composite Index · Covering Index +8
Advanced
1 question