How to Approach Data & Product Scenario Questions
Scenario questions — DAU drops, model degradation, A/B test decisions, incident triage — are the dominant interview format for senior data scientists, product analysts, and ML engineers. This guide gives you the DIAGNOSE framework: a structured, step-by-step approach that transforms vague "what would you do?" prompts into clear, systematic answers that impress interviewers at Meta, Google, Airbnb, and Stripe.
What Scenario Questions Are Testing
Scenario questions give you an open-ended business situation and ask how you'd respond. Unlike SQL or ML theory questions, there's no single correct answer. The interviewer is evaluating your reasoning process, not just your conclusion.
Common scenario formats:
- Metric drop investigation: "Our DAU dropped 15% overnight. Walk me through your investigation."
- Model degradation: "The fraud model's precision fell from 92% to 71% in production. What do you do?"
- Ship-or-not decision: "The A/B test shows a 3% lift in checkout completions but a 1% drop in NPS. Do you ship?"
- Incident triage: "We have a P0 incident. The cart service is returning 500s. Walk me through your response."
- Feature design: "Design a metric system for the content recommendation team."
What the interviewer is evaluating:
- Structure: do you have a systematic approach, or do you guess randomly?
- Hypotheses before data: do you form a hypothesis tree before pulling queries?
- Breadth: do you consider multiple explanations (data pipeline, external factors, product change, fraud)?
- Business judgment: do you connect your findings to a business recommendation?
- Communication: can you narrate your thinking out loud clearly?
Scenario questions are one of the highest-signal interview formats because they can't be memorized. They test judgment developed through real analytical experience.
What a 9/10 Answer Looks Like vs a 6/10 Answer
6/10 answer: "I'd look at the data, check if there was a deployment, maybe look at the database logs, and then figure out what happened."
9/10 answer: "I'd start by confirming the signal is real — check the data pipeline and compare across time zones and platforms to rule out instrumentation issues. If real, I'd build a hypothesis tree: internal causes (deployment, feature flag, data bug) vs external causes (competitor launch, seasonality, platform event). I'd prioritize hypotheses by probability × impact, pull the most diagnostic query first, then narrow down. Once I have the root cause, I'd quantify the user impact and recommend a remediation path."
The difference: the 9/10 answer follows a structure, forms hypotheses before querying, eliminates data quality issues first, and connects findings to a business recommendation.
The DIAGNOSE Framework for Scenario Questions
Step 1 & 2: Define the Signal and Check the Instrument
Before investigating causes, get precise about what changed.
Define the signal (D):
- What metric dropped/spiked? (DAU, revenue, conversion rate, model precision)
- By how much? Is a 15% drop large or small relative to this metric's normal variance? (A quick way to check is sketched after this list.)
- When did it start? Gradually over days, or a step change at a specific time?
- Which segments are affected? All users or a subset? All platforms or one?
- What is the normal baseline? Is this a seasonal trough or a genuine drop?
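One way to answer the "large or small relative to normal variance" question is to compare the latest day-over-day change against the historical distribution of such changes. A minimal pandas sketch; the series, the z-score cutoff, and the function name are illustrative assumptions rather than a standard recipe:

```python
import pandas as pd

def is_drop_anomalous(daily_dau: pd.Series, z_threshold: float = 3.0) -> bool:
    """Flag the latest change if it sits far outside the historical
    distribution of day-over-day changes (a rough z-score check)."""
    pct_change = daily_dau.pct_change().dropna()
    history, today = pct_change.iloc[:-1], pct_change.iloc[-1]
    z = (today - history.mean()) / history.std()
    return abs(z) > z_threshold

# Illustrative series: roughly stable DAU, then a sharp drop on the last day.
dau = pd.Series([1_000_000, 1_010_000, 995_000, 1_005_000, 998_000, 850_000])
print(is_drop_anomalous(dau))  # True -> worth investigating, not normal noise
```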
Check the instrument (I) — always do this first: roughly 30–40% of "metric drops" in production turn out to be data pipeline failures, not real product changes. Rule this out before investigating business causes.
Instrument checks to run:
- Is the logging system healthy? Check event ingestion lag, error rates in the pipeline.
- Compare across data sources: does BigQuery show the same drop as the Kafka consumer?
- Check recent deployments to the logging/analytics layer specifically.
- Look at upstream dependencies: did the app stop sending events or did events start failing?
- Check if the issue is time-zone localized (common with UTC vs local time bugs).
If the instrument check passes (data is trustworthy), proceed. If it fails, the "incident" is a data pipeline bug — escalate to data engineering and scope the data gap.
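As a concrete example of the "compare across data sources" check, here is a minimal sketch that compares hourly event counts from two independent sources and flags hours where they diverge. The source names, schema, and 5% tolerance are assumptions for illustration; in practice you would pull these counts from your warehouse and your streaming consumer.

```python
import pandas as pd

def find_pipeline_gaps(warehouse_counts: pd.Series,
                       stream_counts: pd.Series,
                       tolerance: float = 0.05) -> pd.Series:
    """Return hours where the warehouse event count diverges from the
    streaming count by more than `tolerance` -- a sign the "drop" may be
    an ingestion/instrumentation problem rather than real user behavior."""
    aligned = pd.DataFrame({"warehouse": warehouse_counts,
                            "stream": stream_counts}).dropna()
    rel_diff = (aligned["warehouse"] - aligned["stream"]).abs() / aligned["stream"]
    return rel_diff[rel_diff > tolerance]

# Illustrative hourly counts: the warehouse silently loses events after 15:00.
hours = ["14:00", "15:00", "16:00", "17:00"]
warehouse = pd.Series([50_200, 49_800, 31_000, 30_500], index=hours)
stream = pd.Series([50_000, 50_100, 49_600, 49_900], index=hours)
print(find_pipeline_gaps(warehouse, stream))  # flags 16:00 and 17:00
```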
Step 3 & 4: Assemble Hypotheses and Generate Queries
Form a complete hypothesis tree before writing a single query. This is the most differentiating step — it shows you think before you act.
Standard hypothesis tree for a metric drop:
Internal causes (things you control):
- Product changes: new feature, A/B test launched, UI change, paywall added
- Engineering changes: deployment, config change, feature flag flipped
- Data/instrumentation: logging bug, pipeline outage, schema change
- Algorithm changes: ranking model update, recommendation change
External causes (things outside your product):
- Seasonality: day of week, holiday, back-to-school, product cycle
- Competitor action: a competitor launched a feature, or recovered from an outage (your users migrating back)
- Platform change: iOS/Android OS update, App Store policy, browser update
- Macroeconomic: news event, regulatory change, market disruption
Prioritize hypotheses before querying:
- "We deployed a new checkout flow 3 hours before the drop" → investigate that first
- "It's a Sunday in Q4" → check if this matches historical Sunday patterns first
- "The logging team had a config change yesterday" → check instrument first
Generating diagnostic queries:
- Segment by platform (iOS vs Android vs web) — if drop is iOS-only, look at iOS deployment
- Segment by user type (new vs returning) — if drop is new-user-only, look at acquisition funnel
- Segment by geography — if drop is US-only, look for US-specific events
- Compare to same time last week/month — is this seasonal?
- Funnel analysis: where in the conversion path are users dropping off?
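As an example of the platform and week-over-week segmentation above, a minimal pandas sketch over a hypothetical events table with date, platform, and user_id columns (the schema and numbers are assumed for illustration):

```python
import pandas as pd

def dau_change_by_platform(events: pd.DataFrame,
                           drop_date: str,
                           baseline_date: str) -> pd.Series:
    """Compare DAU on the drop date vs the same weekday a week earlier,
    per platform. A drop isolated to one platform points at a
    platform-specific cause (deploy, OS update, store listing)."""
    dau = (events.groupby(["date", "platform"])["user_id"]
                 .nunique()
                 .unstack("platform"))
    return (dau.loc[drop_date] / dau.loc[baseline_date] - 1).sort_values()

# Illustrative data: iOS loses half its users, Android and web are flat.
events = pd.DataFrame({
    "date":     ["2024-05-06"] * 6 + ["2024-05-13"] * 5,
    "platform": ["ios", "ios", "android", "android", "web", "web",
                 "ios", "android", "android", "web", "web"],
    "user_id":  [1, 2, 3, 4, 5, 6, 1, 3, 4, 5, 6],
})
print(dau_change_by_platform(events, "2024-05-13", "2024-05-06"))
```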
Step 5 & 6: Narrow to Root Cause and Own the Recommendation
Narrowing to root cause: use a process of elimination, not trial and error.
If segmentation shows: iOS drop = 40%, Android drop = 2%, Web drop = 1% → This is iOS-specific. Investigate iOS deployment, iOS version update, App Store listing.
If funnel analysis shows: signups unchanged, but checkout completions dropped 20% → The drop is in the checkout step. Investigate checkout changes, payment provider.
If cohort comparison shows: new users dropped, returning users unchanged → The drop is in acquisition or first-session experience. Investigate ad campaigns, onboarding flow, landing pages.
Once you have the root cause, quantify the impact:
- How many users are affected? (X% of DAU × user base size)
- What is the revenue impact? ($Y per affected user × affected user count)
- Is this P0 (system down), P1 (major regression), or P2 (minor regression)?
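The impact math itself is simple and worth narrating explicitly. A quick sketch with invented numbers:

```python
# All figures below are illustrative assumptions, not real benchmarks.
dau = 2_000_000            # daily active users before the drop
drop_pct = 0.15            # observed 15% drop
revenue_per_user = 0.50    # average daily revenue per affected user, in dollars

affected_users = dau * drop_pct                           # X% of DAU x user base size
daily_revenue_impact = affected_users * revenue_per_user  # $Y per user x user count

print(f"~{affected_users:,.0f} users affected, ~${daily_revenue_impact:,.0f}/day at risk")
# -> ~300,000 users affected, ~$150,000/day at risk
```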
Own a recommendation:
- If it's a bug: "Roll back the deployment immediately. Estimated fix time is X hours. Estimated impact at current rate is $Y/hour of continued outage."
- If it's a feature regression: "Disable the feature flag. Scope the full user impact before re-enabling. Schedule post-mortem."
- If it's external/seasonal: "No action required. Set up an alert for the same pattern next season. Update the DAU forecast model."
- If it's still unclear: "I need 2 more days of data to confirm. In the meantime, I'd monitor hourly and escalate if the drop accelerates."
Scenario Type → Tailored Approach
| Scenario Type | First Check | Key Hypotheses | Diagnostic Segmentation | Recommendation Type |
|---|---|---|---|---|
| DAU / engagement drop | Data pipeline integrity | Deploy, seasonality, competitor, logging bug | Platform, geo, user cohort, time of day | Rollback / investigate / monitor |
| Model performance drop | Score distribution shift (data drift) | Feature drift, label drift, upstream data change, model bug | Feature importance, slice analysis, recent data | Retrain / rollback / alert |
| Ship-or-not A/B test | Experiment validity (SRM, novelty effects) | Primary metric lift vs counter-metric drop, long-term effects | User segments, device types, exposure duration | Ship / iterate / don't ship + why |
| P0 incident | Service health (error rates, latency, on-call alerts) | Recent deploys, dependency failures, traffic spikes | Error codes, affected endpoints, dependency SLOs | Rollback / scale / escalate + timeline |
| Metric design | Existing metrics coverage and gaps | Leading vs lagging indicators, gaming risk, sensitivity | User journey stages, team ownership boundaries | Define north-star + guardrail + diagnostic metrics |
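As a concrete example of the "experiment validity (SRM)" first check in the ship-or-not row, here is a minimal sketch of a sample ratio mismatch test using a chi-square goodness-of-fit test from scipy; the counts, the 50/50 split, and the 0.001 threshold are illustrative assumptions:

```python
from scipy.stats import chisquare

def srm_check(control_users: int, treatment_users: int,
              expected_split: float = 0.5, alpha: float = 0.001) -> bool:
    """Return True if the observed assignment counts deviate from the
    expected split more than chance allows -- i.e. the experiment has a
    sample ratio mismatch and its results shouldn't be trusted yet."""
    total = control_users + treatment_users
    expected = [total * expected_split, total * (1 - expected_split)]
    _, p_value = chisquare([control_users, treatment_users], f_exp=expected)
    return p_value < alpha

# Illustrative counts: a "50/50" experiment that actually landed 50.5/49.5.
print(srm_check(505_000, 495_000))  # True at this sample size -> investigate before reading metrics
```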
Common Mistakes in Scenario Questions
Mistake 1: Jumping to queries before forming hypotheses. "I'd query DAU by platform" is not structured — it's a guess. The interviewer wants to see you build a hypothesis first, then explain why that segmentation tests it.
Mistake 2: Not checking the data pipeline first. About 30–40% of apparent metric drops are instrumentation failures. Skipping this step makes you look like you've never dealt with production data.
Mistake 3: Stopping at root cause without a recommendation. "The checkout drop was caused by the new address validation modal" is incomplete. The answer is: "...so I'd recommend rolling back the modal, validating address client-side instead, and re-running the A/B test with the fix."
Mistake 4: Proposing only one hypothesis. Experienced analysts know there are always multiple potential causes. Listing 3–5 hypotheses before narrowing shows breadth.
Mistake 5: Ignoring uncertainty. If you're not sure, say so: "I'd want to see another 24 hours of data before concluding this is a real drop vs. noise." Acknowledging uncertainty is a sign of statistical maturity, not weakness.
Scenario Question Interview Template
Opening (30 seconds): Restate the problem in your own words. Clarify the magnitude and timeline. "So DAU dropped 15% starting yesterday at 5pm — is that 15% relative to the prior day or the weekly average? And which platforms are we seeing this on?"
Step through DIAGNOSE out loud:
- "First I'd verify the data is accurate — check the pipeline..."
- "Assuming data is good, here's my hypothesis tree..."
- "I'd start with the deployment hypothesis because it's highest probability. The query I'd run is..."
- "Based on that, I'd narrow to... and my recommendation is..."
- "Open questions I'd want to validate are..."
Time management: spend ~30% of the interview on hypothesis formation, ~40% on investigation logic, ~30% on recommendation. Don't get so deep into one hypothesis that you never make a recommendation.