Metric Design for Data Scientists: North Star Metrics, Guardrails, and Causal Attribution
Master the 3-layer metric hierarchy used at top tech companies — from selecting a North Star and guardrails to diagnosing metric drops through top-down decomposition. Covers Goodhart's Law, Simpson's Paradox, and the classic trade-off questions that appear in every DS and PM+data interview.
The Metric Trap: Why Most DS Candidates Fail This Question
"Design a metric system for Feature X" sounds like a free-form brainstorm. It isn't. Interviewers are checking whether you know the hierarchy — the structured relationship between a single North Star metric, a handful of guardrails, and the diagnostic layer beneath.
The 6/10 answer is a flat list: "I'd track DAU, MAU, revenue, session length, NPS..." No structure, no trade-offs, no discussion of what happens when these metrics conflict.
The 9/10 answer opens with: "I'd anchor on one North Star metric, define 3–5 guardrails that cannot regress, and then choose diagnostic metrics to explain why the North Star moves." Then it discusses why each metric was chosen and what would cause you to deprioritize one over another.
This framework — used at Meta (North Star = DAU; guardrail = ad load per user), Airbnb (North Star = nights booked; guardrails = host and guest satisfaction), and Lyft (North Star = rides per user; guardrails = wait time, cancellation rate) — is what structured metric design looks like in practice.
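The three-layer structure can be made concrete as a small data model. This is an illustrative sketch, not any company's internal tooling; the metric names in the example are hypothetical stand-ins for the Lyft-style system described above.

```python
from dataclasses import dataclass, field


@dataclass
class MetricSystem:
    """Three-layer metric hierarchy: one North Star, a few guardrails
    that cannot regress, and diagnostics that explain North Star moves."""
    north_star: str
    guardrails: list[str] = field(default_factory=list)
    diagnostics: list[str] = field(default_factory=list)


# Hypothetical example mirroring the ride-sharing system described above
rides_system = MetricSystem(
    north_star="rides_per_user",
    guardrails=["wait_time_p90", "cancellation_rate"],
    diagnostics=["sessions_per_user", "request_to_ride_conversion"],
)
```

Keeping the layers as separate fields, rather than one flat list, forces the structural distinction interviewers are probing for: exactly one North Star, a short guardrail list, and an explanatory diagnostic layer beneath.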
What Interviewers Actually Evaluate
Interviewers score metric design questions on three things:
- Hierarchy — Do you distinguish North Star from guardrails from diagnostics, or is it a flat list?
- Trade-off reasoning — When DAU goes up but revenue goes down, can you articulate why that might be expected or unexpected, and what decision it implies?
- Gaming resistance — Do you anticipate how the metric could be artificially moved, and build in protections (composite metrics, audits, guardrails on the guardrail)?
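The trade-off and guardrail reasoning above can be sketched as a simple ship/no-ship rule: launch only if the North Star improves and no guardrail regresses past its tolerance. The function name, thresholds, and metric names here are hypothetical, chosen for illustration.

```python
def ship_decision(north_star_lift: float,
                  guardrail_deltas: dict[str, float],
                  max_regression: float = 0.01) -> str:
    """Decide whether to ship an experiment.

    north_star_lift: relative change in the North Star (positive = improvement).
    guardrail_deltas: relative change per guardrail, signed so that
        negative means the guardrail got worse.
    max_regression: tolerated guardrail regression (1% by default).
    """
    if north_star_lift <= 0:
        return "no-ship: North Star did not improve"
    regressed = [name for name, delta in guardrail_deltas.items()
                 if delta < -max_regression]
    if regressed:
        return "no-ship: guardrail regression in " + ", ".join(regressed)
    return "ship"


# A 2% North Star lift with guardrails within tolerance ships;
# the same lift with a 5% wait-time regression does not.
print(ship_decision(0.02, {"wait_time_p90": 0.0, "cancellation_rate": -0.005}))
print(ship_decision(0.02, {"wait_time_p90": -0.05}))
```

Encoding the rule this way also surfaces the gaming-resistance point: a team can only move the launch decision by moving the North Star without breaching a guardrail, which is exactly the protection the hierarchy is designed to provide.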
A candidate who mentions Goodhart's Law by name, gives a concrete example, and proposes a mitigation will stand out at every company. Most candidates don't.