Multiple Testing Corrections: FWER, FDR, Bonferroni, Benjamini–Hochberg, and When Each Fails
Running twenty metrics at α=0.05 each does not leave your program at a 5% false positive rate — family-wise error explodes. This guide covers Bonferroni, Holm, Benjamini–Hochberg FDR, false discovery proportion intuition, and how Meta-style experimentation teams pair primary-metric discipline with exploratory FDR on secondary reads.
The Family-Wise Error Explosion
Suppose you test $m$ independent null hypotheses, each at level $\alpha$. If **all** nulls are true, the probability of **at least one** false rejection is $1 - (1 - \alpha)^m$. For $m = 10$ and $\alpha = 0.05$, that is about **40%**, not 5%. Product experiments violate independence (metrics co-move), so the exact figure does not carry over, but the qualitative lesson survives: **naive multi-metric dashboards** generate phantom wins. Interviewers want you to separate three objects: (1) a **pre-registered primary** metric for ship decisions, (2) **FWER-controlled** procedures when any single false alarm is catastrophic, (3) **FDR-controlled** procedures for exploratory science on many hypotheses.
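To make the arithmetic concrete, here is a minimal sketch of the independent-tests case: with every null true, the chance of at least one false rejection across m tests at level alpha is 1 - (1 - alpha)^m. The function name `family_wise_error_rate` and the chosen values of m are illustrative assumptions, not part of any library, and the formula only holds under independence.

```python
def family_wise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false rejection) across m independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


# Illustrative family sizes (assumed for this sketch), each test at alpha = 0.05.
fwer_by_m = {m: family_wise_error_rate(0.05, m) for m in (1, 5, 10, 20)}

for m, fwer in fwer_by_m.items():
    print(f"m={m:>2}  FWER={fwer:.3f}")
# m= 1  FWER=0.050
# m= 5  FWER=0.226
# m=10  FWER=0.401
# m=20  FWER=0.642
```

Ten metrics already put the family-wise rate near 40%, and twenty push it to roughly 64%. Correlated metrics will generally give a different number than this formula, so treat it as a reference point for intuition rather than a forecast for a real dashboard.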