A/B Test Critique: Finding Flaws in Experiment Designs
Scenario-based exercises for identifying the seven most common A/B experiment design failures: insufficient power, wrong duration, novelty effects, network effects contaminating the control group, multiple testing without correction, surrogate metric selection, and holdout contamination. Tested at senior IC and staff levels at Meta, Google, Netflix, Airbnb, and Booking.com.
The Scenario
Your PM presents this experiment design for a new checkout flow:
"We will run the test for 3 days, 50/50 split. Primary metric is DAU. We are testing 4 different variants simultaneously. The checkout page is shared between logged-in users and guests, and those users see each other's cart counts in the UI. If p < 0.05, we ship."
Your job: Find every flaw in this design before it runs.
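One flaw worth quantifying before anything else: "3 days" is a duration picked without a power calculation. A minimal sketch of the standard two-proportion z-test sample-size formula is below. The 10% baseline checkout conversion and 0.5-point minimum detectable lift are hypothetical numbers chosen for illustration, not figures from the scenario.

```python
import math

def required_sample_size(p_base, p_variant, z_alpha=1.9600, z_beta=0.8416):
    """Per-arm sample size for a two-proportion z-test (normal approximation).

    z_alpha is Phi^-1(1 - alpha/2) for a two-sided test at alpha = 0.05;
    z_beta is Phi^-1(power) for 80% power.
    """
    p_bar = (p_base + p_variant) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_base * (1 - p_base)
                                      + p_variant * (1 - p_variant))) ** 2
    return math.ceil(numerator / (p_base - p_variant) ** 2)

# Hypothetical: 10% baseline conversion, detect a 0.5-percentage-point lift.
n_per_arm = required_sample_size(0.10, 0.105)
print(n_per_arm)  # roughly 58,000 users per arm
```

Whether 3 days is enough depends entirely on traffic: with five arms (control plus four variants), the total requirement multiplies, and duration also has to cover at least one full weekly cycle regardless of raw sample size.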
This is a real interview scenario used at Meta, Airbnb, Booking.com, and Google. The interviewer is not looking for a generic list of things that can go wrong with A/B tests; they want you to identify the specific flaws in this design and explain exactly how each one would corrupt the results.
There are at least 6 distinct problems in the design above. A senior engineer should find all 6. A staff engineer should also propose how to fix each one.
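To see why "4 variants, ship if p < 0.05" is one of those problems, the arithmetic is short. Testing each variant against control at α = 0.05 inflates the familywise false-positive rate well past 5%; a Bonferroni correction is the simplest (if conservative) fix. A sketch:

```python
# With 4 variant-vs-control comparisons, each tested at alpha = 0.05,
# the chance of at least one false positive when no variant truly works:
alpha = 0.05
comparisons = 4
familywise_error = 1 - (1 - alpha) ** comparisons
print(round(familywise_error, 3))  # 0.185 -- far above the intended 5%

# Bonferroni restores the 5% familywise rate by tightening the
# per-comparison threshold:
bonferroni_alpha = alpha / comparisons
print(bonferroni_alpha)  # 0.0125
```

Less conservative alternatives (Holm, Benjamini-Hochberg) exist, but in an interview the key point is recognizing that the uncorrected 5% threshold no longer means what the PM thinks it means.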