A/B Test Critique: Finding Flaws in Experiment Designs
Scenario-based exercises for identifying the seven most common A/B experiment design failures: insufficient power, wrong duration, novelty effects, network effects contaminating the control group, multiple testing without correction, surrogate metric selection, and holdout contamination. Tested at senior IC and staff levels at Meta, Google, Netflix, Airbnb, and Booking.com.
The Scenario
Your PM presents this experiment design for a new checkout flow:
"We will run the test for 3 days, 50/50 split. Primary metric is DAU. We are testing 4 different variants simultaneously. The checkout page is shared between logged-in users and guests, and those users see each other's cart counts in the UI. If p < 0.05, we ship."
Your job: Find every flaw in this design before it runs.
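One flaw worth quantifying before anything else: "3 days" is a duration picked without a power calculation. A minimal sketch of the standard two-proportion z-test sample-size formula is below. The 10% baseline checkout conversion and 0.5-point minimum detectable lift are hypothetical numbers chosen for illustration, not figures from the scenario.

```python
import math

def required_sample_size(p_base, p_variant, z_alpha=1.9600, z_beta=0.8416):
    """Per-arm sample size for a two-proportion z-test (normal approximation).

    z_alpha is Phi^-1(1 - alpha/2) for a two-sided test at alpha = 0.05;
    z_beta is Phi^-1(power) for 80% power.
    """
    p_bar = (p_base + p_variant) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_base * (1 - p_base)
                                      + p_variant * (1 - p_variant))) ** 2
    return math.ceil(numerator / (p_base - p_variant) ** 2)

# Hypothetical: 10% baseline conversion, detect a 0.5-percentage-point lift.
n_per_arm = required_sample_size(0.10, 0.105)
print(n_per_arm)  # roughly 58,000 users per arm
```

Whether 3 days is enough depends entirely on traffic: with five arms (control plus four variants), the total requirement multiplies, and duration also has to cover at least one full weekly cycle regardless of raw sample size.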
This is a real interview scenario used at Meta, Airbnb, Booking.com, and Google. The interviewer is not looking for a generic list of things that can go wrong with A/B tests; they want you to identify the specific flaws in this design and explain exactly how each one would corrupt the results.
There are at least 6 distinct problems in the design above. A senior engineer should find all 6. A staff engineer should also propose how to fix each one.
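To see why "4 variants, ship if p < 0.05" is one of those problems, the arithmetic is short. Testing each variant against control at α = 0.05 inflates the familywise false-positive rate well past 5%; a Bonferroni correction is the simplest (if conservative) fix. A sketch:

```python
# With 4 variant-vs-control comparisons, each tested at alpha = 0.05,
# the chance of at least one false positive when no variant truly works:
alpha = 0.05
comparisons = 4
familywise_error = 1 - (1 - alpha) ** comparisons
print(round(familywise_error, 3))  # 0.185 -- far above the intended 5%

# Bonferroni restores the 5% familywise rate by tightening the
# per-comparison threshold:
bonferroni_alpha = alpha / comparisons
print(bonferroni_alpha)  # 0.0125
```

Less conservative alternatives (Holm, Benjamini-Hochberg) exist, but in an interview the key point is recognizing that the uncorrected 5% threshold no longer means what the PM thinks it means.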