Craft & judgment
Written scenarios for how you operate under pressure: incidents, debt, influence, and product tradeoffs. Deep-dive guides live under Learn → Craft; this page is rubric-based practice only.
P0 Production Incident: Payment Service Down
It's 2:47 a.m. PagerDuty wakes you: 8 minutes ago, the payment success rate dropped from 99.2% to 31%. You are the on-call engineer. Walk through your response from alert to resolution.
Technical Debt vs. Feature Velocity: Make the Call
Your team's monolithic authentication service has grown to 80k lines of code over 5 years. It takes 4 hours to deploy, has no test coverage, and has caused 3 incidents in 6 months — each taking 4+ hours to resolve. Your PM wants to ship 3 new auth features this quarter. How do you frame the refactor vs. feature build decision?
Driving a Cross-Team Technical Decision Without Authority
Your platform roadmap is blocked until another team adopts a new API versioning standard you've designed. The other team's lead is skeptical, and their roadmap is already full. You have no authority over them. How do you get this done?
Write a Blameless Postmortem for a Database Outage
Your team's Postgres primary was accidentally deleted by a script that ran in production instead of staging. Service was down for 47 minutes. Write a blameless postmortem that (1) describes what happened, (2) identifies root causes (plural), (3) defines action items, and (4) does not assign blame to the engineer who ran the script.
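One action item a postmortem like this might propose is a guard that stops a destructive script when it runs in the wrong environment. The sketch below is only an illustration of that idea: the `APP_ENV` variable, the `require_environment` helper, and the `WrongEnvironmentError` class are all hypothetical names, not part of the scenario.

```python
import os


class WrongEnvironmentError(RuntimeError):
    """Raised when a script runs outside the environment it was written for."""


def require_environment(expected: str, env_var: str = "APP_ENV") -> None:
    """Abort unless the (hypothetical) environment variable matches `expected`.

    A missing variable counts as a mismatch, so the script fails closed.
    """
    actual = os.environ.get(env_var)
    if actual != expected:
        raise WrongEnvironmentError(
            f"refusing to run: expected {env_var}={expected!r}, found {actual!r}"
        )


def reset_database(target_env: str) -> str:
    """Placeholder for a destructive operation; it only reports what it would do."""
    require_environment(target_env)
    return f"would reset database in {target_env}"
```

A script meant for staging would call `reset_database("staging")` and raise before touching anything if `APP_ENV` is set to `production` or not set at all. A guard like this addresses one contributing cause only; the postmortem should still ask why the script had production credentials in the first place.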
Ship or Delay? A Feature with Known Bugs
You're 2 days from a committed launch of a new checkout flow. QA found 3 bugs: (1) a rare race condition that causes double charges in ~0.3% of transactions, (2) a UI misalignment on older Android devices (8% of users), (3) a missing 'order confirmation' email on the first purchase. Your PM and VP want to ship on time. What do you recommend and how do you defend it?
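To defend a recommendation, it helps to turn the ~0.3% double-charge rate into a customer count. The sketch below does that arithmetic; the daily volume is an assumed figure for illustration, since the scenario does not state one.

```python
def expected_double_charges(transactions: int, rate: float = 0.003) -> float:
    """Expected number of double-charged transactions at the given rate.

    The default rate is the ~0.3% figure from the scenario.
    """
    if transactions < 0:
        raise ValueError("transactions must be non-negative")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return transactions * rate


# Assumed volume, not given in the scenario.
ASSUMED_DAILY_TRANSACTIONS = 10_000

if __name__ == "__main__":
    per_day = expected_double_charges(ASSUMED_DAILY_TRANSACTIONS)
    print(f"about {per_day:.0f} double charges per day")
```

At the assumed volume, a rate that sounds rare becomes dozens of affected customers every day, which is the framing a PM or VP is more likely to respond to than a percentage.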