CASE STUDY — Part I: Debugging With a Timeline (Race Condition in Production)
SCENARIO
A signup flow intermittently creates two user records for the same email.
- Happens ~0.2% of the time
- Almost always during traffic spikes
- Support sees “account already exists” errors and duplicated welcome emails
This is a senior-level bug because it’s not “logic.” It’s time.
THE TIMELINE MODEL (DON’T TOUCH CODE YET)
Reconstruct the timeline:
Key question:
- Are there any atomic guarantees between “check existing” and “insert”?
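One way to reconstruct the timeline is to merge per-request log events by timestamp and look for requests whose check-to-insert windows overlap. The sketch below assumes the events have already been filtered to a single email and that each request logged both steps; the `Event` fields, the step names, and the sample data are hypothetical, not taken from the real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    t_ms: int        # timestamp in milliseconds (hypothetical log field)
    request_id: str  # which signup request emitted the event
    step: str        # "check_existing" or "insert"


def build_timeline(events):
    """Merge events from all requests into one list ordered by time."""
    return sorted(events, key=lambda e: e.t_ms)


def racing_pairs(events):
    """Return request pairs whose check-to-insert windows overlap.

    Assumes every request has exactly one event for each step.
    """
    windows = {}
    for e in events:
        windows.setdefault(e.request_id, {})[e.step] = e.t_ms

    ids = sorted(windows)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            wa, wb = windows[a], windows[b]
            # Each request checked before the other one inserted.
            if (wa["check_existing"] < wb["insert"]
                    and wb["check_existing"] < wa["insert"]):
                pairs.append((a, b))
    return pairs


# Made-up data for illustration only.
EVENTS = [
    Event(7, "A", "insert"),
    Event(0, "A", "check_existing"),
    Event(3, "B", "check_existing"),
    Event(9, "B", "insert"),
    Event(20, "C", "check_existing"),
    Event(25, "C", "insert"),
]

if __name__ == "__main__":
    for e in build_timeline(EVENTS):
        print(e.t_ms, e.request_id, e.step)
    print(racing_pairs(EVENTS))
```

In this sample, requests A and B both check before either inserts, which is the interleaving to look for in real logs. Request C runs alone and is not flagged.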
ROOT CAUSE (THE SYSTEM ALLOWS IT)
The system relies on an application-level check:
- SELECT ... WHERE email = x
- then INSERT
Under concurrency, two requests can both run the SELECT before either runs the INSERT, so both pass the check.
Senior rule:
If correctness depends on timing, it’s already broken.
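The failure can be replayed deterministically. The sketch below uses an in-memory SQLite table with no unique constraint and runs the two requests' steps by hand in the problematic order, rather than using real threads. The table layout and function names are assumptions for illustration.

```python
import sqlite3


def email_exists(conn, email):
    """Application-level check: is there already a row for this email?"""
    row = conn.execute(
        "SELECT 1 FROM users WHERE email = ?", (email,)
    ).fetchone()
    return row is not None


def insert_user(conn, email):
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))


def simulate_race(email):
    """Replay the check-then-insert interleaving and return the row count."""
    conn = sqlite3.connect(":memory:")
    # No UNIQUE constraint: uniqueness is left to the application check.
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
    )

    # Both checks run before either insert.
    a_sees_existing = email_exists(conn, email)  # request A: check
    b_sees_existing = email_exists(conn, email)  # request B: check

    if not a_sees_existing:
        insert_user(conn, email)                 # request A: insert
    if not b_sees_existing:
        insert_user(conn, email)                 # request B: insert

    count = conn.execute(
        "SELECT COUNT(*) FROM users WHERE email = ?", (email,)
    ).fetchone()[0]
    conn.close()
    return count


if __name__ == "__main__":
    print(simulate_race("a@example.com"))  # two rows for one email
```

Nothing in the code is wrong line by line. The duplicate appears only because nothing makes the check and the insert a single atomic step.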
FIX AT THE CORRECT LAYER (INVARIANT LAYER)
Invariant:
- Email must be unique.
Correct fix:
- enforce a unique constraint at the database layer
Then define behavior:
- if insert fails with unique violation → return deterministic 409 CONFLICT with typed error EMAIL_ALREADY_EXISTS
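A minimal sketch of this fix, again using in-memory SQLite: the table declares `email` as UNIQUE, and the handler maps the constraint failure to a typed result. `SignupResult`, `create_users_table`, `signup`, and the 201 success status are assumptions for illustration. A production database driver would usually expose a more specific unique-violation error code to match on.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SignupResult:
    status: int                    # HTTP-style status code
    error: Optional[str] = None    # typed error reason, if any
    user_id: Optional[int] = None  # set only when a row was created


def create_users_table(conn):
    # The invariant lives in the schema, not in application code.
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, "
        "email TEXT NOT NULL UNIQUE)"
    )


def signup(conn, email):
    """Insert directly and let the database reject duplicates."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO users (email) VALUES (?)", (email,)
            )
        return SignupResult(status=201, user_id=cur.lastrowid)
    except sqlite3.IntegrityError:
        # In this schema, a non-null email can only fail on UNIQUE.
        return SignupResult(status=409, error="EMAIL_ALREADY_EXISTS")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_users_table(conn)
    print(signup(conn, "a@example.com"))
    print(signup(conn, "a@example.com"))
```

There is no pre-check SELECT here. The insert itself is the check, so the outcome no longer depends on how requests interleave.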
UX + CONTRACT (DON’T LIE TO USERS)
UI changes:
- disable submit while request in-flight
- if EMAIL_ALREADY_EXISTS:
  - show “account exists, try login”
  - optionally offer magic link
Important:
- do not show “signup success” until server confirms creation
PREVENT THE NEXT CLASS OF FAILURES
The welcome email should be idempotent.
Patterns:
- transactional outbox
- or “send welcome email” job keyed by userId (dedupe)
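The second pattern can be sketched as a jobs table whose primary key is the user id, so enqueueing twice for the same user creates only one job. The table name, the `status` column, and the `send` callback are hypothetical. This is a sketch of the dedupe key only, not a full outbox implementation.

```python
import sqlite3


def create_jobs_table(conn):
    # One row per user: the primary key is the dedupe key.
    conn.execute(
        "CREATE TABLE welcome_email_jobs ("
        "user_id INTEGER PRIMARY KEY, "
        "status TEXT NOT NULL DEFAULT 'pending')"
    )


def enqueue_welcome_email(conn, user_id):
    """Return True if a new job was created, False if one already existed."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO welcome_email_jobs (user_id) VALUES (?)",
                (user_id,),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def send_pending(conn, send):
    """Call send(user_id) for each pending job, then mark it sent.

    Returns the number of jobs processed in this pass.
    """
    rows = conn.execute(
        "SELECT user_id FROM welcome_email_jobs WHERE status = 'pending'"
    ).fetchall()
    for (user_id,) in rows:
        send(user_id)
        with conn:
            conn.execute(
                "UPDATE welcome_email_jobs SET status = 'sent' "
                "WHERE user_id = ?",
                (user_id,),
            )
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_jobs_table(conn)
    enqueue_welcome_email(conn, 1)
    enqueue_welcome_email(conn, 1)  # second enqueue is a no-op
    send_pending(conn, lambda uid: print("welcome email to user", uid))
```

A crash between `send` and the UPDATE could still cause a resend on the next pass, so this sketch narrows the duplicate window rather than closing it entirely.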
WHAT TO MEASURE
- count unique-constraint violations (should rise during deploys and traffic spikes)
- duplicate welcome email events (should go to ~0)
- signup error rate by typed reason
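These three measurements can be sketched with in-process counters. The class and method names, and the "OK" label for successful signups, are assumptions for illustration. A real service would more likely emit these through its existing metrics client.

```python
from collections import Counter


class SignupMetrics:
    def __init__(self):
        self.outcomes = Counter()                 # reason -> count
        self.welcome_emails_per_user = Counter()  # user_id -> emails sent

    def record_signup(self, reason):
        """reason is 'OK' or a typed error such as 'EMAIL_ALREADY_EXISTS'."""
        self.outcomes[reason] += 1

    def record_welcome_email(self, user_id):
        self.welcome_emails_per_user[user_id] += 1

    def unique_violations(self):
        return self.outcomes["EMAIL_ALREADY_EXISTS"]

    def duplicate_welcome_emails(self):
        """Count every email beyond the first one sent to each user."""
        return sum(
            n - 1 for n in self.welcome_emails_per_user.values() if n > 1
        )

    def error_rate_by_reason(self):
        """Share of all signup attempts that ended in each typed error."""
        total = sum(self.outcomes.values())
        if total == 0:
            return {}
        return {
            reason: count / total
            for reason, count in self.outcomes.items()
            if reason != "OK"
        }


if __name__ == "__main__":
    m = SignupMetrics()
    for reason in ["OK", "OK", "OK", "EMAIL_ALREADY_EXISTS"]:
        m.record_signup(reason)
    print(m.unique_violations(), m.error_rate_by_reason())
```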
EXERCISE
Write a one-page postmortem:
- what was the violated invariant?
- what layer should have enforced it?
- what changed in UI contracts?