
📘 CASE STUDY — Part I: Debugging With a Timeline (Race Condition in Production)

SECTION 0 — SCENARIO

A signup flow intermittently creates two user records for the same email.

  • Happens ~0.2% of the time

  • Almost always during traffic spikes

  • Support sees “account already exists” errors and duplicate welcome emails

This is a senior-level bug because it’s not a “logic” bug inside any one request. It’s a timing bug between two requests.


SECTION 1 — THE TIMELINE MODEL (DON’T TOUCH CODE YET)

Reconstruct the timeline:

Request A: POST /signup(email=x)
Request B: POST /signup(email=x)

A: validate -> check existing -> insert user -> send email
B: validate -> check existing -> insert user -> send email

Key question:

  • Are there any atomic guarantees between “check existing” and “insert”?

SECTION 2 — ROOT CAUSE (THE SYSTEM ALLOWS IT)

The system relies on an application-level check:

  • SELECT ... WHERE email = x

  • then INSERT

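Under concurrency, both requests can run the SELECT before either INSERT commits. Each sees no existing row, so both pass the check and both insert.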
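A minimal sketch of that interleaving, replayed by hand on a single connection so it is deterministic. The in-memory SQLite database, the users table, and the helper names are assumptions for illustration, not the real signup code. The point is only that the table has no unique constraint, so nothing stops the second insert.

```python
import sqlite3

# Hypothetical schema: note there is NO unique constraint on email.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")


def email_exists(email: str) -> bool:
    """The application-level check: SELECT ... WHERE email = x."""
    row = conn.execute(
        "SELECT 1 FROM users WHERE email = ?", (email,)
    ).fetchone()
    return row is not None


def insert_user(email: str) -> None:
    """The unguarded INSERT that follows the check."""
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    conn.commit()


# Replay the bad timeline: both checks run before either insert.
a_sees_existing = email_exists("x@example.com")  # A: check existing
b_sees_existing = email_exists("x@example.com")  # B: check existing

if not a_sees_existing:
    insert_user("x@example.com")  # A: insert user
if not b_sees_existing:
    insert_user("x@example.com")  # B: insert user (the duplicate)

duplicate_count = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email = ?", ("x@example.com",)
).fetchone()[0]
print(duplicate_count)  # prints 2
```

Each request's check was truthful at the moment it ran. The bug is that the answer was stale by the time the insert happened.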

Senior rule:

If correctness depends on timing, it’s already broken.


SECTION 3 — FIX AT THE CORRECT LAYER (INVARIANT LAYER)

Invariant:

  • Email must be unique.

Correct fix:

  • enforce a unique constraint at the database layer

Then define behavior:

  • if insert fails with unique violation → return deterministic 409 CONFLICT with typed error EMAIL_ALREADY_EXISTS
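A minimal sketch of the constraint plus the deterministic error mapping, assuming SQLite via Python's sqlite3 module. The signup function, the (status, body) return shape, the userId field, and the 201 success status are hypothetical; only the 409 and EMAIL_ALREADY_EXISTS come from the rule above.

```python
import sqlite3
from typing import Tuple

conn = sqlite3.connect(":memory:")
# The invariant lives in the schema: email is UNIQUE.
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
)


def signup(db: sqlite3.Connection, email: str) -> Tuple[int, dict]:
    """Insert first, let the database decide. No separate existence check."""
    try:
        with db:  # commits on success, rolls back if the insert raises
            cur = db.execute("INSERT INTO users (email) VALUES (?)", (email,))
    except sqlite3.IntegrityError:
        # Assumption: with a non-null email, UNIQUE is the only constraint
        # that can fail here. A real handler should confirm which
        # constraint was violated before answering 409.
        return 409, {"error": "EMAIL_ALREADY_EXISTS"}
    return 201, {"userId": cur.lastrowid}


first = signup(conn, "x@example.com")
second = signup(conn, "x@example.com")
print(first[0], second)
```

The second call fails the same way whether it arrives a second later or a microsecond later, which is what makes the response deterministic.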

SECTION 4 — UX + CONTRACT (DON’T LIE TO USERS)

UI changes:

  • disable submit while request in-flight

  • if EMAIL_ALREADY_EXISTS:

    • show “account exists, try login”

    • optionally offer magic link

Important:

  • do not show “signup success” until server confirms creation

SECTION 5 — PREVENT THE NEXT CLASS OF FAILURES

The welcome email should be idempotent.

Patterns:

  • transactional outbox

  • or “send welcome email” job keyed by userId (dedupe)
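A minimal sketch of the keyed-job variant, again assuming SQLite. The email_jobs table, its columns, and enqueue_welcome_email are hypothetical names. The dedupe comes from the primary key on (user_id, kind): a second enqueue for the same user is silently ignored instead of producing a second email.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical job table. The composite primary key is the dedupe key.
conn.execute(
    """CREATE TABLE email_jobs (
           user_id INTEGER NOT NULL,
           kind    TEXT    NOT NULL,
           sent    INTEGER NOT NULL DEFAULT 0,
           PRIMARY KEY (user_id, kind)
       )"""
)


def enqueue_welcome_email(db: sqlite3.Connection, user_id: int) -> bool:
    """Enqueue at most one welcome email per user.

    Returns True if a new job row was created, False if one already existed.
    """
    with db:
        cur = db.execute(
            "INSERT OR IGNORE INTO email_jobs (user_id, kind) "
            "VALUES (?, 'welcome')",
            (user_id,),
        )
    return cur.rowcount == 1


created_first = enqueue_welcome_email(conn, 42)
created_again = enqueue_welcome_email(conn, 42)  # retry or duplicate trigger
print(created_first, created_again)
```

In the transactional outbox variant, the job row would be written in the same transaction as the user insert, so a user can never exist without its pending email and an email can never be queued for a user that was rolled back.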


SECTION 6 — WHAT TO MEASURE

  • count unique-constraint violations (expect them to rise during deploys and traffic spikes; each one is a duplicate the database rejected)

  • duplicate welcome email events (should go to ~0)

  • signup error rate by typed reason
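A minimal sketch of the "error rate by typed reason" measurement using an in-process counter. A real service would likely emit these to its metrics system instead. The function names, the "OK" and "UNKNOWN" labels, and the sample outcomes are assumptions for illustration.

```python
from collections import Counter

# One counter bucket per typed reason, plus "OK" for successful signups.
signup_outcomes: Counter = Counter()


def record_signup_outcome(status: int, body: dict) -> None:
    """Bucket each signup response by its typed error, or OK on success."""
    reason = "OK" if status < 400 else body.get("error", "UNKNOWN")
    signup_outcomes[reason] += 1


def error_rate_by_reason(outcomes: Counter) -> dict:
    """Fraction of all signups that ended in each error reason."""
    total = sum(outcomes.values())
    if total == 0:
        return {}
    return {
        reason: count / total
        for reason, count in outcomes.items()
        if reason != "OK"
    }


# Made-up sample: three successes and one rejected duplicate.
for _ in range(3):
    record_signup_outcome(201, {"userId": 1})
record_signup_outcome(409, {"error": "EMAIL_ALREADY_EXISTS"})

rates = error_rate_by_reason(signup_outcomes)
print(rates)
```

Splitting by reason keeps the EMAIL_ALREADY_EXISTS bucket, which is the constraint doing its job, separate from errors that signal a real regression.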


SECTION 7 — EXERCISE

Write a one-page postmortem:

  • what was the violated invariant?

  • what layer should have enforced it?

  • what changed in UI contracts?


🏁 END — PART I CASE STUDY