Part III (g) - Tradeoff Calculus for System Design

HARD TRUTH: ARCHITECTURE IS TRADEOFFS, NOT DIAGRAMS

War-story pattern: teams with beautiful diagrams still melt in production because no one wrote down the tradeoffs.

Top engineers make tradeoffs explicit before code starts. They decide what to optimize now, what to defer, and what to protect at all costs.

Your job is not to maximize every dimension. Your job is to pick the right losses deliberately.


THE FIVE-AXIS TRADEOFF MODEL

For each major architecture decision, score each option on:

  • Latency
  • Consistency
  • Cost
  • Complexity
  • Team velocity

Score each axis from -2 to +2:

  • +2: strong advantage
  • +1: moderate advantage
  • 0: neutral
  • -1: moderate cost
  • -2: heavy cost

Example decision: synchronous write path vs async write with queue.

  • Sync path: better consistency, worse latency and resilience under spikes.
  • Async path: better spike handling, more complexity and eventual consistency risk.

The score does not make the decision for you. It forces you to see the real shape of the decision.
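The scoring scheme can be captured in a few lines of code so that out-of-range or missing scores are caught early. The sketch below is one possible encoding, not a prescribed tool: the `OptionScore` and `compare` names are hypothetical, and the numeric scores for the sync and async paths are illustrative guesses chosen to match the direction of the example above, not measured values.

```python
from dataclasses import dataclass

# The five axes from the model above.
AXES = ("latency", "consistency", "cost", "complexity", "team_velocity")


@dataclass(frozen=True)
class OptionScore:
    """One option scored on every axis, each from -2 (heavy cost) to +2 (strong advantage)."""
    name: str
    scores: dict[str, int]

    def __post_init__(self) -> None:
        missing = set(AXES) - set(self.scores)
        unknown = set(self.scores) - set(AXES)
        if missing or unknown:
            raise ValueError(f"{self.name}: missing={sorted(missing)} unknown={sorted(unknown)}")
        for axis, value in self.scores.items():
            if not -2 <= value <= 2:
                raise ValueError(f"{self.name}: {axis}={value} is outside -2..+2")


def compare(a: OptionScore, b: OptionScore) -> dict[str, int]:
    """Per-axis difference (a minus b). Deliberately not summed into one number."""
    return {axis: a.scores[axis] - b.scores[axis] for axis in AXES}


# Illustrative scores only (assumed, not measured).
sync_path = OptionScore("sync write", {
    "latency": -1, "consistency": 2, "cost": 0, "complexity": 1, "team_velocity": 1,
})
async_path = OptionScore("async write with queue", {
    "latency": 1, "consistency": -1, "cost": -1, "complexity": -2, "team_velocity": -1,
})

for axis, delta in compare(sync_path, async_path).items():
    print(f"{axis:14} sync - async = {delta:+d}")
```

The comparison stays per-axis on purpose: collapsing the five scores into a single total would hide exactly the shape the model is meant to expose.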


DECISION MATRIX TEMPLATE

Use one matrix per major decision:

  • Decision name
  • Decision owner
  • Date and review date
  • Constraints and assumptions
  • Options considered
  • Five-axis scores
  • Risks and mitigations
  • Reversal cost
  • Final decision and rationale

This converts design discussion from opinion to traceable reasoning.

If six months later the decision looks wrong, the matrix tells you whether assumptions changed or execution failed.
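One way to keep the matrix traceable is to store it as structured data next to the code. The sketch below mirrors the template fields; the class name, field names, and the sample values (decision name, owner, dates) are all hypothetical, and a real team might prefer YAML or a wiki page instead.

```python
from dataclasses import dataclass, field, fields
from datetime import date


@dataclass
class DecisionMatrix:
    """One record per major decision, mirroring the template above."""
    name: str
    owner: str
    decided_on: date
    review_on: date
    constraints: list[str] = field(default_factory=list)          # constraints and assumptions
    options: list[str] = field(default_factory=list)              # options considered
    scores: dict[str, dict[str, int]] = field(default_factory=dict)  # option -> axis -> score
    risks: dict[str, str] = field(default_factory=dict)           # risk -> mitigation
    reversal_cost: str = ""
    decision: str = ""
    rationale: str = ""

    def missing_fields(self) -> list[str]:
        """Names of template fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical draft: only the header and the options are filled in so far.
draft = DecisionMatrix(
    name="Order write path",
    owner="example-team",
    decided_on=date(2024, 1, 15),
    review_on=date(2024, 7, 15),
    options=["sync write", "async write with queue"],
)
print("Still to fill in:", ", ".join(draft.missing_fields()))
```

A check like `missing_fields()` could run in review tooling so that a matrix without assumptions or a reversal cost is flagged before the decision is recorded.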


DEFAULT PATTERNS FOR GOOD JUDGMENT

Use these defaults unless data forces a different move:

  • Prefer reversible decisions early.
  • Keep complexity localized, not distributed across all services.
  • Buy strong consistency only where correctness demands it.
  • Optimize reliability before micro-optimizing peak performance.
  • Keep ownership boundaries clear so teams can evolve independently.

Field rule: defaults are not dogma, but they save teams from expensive early mistakes when uncertainty is high.


FAILURE MODE CHECK

Before finalizing a design, run a quick failure pre-mortem:

  • Top 3 failure modes
  • Earliest detection signal for each
  • Blast radius estimate
  • Containment and rollback strategy
  • Owner during incident

If this section is weak, your design is not production-ready yet.

A design without failure thinking is an optimistic diagram, not a system you can operate.
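The pre-mortem checklist lends itself to a simple completeness check. The following is a minimal sketch under assumed names (`FailureMode`, `premortem_gaps`) and a made-up failure mode; it only verifies that each item on the checklist has been written down, not that the answers are good.

```python
from dataclasses import dataclass, fields


@dataclass
class FailureMode:
    """One entry of the pre-mortem; every field is free text."""
    description: str
    detection_signal: str   # earliest signal that this failure is starting
    blast_radius: str       # who or what is affected
    containment: str        # containment and rollback strategy
    incident_owner: str     # owner during the incident


def premortem_gaps(modes: list[FailureMode], required: int = 3) -> list[str]:
    """Return human-readable gaps; an empty list means the checklist is complete."""
    gaps = []
    if len(modes) < required:
        gaps.append(f"only {len(modes)} of {required} failure modes listed")
    for mode in modes:
        for f in fields(mode):
            if not getattr(mode, f.name).strip():
                gaps.append(f"{mode.description or '<unnamed>'}: missing {f.name}")
    return gaps


# Hypothetical entry with no incident owner assigned yet.
modes = [
    FailureMode(
        description="queue backlog grows unbounded",
        detection_signal="consumer lag rising for 10 minutes",
        blast_radius="delayed order confirmations",
        containment="shed low-priority producers, scale consumers",
        incident_owner="",
    ),
]
for gap in premortem_gaps(modes):
    print("GAP:", gap)
```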


CHANGE TRIGGERS

Every decision must include trigger conditions for re-evaluation:

  • Traffic or data growth threshold crossed
  • SLO misses for two consecutive releases
  • Cost exceeds budget by agreed percentage
  • Incident frequency increases beyond baseline
  • Team cognitive load blocks delivery

This prevents two classic failure modes:

  • Keeping a design too long after constraints changed
  • Re-architecting too early without evidence
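Triggers are easier to honor when they are written as explicit predicates over metrics the team already collects. The sketch below assumes hypothetical metric names and thresholds (5,000 requests per second, a 20% budget overrun, a baseline of 3 incidents per 30 days); the real values would come from the decision matrix. The cognitive-load trigger is left out because it needs human judgment rather than a threshold.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Metrics:
    """Snapshot of the signals the triggers look at (all names are assumptions)."""
    requests_per_second: float
    consecutive_slo_misses: int   # releases in a row that missed the SLO
    monthly_cost: float
    incidents_last_30d: int


@dataclass(frozen=True)
class Trigger:
    name: str
    fired: Callable[[Metrics], bool]


MONTHLY_BUDGET = 10_000.0  # assumed budget, in whatever currency the team uses

TRIGGERS = [
    Trigger("traffic threshold crossed", lambda m: m.requests_per_second > 5_000),
    Trigger("SLO missed for two consecutive releases", lambda m: m.consecutive_slo_misses >= 2),
    Trigger("cost over budget by more than 20%", lambda m: m.monthly_cost > MONTHLY_BUDGET * 1.20),
    Trigger("incident frequency above baseline", lambda m: m.incidents_last_30d > 3),
]


def fired_triggers(metrics: Metrics) -> list[str]:
    """Names of all triggers whose condition holds for this snapshot."""
    return [t.name for t in TRIGGERS if t.fired(metrics)]


healthy = Metrics(requests_per_second=1_200, consecutive_slo_misses=0,
                  monthly_cost=9_000, incidents_last_30d=1)
stressed = Metrics(requests_per_second=6_500, consecutive_slo_misses=2,
                   monthly_cost=13_000, incidents_last_30d=1)

print("healthy:", fired_triggers(healthy))
print("stressed:", fired_triggers(stressed))
```

An empty result is the evidence for leaving the design alone; a non-empty one is the evidence for scheduling a review, which addresses both failure modes listed above.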

WAR-STORY MINI-CASE: REDIS EVERYWHERE BACKFIRED

Timeline:

  • Week 0: Team adds Redis caching to almost every read endpoint; median latency drops from 280ms to 90ms.
  • Week 2: Stale-order incidents appear in checkout and order-history flows.
  • Week 3: Incident review shows invalidation logic duplicated across four services.
  • Week 4: Cache scope reduced to true hot paths; invalidation ownership moved to one boundary service.
  • Week 6: Trigger policy added: if p95 exceeds 220ms, a design review is required before any broad caching change.

Key decisions:

  • Rejected broad cache expansion in favor of bounded cache ownership.
  • Prioritized correctness over benchmark-driven latency wins.
  • Added explicit re-evaluation trigger to prevent reactive architecture changes.

Outcome:

  • p95 settled at 130ms.
  • Correctness incidents dropped sharply, and cache behavior became predictable.
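The Week 4 fix, a single owner for invalidation, can be illustrated with a small sketch. This is not the team's actual code: `OrderReadCache` is a hypothetical name, an in-memory dict stands in for Redis, and `load_order` stands in for whatever reads the source of truth. The point is only that reads and invalidation live behind one boundary instead of being duplicated across services.

```python
from typing import Callable


class OrderReadCache:
    """Single owner of cached order reads and of their invalidation."""

    def __init__(self, load_order: Callable[[str], dict]) -> None:
        self._load_order = load_order          # reads the source of truth
        self._entries: dict[str, dict] = {}    # stand-in for a Redis keyspace

    def get(self, order_id: str) -> dict:
        # Populate on miss; later reads are served from the cache.
        if order_id not in self._entries:
            self._entries[order_id] = self._load_order(order_id)
        return self._entries[order_id]

    def on_order_changed(self, order_id: str) -> None:
        # The only place where order entries are invalidated.
        self._entries.pop(order_id, None)


# Tiny in-memory "database" for the demonstration.
db = {"o-1": {"status": "pending"}}
cache = OrderReadCache(lambda order_id: dict(db[order_id]))

first_read = cache.get("o-1")["status"]
db["o-1"]["status"] = "paid"             # write that bypasses the cache owner
stale_read = cache.get("o-1")["status"]  # still the old value: the stale-order bug
cache.on_order_changed("o-1")            # the owner is told about the change
fresh_read = cache.get("o-1")["status"]

print(first_read, stale_read, fresh_read)
```

Every write path that forgets to call `on_order_changed` reproduces the stale read, which is why concentrating that call in one boundary service is easier to audit than four copies of the logic.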

OUTPUT ARTIFACT

For every major architecture decision, publish:

  • One-page Architecture Decision Matrix
  • ADR with rationale and alternatives
  • Failure mode check summary
  • Review date and change triggers

If you consistently produce these artifacts, architecture quality compounds release after release.