Skip to main content

Part XI (d) - Platform Migrations & Deprecation Strategy

HARD TRUTH: MIGRATIONS FAIL WHEN TREATED AS REFACTORS

Platform migrations fail when teams treat them as isolated refactors.

In reality, migrations are long-running change programs with technical, operational, and organizational risk.

Success means business continuity while risk decreases over time.


MIGRATION TYPES AND RISK SHAPES

Common migration classes:

  • Monolith to modular services
  • Legacy data store replacement
  • API version migration
  • Runtime or cloud platform migration

Each class has different dominant risks:

  • Data migrations: correctness risk
  • API migrations: consumer coordination risk
  • Runtime migrations: operational behavior risk

Field rule: identify risk shape first, then plan.


STRANGLER EXECUTION MODEL

Preferred model for large migrations:

  • Define stable domain boundaries
  • Route a small percentage of traffic
  • Compare behavior old versus new
  • Increase traffic in controlled stages
  • Decommission legacy paths in phases

This model limits blast radius and preserves rollback options.

Failure pattern: big-bang migrations remove optionality and amplify incident probability.


DATA MIGRATION SAFETY

Data is usually the highest-risk layer.

Use controls:

  • Backfill in deterministic batches
  • Reconciliation checks against source of truth
  • Dual-read or dual-write only with explicit consistency plan
  • Roll-forward and rollback strategies per stage

If correctness cannot be measured, migration progress is fiction.


ROLLOUT GOVERNANCE

Run migration like a program:

  • Milestones with go/no-go gates
  • Named owner per milestone
  • Risk register with severity and mitigation
  • Escalation conditions and communication plan

Governance is not bureaucracy. It is how you prevent high-cost surprises.


DEPRECATION DISCIPLINE

Every migration needs deprecation management:

  • Announced sunset timeline
  • Consumer communication plan
  • Compatibility window
  • Objective removal criteria

Unmanaged deprecation creates permanent legacy support burden.

Clear timelines and contracts reduce drag.


MIGRATION RFC (COPYABLE TEMPLATE)

Use a short RFC for every non-trivial migration:

# Migration RFC: [System / Boundary]

## Problem
What risk, cost, or capability gap makes this migration necessary?

## Scope
- In scope:
- Out of scope:

## Migration Shape
- Type: data | API | runtime | platform
- Strategy: strangler | shadow | dual-run | phased cutover

## Success Criteria
- Technical:
- User / business:

## Parity Checks
- What old vs new behavior must match?
- How will parity be measured?

## Rollout Plan
- Wave 1:
- Wave 2:
- Wave 3:

## Rollback Plan
- Trigger:
- Action:
- Data caveats:

## Consumer Plan
- Who is affected?
- Communication channel:
- Sunset date:

## Owners
- Migration owner:
- Approver:

Field rule: if the migration cannot be summarized in this shape, the team does not understand it yet.


PARITY & BACKFILL CHECKLIST

Before each migration stage, verify:

  • Old and new paths are both observable
  • Backfill runs in deterministic batches
  • Reconciliation query is defined before data moves
  • A mismatch threshold is defined for stop-the-line behavior
  • Roll-forward and rollback are both possible for this stage

Example parity table:

CheckSourceTargetPass condition
Billing totalLegacy serviceNew service100% match on sample and shadow traffic
Discount rulesLegacy serviceNew serviceNo mismatch above agreed threshold
Event emissionLegacy queueNew queueSame count and key fields

DECOMMISSION CHECKLIST

Legacy systems should not linger because the team got tired near the end.

## Decommission Checklist
- [ ] 100% traffic on new path for agreed window
- [ ] No critical incidents during stabilization window
- [ ] Consumer compatibility window closed
- [ ] Sunset communication sent
- [ ] Dashboards switched to new source of truth
- [ ] Runbooks updated
- [ ] Legacy writes disabled
- [ ] Legacy read path removed
- [ ] Cost / infra cleanup complete

Field rule: a migration is not complete when traffic moves. It is complete when the legacy path is safely removed.


War-Story Mini-Case: Big-Bang Migration Near Miss

Timeline:

  • T-21 days: Team plans one-shot cutover for billing service migration.
  • T-6h: Pre-launch validation finds schema mismatch in discount-calculation path.
  • T-4h: Risk review marks launch No-Go; big-bang cutover canceled.
  • Week 1: Migration reframed into strangler phases with 5% traffic canary.
  • Week 3: Shadow reads and parity checks detect two transformation defects before customer impact.
  • Week 6: Traffic reaches 100% on new service; legacy path enters deprecation window.

Key decisions:

  • Rejected schedule pressure in favor of controlled rollout.
  • Added parity gates as hard criteria for phase progression.
  • Separated data-migration validation from API routing changes.

Outcome:

  • Migration completed over six weeks with no customer-facing outage.
  • Legacy component decommissioned after compatibility window closed.

OUTPUT ARTIFACT

Migration baseline artifact set:

  • Migration RFC
  • Milestone plan with gates
  • Parity / reconciliation checklist
  • Risk register
  • Consumer communication plan
  • Decommission checklist

These artifacts make multi-quarter migration work tractable, auditable, and survivable.