Part XI (d) - Platform Migrations & Deprecation Strategy
HARD TRUTH: MIGRATIONS FAIL WHEN TREATED AS REFACTORS
Platform migrations fail when teams treat them as isolated refactors.
In reality, migrations are long-running change programs with technical, operational, and organizational risk.
Success means business continuity while risk decreases over time.
MIGRATION TYPES AND RISK SHAPES
Common migration classes:
- Monolith to modular services
- Legacy data store replacement
- API version migration
- Runtime or cloud platform migration
Each class has different dominant risks:
- Data migrations: correctness risk
- API migrations: consumer coordination risk
- Runtime migrations: operational behavior risk
Field rule: identify risk shape first, then plan.
STRANGLER EXECUTION MODEL
Preferred model for large migrations:
- Define stable domain boundaries
- Route a small percentage of traffic
- Compare behavior old versus new
- Increase traffic in controlled stages
- Decommission legacy paths in phases
This model limits blast radius and preserves rollback options.
Failure pattern: big-bang migrations remove optionality and amplify incident probability.
DATA MIGRATION SAFETY
Data is usually the highest-risk layer.
Use controls:
- Backfill in deterministic batches
- Reconciliation checks against source of truth
- Dual-read or dual-write only with explicit consistency plan
- Roll-forward and rollback strategies per stage
If correctness cannot be measured, migration progress is fiction.
ROLLOUT GOVERNANCE
Run migration like a program:
- Milestones with go/no-go gates
- Named owner per milestone
- Risk register with severity and mitigation
- Escalation conditions and communication plan
Governance is not bureaucracy. It is how you prevent high-cost surprises.
DEPRECATION DISCIPLINE
Every migration needs deprecation management:
- Announced sunset timeline
- Consumer communication plan
- Compatibility window
- Objective removal criteria
Unmanaged deprecation creates permanent legacy support burden.
Clear timelines and contracts reduce drag.
MIGRATION RFC (COPYABLE TEMPLATE)
Use a short RFC for every non-trivial migration:
# Migration RFC: [System / Boundary]
## Problem
What risk, cost, or capability gap makes this migration necessary?
## Scope
- In scope:
- Out of scope:
## Migration Shape
- Type: data | API | runtime | platform
- Strategy: strangler | shadow | dual-run | phased cutover
## Success Criteria
- Technical:
- User / business:
## Parity Checks
- What old vs new behavior must match?
- How will parity be measured?
## Rollout Plan
- Wave 1:
- Wave 2:
- Wave 3:
## Rollback Plan
- Trigger:
- Action:
- Data caveats:
## Consumer Plan
- Who is affected?
- Communication channel:
- Sunset date:
## Owners
- Migration owner:
- Approver:
Field rule: if the migration cannot be summarized in this shape, the team does not understand it yet.
PARITY & BACKFILL CHECKLIST
Before each migration stage, verify:
- Old and new paths are both observable
- Backfill runs in deterministic batches
- Reconciliation query is defined before data moves
- A mismatch threshold is defined for stop-the-line behavior
- Roll-forward and rollback are both possible for this stage
Example parity table:
| Check | Source | Target | Pass condition |
|---|---|---|---|
| Billing total | Legacy service | New service | 100% match on sample and shadow traffic |
| Discount rules | Legacy service | New service | No mismatch above agreed threshold |
| Event emission | Legacy queue | New queue | Same count and key fields |
DECOMMISSION CHECKLIST
Legacy systems should not linger because the team got tired near the end.
## Decommission Checklist
- [ ] 100% traffic on new path for agreed window
- [ ] No critical incidents during stabilization window
- [ ] Consumer compatibility window closed
- [ ] Sunset communication sent
- [ ] Dashboards switched to new source of truth
- [ ] Runbooks updated
- [ ] Legacy writes disabled
- [ ] Legacy read path removed
- [ ] Cost / infra cleanup complete
Field rule: a migration is not complete when traffic moves. It is complete when the legacy path is safely removed.
War-Story Mini-Case: Big-Bang Migration Near Miss
Timeline:
T-21 days: Team plans one-shot cutover for billing service migration.T-6h: Pre-launch validation finds schema mismatch in discount-calculation path.T-4h: Risk review marks launchNo-Go; big-bang cutover canceled.Week 1: Migration reframed into strangler phases with5%traffic canary.Week 3: Shadow reads and parity checks detect two transformation defects before customer impact.Week 6: Traffic reaches100%on new service; legacy path enters deprecation window.
Key decisions:
- Rejected schedule pressure in favor of controlled rollout.
- Added parity gates as hard criteria for phase progression.
- Separated data-migration validation from API routing changes.
Outcome:
- Migration completed over six weeks with no customer-facing outage.
- Legacy component decommissioned after compatibility window closed.
OUTPUT ARTIFACT
Migration baseline artifact set:
- Migration RFC
- Milestone plan with gates
- Parity / reconciliation checklist
- Risk register
- Consumer communication plan
- Decommission checklist
These artifacts make multi-quarter migration work tractable, auditable, and survivable.