Chapter 25: Advanced Context Engineering
Learning Objectives
By the end of this chapter, you will be able to:
- Design a layered context architecture (constitution → domain → feature → task) for large projects
- Explain context inheritance and composition and how rules cascade through the project
- Distinguish short-term, medium-term, and long-term memory systems
- Apply the context quality formula: Relevance × Precision × Freshness
- Implement automated context loading: file-scoped rules, intelligent activation, on-demand skills
- Diagnose context issues when AI produces unexpected results
- Apply token optimization: summarization, progressive detail loading, context pruning
- Design a context architecture for a running project through a hands-on tutorial
- Implement multi-agent context sharing for related tasks
Building on Chapter 8 Fundamentals
Chapter 8 introduced context engineering fundamentals: the 4Ds (Delegation, Description, Discernment, Diligence), token management, the lost-in-the-middle effect, context isolation, and the SDD context stack. This chapter extends those foundations for large projects where context management becomes critical.
Key concepts from Chapter 8 that we build on:
- Context is everything the AI knows when processing an instruction
- Token budget must be managed—not all context fits
- Lost in the middle: Important information should be at start or end of context
- Context prioritization: Always include spec, constraints, contracts; load patterns on demand
- Checkpoint documents enable multi-session workflows
This chapter adds: layered architecture, memory systems, automated loading, context debugging, token optimization, and multi-agent sharing.
Context Architecture for Large Projects
As projects grow, ad-hoc context loading fails. You need a context architecture—a deliberate structure for what context exists, when it loads, and how it combines.
Layered Context
Context flows in layers from broad to narrow:
Layer 1: Constitution (project principles, always relevant)
↓
Layer 2: Domain (domain rules, conventions)
↓
Layer 3: Feature (current feature spec, plan, contracts)
↓
Layer 4: Task (current task, acceptance criteria, related code)
Layer 1 — Constitution: Project identity. Principles, stack, constraints. Loaded for any implementation task. Small (typically 500–2000 tokens).
Layer 2 — Domain: Domain-specific rules. E.g., "All payment flows must be idempotent." Loaded when working in that domain. Medium (500–3000 tokens).
Layer 3 — Feature: Current feature specification, plan, data model, contracts. Loaded when implementing that feature. Large (2000–10000 tokens).
Layer 4 — Task: Current task, acceptance criterion, files being modified. Loaded for the immediate work. Focused (500–2000 tokens).
Context Inheritance
Layers inherit from above. Lower layers assume upper layers are in effect.
Example:
- Constitution: "No secrets in code"
- Domain (payments): "All payment operations must be idempotent"
- Feature (refund): "Refund must complete within 30 seconds"
- Task: "Implement refund endpoint"
The task inherits: no secrets, idempotency, 30-second requirement. The AI doesn't need to re-read the constitution for each task—it's assumed. But when context is trimmed, upper layers may be dropped. Explicit inheritance means: if you load only the task, you may miss constitution rules. Design loading so critical constraints are always present (e.g., constitution in system prompt or always-loaded rule).
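One way to make that guarantee concrete is to pin the constitution during trimming. A minimal Python sketch (the layer names and the character-based budget are illustrative assumptions, not a fixed convention):

```python
def compose_context(sources: dict[str, str], budget_chars: int) -> str:
    """Always include the constitution; spend the remaining budget on
    the most specific layers first (task > feature > domain), then emit
    the selected layers in broad-to-narrow reading order."""
    selected = {"constitution": sources.get("constitution", "")}
    remaining = budget_chars - len(selected["constitution"])
    for layer in ("task", "feature", "domain"):  # most specific first
        text = sources.get(layer, "")
        if text and len(text) <= remaining:
            selected[layer] = text
            remaining -= len(text)
    order = ("constitution", "domain", "feature", "task")
    return "\n\n".join(selected[l] for l in order if l in selected)
```

Under a tight budget this keeps the constitution and the task while shedding the middle layers, matching the trimming behavior described above.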
Context Composition
Multiple context sources combine. The order and priority matter.
Composition rules:
- Later overrides earlier for conflicting instructions (e.g., feature spec overrides generic convention when both say different things)
- More specific overrides more general (task over feature over domain over constitution for that task's scope)
- Additive for non-conflicting (constitution + domain + feature + task = full picture)
Example:
- Constitution: "Use REST, JSON"
- Feature: "This feature uses GraphQL" (exception)
- Result: This feature uses GraphQL. Constitution's REST applies to other features.
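These rules can be sketched as an ordered merge over per-topic directives, assuming each layer is represented as a key-to-directive map (the representation is illustrative):

```python
def resolve(*layer_rules: dict) -> dict:
    """Merge per-topic directives from broad to narrow. Passing layers
    in constitution -> domain -> feature -> task order means a later
    (more specific) layer overrides an earlier one on the same key,
    while non-conflicting keys accumulate additively."""
    merged: dict = {}
    for rules in layer_rules:
        merged.update(rules)
    return merged
```

With `constitution = {"api_style": "REST", "payload": "JSON"}` and `feature = {"api_style": "GraphQL"}`, the merge yields GraphQL for this feature while the non-conflicting JSON directive carries through unchanged.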
Memory Systems
Context has different lifetimes. Organize memory by duration.
Short-Term Memory
Duration: Single session (one conversation)
Contents: Conversation history, loaded files, current task state
Characteristics: Volatile. Lost when session ends. High relevance for current task.
Management:
- Keep conversation focused; avoid tangents
- Summarize long threads when approaching token limit
- Use "checkpoint and restart" for very long sessions
Medium-Term Memory
Duration: Across sessions, within a feature or sprint
Contents: Checkpoint documents, session summaries, feature state
Characteristics: Persists in files. Loaded at session start. Bridges sessions.
Management:
- Create checkpoint at end of significant sessions
- Store in memory/checkpoints/ or specs/[feature]/checkpoint.md
- Load checkpoint at start of continuation session
Example: memory/checkpoints/004-auth-session-2.md
# Checkpoint: Auth Feature — Session 2
## Previous Session Summary
- Implemented registration, login, JWT
- Decided: refresh tokens in Redis, 7-day TTL
- 12 tests passing
## This Session
- Goal: Password reset flow
- Context: spec.md, plan.md, contracts/reset.yaml
- Blockers: None
## Next Session
- Token refresh endpoint
- Session list (user can see active sessions)
Long-Term Memory
Duration: Permanent (project lifetime)
Contents: Constitution, ADRs, skills, PHRs, templates
Characteristics: Persists. Evolves slowly. High authority.
Management:
- Keep constitution concise; update rarely
- ADRs are append-only (add new, don't edit old)
- Skills and PHRs evolve; version if significant
Location: memory/constitution.md, memory/adr/, agents/skills/, memory/phr/
Memory Summary
| Type | Duration | Example | Location |
|---|---|---|---|
| Short-term | Session | Conversation | In-memory |
| Medium-term | Sessions | Checkpoint | memory/checkpoints/ |
| Long-term | Permanent | Constitution, ADR | memory/, agents/ |
The Context Quality Formula
Context quality determines output quality. Use this formula to evaluate:
Quality = Relevance × Precision × Freshness
Relevance
Does the context apply to the current task?
High relevance: Spec for the feature you're implementing, contract for the endpoint you're building
Low relevance: Spec for a different feature, documentation for a different module
Action: Load only what's relevant. Exclude tangentially related content.
Precision
Is the context specific enough to constrain output?
High precision: "Return 201 with Location header and body matching schema" (exact)
Low precision: "Return success" (vague)
Action: Prefer concrete, testable instructions. Avoid vague guidance.
Freshness
Is the context current?
Fresh: Matches current codebase, reflects latest decisions
Stale: Documents outdated patterns, references removed code
Action: Update context when code changes. Prune stale docs. Date ADRs and checkpoints.
Applying the Formula
When AI produces poor output, diagnose:
- Relevance: Was the right spec loaded? Were irrelevant specs loaded (noise)?
- Precision: Were acceptance criteria specific enough? Were constraints clear?
- Freshness: Was the spec updated? Does the contract match the code?
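Because the formula is multiplicative, one weak dimension sinks the whole score. A toy scorer makes this explicit (the 0-to-1 scores are subjective estimates, not measured values):

```python
def context_quality(relevance: float, precision: float, freshness: float) -> float:
    """Quality = Relevance x Precision x Freshness, each scored 0-1.
    Multiplication (not addition) captures that one weak dimension
    collapses the whole context: 1.0 * 1.0 * 0.1 = 0.1."""
    for score in (relevance, precision, freshness):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must be in [0, 1]")
    return relevance * precision * freshness
```

A perfectly relevant, perfectly precise context built on a stale spec still scores near zero, which is why the debugging steps below check all three dimensions.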
Automated Context Loading Strategies
Manual context loading doesn't scale. Automate loading based on file paths, task signals, and session state.
File-Scoped Rules (Glob Patterns)
Cursor rules (and similar) can activate based on file patterns.
Example: .cursor/rules/api-routes.mdc
---
description: API route implementation rules
globs: ["**/routes/**/*.ts", "**/api/**/*.ts"]
---
When implementing API routes:
- Follow contracts in specs/**/contracts/
- Use error format from memory/constitution.md
- Validate inputs at boundary
- No business logic in route handler
When you open a file matching routes/**/*.ts or api/**/*.ts, this rule activates. No manual loading.
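The activation logic behind file-scoped rules can be approximated in a few lines with fnmatch-style matching (rule names mirror the example above; note that Python's fnmatch treats `**` like `*`, a rough stand-in for true recursive globs):

```python
from fnmatch import fnmatch

# Rule file -> glob patterns, as declared in each rule's frontmatter.
RULES = {
    "api-routes.mdc": ["**/routes/**/*.ts", "**/api/**/*.ts"],
    "tests.mdc": ["tests/**"],
}

def active_rules(path: str) -> list[str]:
    """Return the rules whose glob patterns match the opened file."""
    return [name for name, globs in RULES.items()
            if any(fnmatch(path, g) for g in globs)]
```

Opening `src/api/v1/users.ts` activates only the API rule; opening a file under `tests/` activates only the testing rule; everything else loads nothing.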
Intelligent Activation from Task Analysis
Some tools analyze the task and load relevant context:
- Task detection: "User is implementing POST /users" → load user spec, contracts, data model
- Dependency detection: "Task touches auth" → load auth constraints, security spec
- Pattern matching: "Task is 'add integration test'" → load testing skill, existing test examples
Implementation: Custom scripts, agent logic, or MCP tools that infer context from task description.
On-Demand Skills
Skills load when relevant, not always.
Example: A "payment idempotency" skill loads only when the task involves payments.
How:
- Task description contains "payment" or "charge" → load payment skill
- File path contains "payments/" → load payment skill
- Spec references "idempotency" → load idempotency skill
Benefit: Saves tokens. Avoids loading irrelevant skills.
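A minimal keyword-based trigger for on-demand skills might look like this (the trigger words and skill paths are illustrative):

```python
# Mapping from trigger keywords to skill files (paths are illustrative).
SKILL_TRIGGERS = {
    "payment": "agents/skills/payment-idempotency/SKILL.md",
    "charge": "agents/skills/payment-idempotency/SKILL.md",
    "test": "agents/skills/testing/SKILL.md",
}

def skills_for(task_description: str, file_path: str = "") -> set[str]:
    """Select skills whose trigger keyword appears in the task text
    or the touched file path (case-insensitive substring match)."""
    haystack = f"{task_description} {file_path}".lower()
    return {skill for kw, skill in SKILL_TRIGGERS.items() if kw in haystack}
```

A task mentioning "charge" or touching `src/payments/` pulls in the payment skill; unrelated tasks load no skills at all.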
Context Loading Checklist
- Constitution always loaded (system or first rule)
- File-scoped rules for common paths (routes, tests, entities)
- Feature spec loaded when implementing that feature
- Contracts and data model loaded with spec
- Skills loaded on-demand by task type
- Checkpoints loaded when continuing a session
Context Debugging
When AI produces unexpected results, the cause is often context—not model capability.
Step 1: Verify What Was Loaded
Check: What files were open? What rules activated? What did the prompt include?
Common issues:
- Wrong spec loaded (different feature)
- Old spec loaded (stale)
- No spec loaded (AI guessed)
- Conflicting rules (two rules say opposite things)
Step 2: Check Relevance
Question: Was the loaded context relevant to the task?
Example: AI implemented REST when you wanted GraphQL. Check: Was the API convention loaded? Did it say REST? Was there a feature override?
Step 3: Check Precision
Question: Were the instructions specific enough?
Example: AI returned 200 instead of 201 for creation. Check: Did the contract say 201? Was it in the acceptance criteria?
Step 4: Check Freshness
Question: Is the context current?
Example: AI used deprecated API. Check: When was the spec last updated? Does the contract match the current code?
Step 5: Check Lost in the Middle
Question: Was critical information buried in the middle?
Example: AI missed a constraint. Check: Where was the constraint in the context? If it was in the middle of a long spec, move it to the top or summarize at the start.
Context Debugging Checklist
- List all loaded context (files, rules, prompt)
- Verify relevance (right feature? right task?)
- Verify precision (specific enough?)
- Verify freshness (up to date?)
- Check position (lost in middle?)
- Check conflicts (contradicting rules?)
Token Optimization Techniques
When context exceeds the token budget, optimize.
Summarization
Replace long documents with summaries:
Before: 5000-token spec
After: 500-token summary + "Full spec available at specs/004/spec.md"
Summary structure:
- Problem (1–2 sentences)
- Key requirements (bulleted)
- Acceptance criteria (bulleted)
- Critical constraints (bulleted)
- Link to full spec
When to use: When the full spec is too long for the context window. Load summary for overview; load full spec for specific sections on demand.
Progressive Detail Loading and Context Isolation
Load an overview first; load detail on demand. Keeping the context window lean reduces noise and the risk of the agent hallucinating from irrelevant material.
- Level 1: Project overview (200 tokens)
- Level 2: Feature summary (300 tokens)
- Level 3: Full spec (2000 tokens)
- Level 4: Full spec + plan + contracts (5000 tokens)
Strategy: Start with Level 1–2. If AI needs more, load Level 3. If still insufficient, Level 4.
Implementation:
- Use .cursorignore or .rooignore to hide generated files, build artifacts, and irrelevant directories from the agent's broad search.
- Use explicit @-mentions (e.g., @src/auth/ or @spec.md) to pull exactly what is needed into the context window, leaving the rest out.
- "Here's the overview. Implement the registration endpoint. If you need more detail, ask."
- Or: Load summary in prompt; load full spec only when AI requests it (multi-turn).
Context Pruning
Remove stale or irrelevant information.
Prune:
- Old conversation turns (keep last N)
- Outdated spec sections (marked deprecated)
- Redundant context (same info in multiple places)
- Irrelevant features (not related to current task)
Tools:
- Manual: Trim conversation, close old tabs
- Automated: Script that keeps only recent + relevant context
Token Budget Allocation
Allocate tokens deliberately:
| Category | Budget | Purpose |
|---|---|---|
| System prompt | 5% | Identity, persona |
| Rules | 10% | Coding standards, conventions |
| Constitution | 2% | Project principles |
| Current spec | 15% | Feature context |
| Contracts + data model | 10% | API and data |
| Current task | 5% | Immediate work |
| Code examples | 15% | Patterns to follow |
| Conversation | 35% | History + response |
| Buffer | 3% | Slack |
Adjust based on project size. Large specs may need 20%; small specs 10%.
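Turning the percentage table into concrete per-category caps is straightforward once the shares are written down (shares copied from the table above):

```python
# Percentage shares from the allocation table above.
BUDGET_SHARES = {
    "system_prompt": 0.05, "rules": 0.10, "constitution": 0.02,
    "spec": 0.15, "contracts": 0.10, "task": 0.05,
    "examples": 0.15, "conversation": 0.35, "buffer": 0.03,
}

def allocate(total_tokens: int) -> dict[str, int]:
    """Convert percentage shares into concrete per-category token caps."""
    return {k: round(total_tokens * share) for k, share in BUDGET_SHARES.items()}
```

For a 100k-token window this gives the conversation 35,000 tokens and the constitution 2,000, making over-budget categories easy to spot.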
Tutorial: Design a Context Architecture for the Running Project
This tutorial walks you through designing a context architecture for a project (use TaskFlow or your own).
Prerequisites
- A project with specs, memory, and some implementation
- Familiarity with .cursor/rules or similar
Step 1: Map Current Context Sources
List all context sources:
- Constitution: memory/constitution.md
- Specs: specs/features/*/spec.md, plan.md, etc.
- Constraints: specs/constraints/*
- Rules: .cursor/rules/*
- ADRs: memory/adr/*
- Skills: agents/skills/*
- Checkpoints: memory/checkpoints/*
For each, estimate token count (rough: 1 token ≈ 4 characters).
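The rough heuristic above can be scripted to audit every context source at once (a sketch using the 1 token ≈ 4 characters approximation):

```python
from pathlib import Path

def estimate_tokens(text: str) -> int:
    """Rough estimate: 1 token is roughly 4 characters of English text."""
    return len(text) // 4

def estimate_file_tokens(path: str) -> int:
    """Estimate tokens for one context source file."""
    return estimate_tokens(Path(path).read_text(encoding="utf-8"))

def audit(paths: list[str]) -> dict[str, int]:
    """Token estimate per source, for filling in the Step 1 inventory."""
    return {p: estimate_file_tokens(p) for p in paths}
```

Running `audit` over the list above produces the per-source estimates the layer table in Step 2 needs.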
Step 2: Define Context Layers
Assign each source to a layer:
| Layer | Sources | Load When |
|---|---|---|
| 1. Constitution | constitution.md | Always |
| 2. Domain | constraints/security.md, api-conventions.md | When implementing |
| 3. Feature | spec.md, plan.md, contracts/, data-model.md | When implementing that feature |
| 4. Task | tasks.md (current task), related files | When implementing that task |
Step 3: Create File-Scoped Rules
Create rules that activate by path:
Example: .cursor/rules/api.mdc
---
description: API implementation
globs: ["src/**/routes/**", "src/**/api/**"]
---
- Load: specs/constraints/api-conventions.md
- Load: memory/constitution.md (error format)
- Follow: contracts in specs/**/contracts/
Example: .cursor/rules/tests.mdc
---
description: Test implementation
globs: ["tests/**"]
---
- Load: agents/skills/testing/SKILL.md (if exists)
- Follow: Arrange-Act-Assert
- Match: acceptance criteria from spec
Step 4: Create Loading Rules
Document when to load what:
- Starting implementation: Load constitution + feature spec + plan + contracts
- Implementing task: Load task + acceptance criterion + relevant files
- Continuing session: Load checkpoint + current task
- Debugging: Load spec + contract + recent code
Step 5: Test Context Quality
Pick an implementation task. Load context per your rules. Ask AI to implement. Evaluate:
- Relevance: Did AI use the right spec? (Yes/No)
- Precision: Did output match acceptance criteria? (Yes/No)
- Freshness: Did AI use current patterns? (Yes/No)
If any No, refine loading. Add missing context or improve precision of instructions.
Step 6: Create Summaries (If Needed)
If specs are long, create summaries:
- specs/004/spec-summary.md — 300–500 token summary
- Load summary by default; full spec on demand
Step 7: Document the Architecture
Create memory/context-architecture.md:
# Context Architecture
## Layers
1. Constitution: always
2. Domain: constraints by path
3. Feature: spec + plan + contracts when implementing
4. Task: task + acceptance + files when implementing
## File-Scoped Rules
- api.mdc: routes, api/
- tests.mdc: tests/
## Loading Rules
- Start: constitution + feature
- Task: task + acceptance + files
- Continue: checkpoint + task
## Summaries
- specs/004/spec-summary.md (when full spec too long)
Multi-Agent Context Sharing
When multiple agents work on related tasks, context must be shared effectively.
The Challenge
- Agent A implements feature X
- Agent B implements feature Y (depends on X)
- Agent B needs to know: What did A implement? What contracts? What patterns?
Strategies
1. Shared Memory
All agents read from the same memory/ and specs/. No duplication. Agent B loads:
- Agent A's feature spec (if Y depends on X)
- Shared contracts
- Constitution
- ADRs
Benefit: Single source of truth. Risk: Agent B may load too much (noise).
2. Handoff Documents
Agent A produces a handoff document for Agent B:
# Handoff: Feature X → Feature Y
## What Agent A Implemented
- POST /users, GET /users/:id
- User entity, UserRepository
- Auth middleware
## Contracts
- See specs/001/users/contracts/users-api.yaml
## Patterns Used
- Validation at boundary
- Error format consistent with constitution
## What Agent B Needs
- Feature Y depends on User (existing)
- Use UserRepository for user lookup
- Follow same validation pattern
Agent B loads the handoff. Focused context. No need to read Agent A's full spec.
3. Checkpoint Chain
- Agent A: Completes work → checkpoint
- Agent B: Loads checkpoint → continues → checkpoint
- Agent C: Loads checkpoint → continues
Each checkpoint summarizes state for the next agent. Reduces context size.
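A checkpoint writer for this chain can be as small as the following sketch (the path convention and section headings are assumptions modeled on the checkpoint example earlier in the chapter):

```python
from datetime import date
from pathlib import Path

def write_checkpoint(feature: str, session: int, summary: list[str],
                     next_steps: list[str],
                     root: str = "memory/checkpoints") -> Path:
    """Write a checkpoint file that the next agent (or session) loads
    first. Layout follows the checkpoint example earlier in the chapter."""
    path = Path(root) / f"{feature}-session-{session}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [
        f"# Checkpoint: {feature} (session {session}, {date.today()})",
        "## Previous Session Summary",
        *[f"- {item}" for item in summary],
        "## Next Session",
        *[f"- {item}" for item in next_steps],
    ]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Each agent in the chain calls this on completion; the next agent starts by loading the newest checkpoint instead of the full history.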
4. Contract as Interface
- Agent A implements to contract
- Agent B consumes via contract
- Contract is the shared context. Agent B doesn't need Agent A's implementation details.
Best practice: Contracts are the primary handoff. Implementation details stay with Agent A.
Multi-Agent Context Checklist
- Shared memory (specs, memory, constitution)
- Handoff documents for complex dependencies
- Checkpoints when passing between agents
- Contracts as interface (minimal shared context)
- Avoid loading full implementation of other agents
Advanced: Context Composition in Practice
Example: Implementing a New Endpoint
Context to load (in order):
- Constitution (always): Error format, no secrets, validation
- API conventions (constraints): REST patterns, pagination
- Feature spec (004): User registration requirements
- Contract (004): POST /register schema
- Data model (004): User entity
- Current task (T-003): "Implement POST /register"
- Existing pattern (optional): Similar endpoint (e.g., POST /login) for consistency
Total: ~3000–5000 tokens depending on spec size.
Composition: Constitution + conventions + feature + contract + task. No conflicts. Additive.
Example: Conflict Resolution
Constitution: "Use 4 spaces for indentation"
Feature spec: "This feature's legacy module uses 2 spaces"
Resolution: The feature spec overrides the constitution within its scope. Document the exception in the spec or an ADR, and add it to rules: "This module uses 2 spaces (override: see spec)."
Try With AI
Prompt 1: Context Architecture Audit
"I'm implementing features in a project with specs/, memory/, and agents/. Help me design a context architecture: (1) What are the layers? (2) What should load when? (3) What file-scoped rules would help? (4) How do we handle token limits? Create a concrete plan for my project structure."
Prompt 2: Context Debugging
"AI produced [describe wrong output]. I had loaded [list what was loaded]. Help me diagnose: Was it relevance (wrong context?), precision (vague instructions?), or freshness (stale docs?)? What should I change for next time?"
Prompt 3: Create Spec Summary
"Take this spec [paste or link]. Create a 300–500 token summary that captures: problem, key requirements, acceptance criteria, critical constraints. Include a link to the full spec. Format for use as context when the full spec is too long."
Prompt 4: Multi-Agent Handoff
"Agent A implemented [feature X]. Agent B needs to implement [feature Y] which depends on X. What should the handoff document contain? Draft a handoff document that gives Agent B everything needed without loading Agent A's full context."
Practice Exercises
Exercise 1: Design Context Architecture
For your current project (or TaskFlow), design a context architecture. Create: (1) Layer map (what goes in each layer), (2) Load rules (when to load what), (3) File-scoped rules for at least 2 path patterns. Document in memory/context-architecture.md. Test with one implementation task: Does the AI get the right context?
Expected outcome: context-architecture.md and a brief test report.
Exercise 2: Context Debugging
Recall a time when AI produced wrong or unexpected output. Retrospectively apply the context debugging checklist: What was loaded? Relevance? Precision? Freshness? Lost in middle? Write a 1-page diagnosis and 3 concrete changes to prevent recurrence.
Expected outcome: Diagnosis document with 3 preventive actions.
Exercise 3: Token Optimization
Take a project with a large spec (or use a sample). Create a 300–500 token summary. Compare: (1) Implement a task with full spec loaded, (2) Implement same task with summary only. What was lost? What was sufficient? When would you use summary vs. full?
Expected outcome: Summary + comparison report.
Key Takeaways
- Layered context: Constitution → domain → feature → task. Layers inherit from above. More specific overrides more general. Design loading so each layer has a clear trigger.
- Memory systems: Short-term (session), medium-term (checkpoints), long-term (constitution, ADRs, skills). Each has different duration and management. Use checkpoints to bridge sessions.
- Context quality formula: Relevance × Precision × Freshness. When output is wrong, diagnose: Was the right context loaded? Was it specific enough? Was it current?
- Automated loading: File-scoped rules (globs), intelligent activation (task analysis), on-demand skills. Reduces manual context management and improves consistency.
- Token optimization: Summarization (replace long docs with summaries), progressive detail (overview first, detail on demand), context pruning (remove stale, redundant). Allocate token budget deliberately.
- Multi-agent context: Shared memory, handoff documents, checkpoint chains, contracts as interface. Minimize what each agent must load; use contracts as the primary handoff.
Chapter Quiz
1. What are the four layers of the layered context architecture? What does each contain?
2. What is context inheritance? How does it affect how you load context?
3. What are the three memory systems (short-term, medium-term, long-term)? What is an example of each? Where do they live?
4. What is the context quality formula? How do you use it when diagnosing poor AI output?
5. What are three automated context loading strategies? Give an example of each.
6. When AI produces unexpected results, what five steps should you take to debug context?
7. What are three token optimization techniques? When would you use each?
8. How can multiple agents share context effectively? What are two strategies and when would you use each?
Appendix: Context Engineering Anti-Patterns
Anti-Pattern 1: Loading Everything
Symptom: Every session loads the entire specs/ directory, all ADRs, and full conversation history.
Problem: Token overflow, lost in the middle, slow responses.
Fix: Load only what's relevant. Use file-scoped rules and on-demand loading.
Anti-Pattern 2: No Checkpoints
Symptom: Long sessions with no intermediate saves. Context lost on disconnect.
Problem: Work lost; must re-explain everything in new session.
Fix: Create checkpoints at natural break points (task complete, phase complete). Load checkpoint at session start.
Anti-Pattern 3: Stale Context
Symptom: Spec says one thing, code does another. AI follows old spec.
Problem: Wasted effort; rework; inconsistent output.
Fix: Update specs when code changes. Date documents. Prune deprecated content. Verify freshness before implementation.
Anti-Pattern 4: Vague Instructions
Symptom: "Implement the feature" or "Make it work."
Problem: AI guesses. Output varies. Acceptance unclear.
Fix: Use specific acceptance criteria. Reference contracts. Include examples. Apply the precision dimension of the quality formula.
Anti-Pattern 5: Ignoring Lost in the Middle
Symptom: Critical constraint buried in page 5 of a long spec. AI misses it.
Problem: Output violates constraint. Rework required.
Fix: Put critical info at start or end. Summarize key constraints at top. Use progressive detail: overview first, detail on demand.
Summary: Advanced Context Engineering at a Glance
| Topic | Key Idea |
|---|---|
| Layered context | Constitution → domain → feature → task. Load by relevance. |
| Memory systems | Short (session), medium (checkpoints), long (constitution, ADRs) |
| Quality formula | Relevance × Precision × Freshness |
| Automated loading | File-scoped rules, task-based activation, on-demand skills |
| Debugging | Verify loaded context; check relevance, precision, freshness; check lost in middle |
| Token optimization | Summarize, progressive detail, prune |
| Multi-agent | Shared memory, handoffs, checkpoints, contracts as interface |