
Chapter 25: Advanced Context Engineering


Learning Objectives

By the end of this chapter, you will be able to:

  • Design a layered context architecture (constitution → domain → feature → task) for large projects
  • Explain context inheritance and composition and how rules cascade through the project
  • Distinguish short-term, medium-term, and long-term memory systems
  • Apply the context quality formula: Relevance × Precision × Freshness
  • Implement automated context loading: file-scoped rules, intelligent activation, on-demand skills
  • Diagnose context issues when AI produces unexpected results
  • Apply token optimization: summarization, progressive detail loading, context pruning
  • Design a context architecture for a running project through a hands-on tutorial
  • Implement multi-agent context sharing for related tasks

Building on Chapter 8 Fundamentals

Chapter 8 introduced context engineering fundamentals: the 4Ds (Delegation, Description, Discernment, Diligence), token management, the lost-in-the-middle effect, context isolation, and the SDD context stack. This chapter extends those foundations for large projects where context management becomes critical.

Key concepts from Chapter 8 that we build on:

  • Context is everything the AI knows when processing an instruction
  • Token budget must be managed—not all context fits
  • Lost in the middle: Important information should be at start or end of context
  • Context prioritization: Always include spec, constraints, contracts; load patterns on demand
  • Checkpoint documents enable multi-session workflows

This chapter adds: layered architecture, memory systems, automated loading, context debugging, token optimization, and multi-agent sharing.


Context Architecture for Large Projects

As projects grow, ad-hoc context loading fails. You need a context architecture—a deliberate structure for what context exists, when it loads, and how it combines.

Layered Context

Context flows in layers from broad to narrow:

Layer 1: Constitution (project principles, always relevant)

Layer 2: Domain (domain rules, conventions)

Layer 3: Feature (current feature spec, plan, contracts)

Layer 4: Task (current task, acceptance criteria, related code)

Layer 1 — Constitution: Project identity. Principles, stack, constraints. Loaded for any implementation task. Small (typically 500–2000 tokens).

Layer 2 — Domain: Domain-specific rules. E.g., "All payment flows must be idempotent." Loaded when working in that domain. Medium (500–3000 tokens).

Layer 3 — Feature: Current feature specification, plan, data model, contracts. Loaded when implementing that feature. Large (2000–10000 tokens).

Layer 4 — Task: Current task, acceptance criterion, files being modified. Loaded for the immediate work. Focused (500–2000 tokens).
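The four layers can be assembled mechanically, broad to narrow. The following is a minimal Python sketch (the layer names, contents, and the 4-characters-per-token estimate are illustrative assumptions, not a prescribed implementation):

```python
from dataclasses import dataclass

@dataclass
class ContextLayer:
    name: str
    content: str

def assemble_context(layers, budget_tokens=8000):
    """Concatenate layers broad-to-narrow, stopping before the token
    budget is exceeded. Tokens are estimated at ~4 characters each."""
    parts, used = [], 0
    for layer in layers:
        cost = len(layer.content) // 4
        if used + cost > budget_tokens:
            break  # lower layers are dropped first when budget is tight
        parts.append(f"## {layer.name}\n{layer.content}")
        used += cost
    return "\n\n".join(parts)

stack = [
    ContextLayer("Constitution", "No secrets in code."),
    ContextLayer("Domain: payments", "All payment operations must be idempotent."),
    ContextLayer("Feature: refund", "Refund must complete within 30 seconds."),
    ContextLayer("Task", "Implement refund endpoint."),
]
prompt = assemble_context(stack)
```

Note the trade-off this sketch makes explicit: when the budget is tight, the most specific layers are dropped, which is why critical constraints belong in the always-loaded upper layers.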

Context Inheritance

Layers inherit from above. Lower layers assume upper layers are in effect.

Example:

  • Constitution: "No secrets in code"
  • Domain (payments): "All payment operations must be idempotent"
  • Feature (refund): "Refund must complete within 30 seconds"
  • Task: "Implement refund endpoint"

The task inherits: no secrets, idempotency, 30-second requirement. The AI doesn't need to re-read the constitution for each task—it's assumed. But when context is trimmed, upper layers may be dropped. Explicit inheritance means: if you load only the task, you may miss constitution rules. Design loading so critical constraints are always present (e.g., constitution in system prompt or always-loaded rule).

Context Composition

Multiple context sources combine. The order and priority matter.

Composition rules:

  1. Later overrides earlier for conflicting instructions (e.g., feature spec overrides generic convention when both say different things)
  2. More specific overrides more general (task over feature over domain over constitution for that task's scope)
  3. Additive for non-conflicting (constitution + domain + feature + task = full picture)

Example:

  • Constitution: "Use REST, JSON"
  • Feature: "This feature uses GraphQL" (exception)
  • Result: This feature uses GraphQL. Constitution's REST applies to other features.
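The composition rules reduce to a simple merge where more specific layers win on conflicting keys and everything else is additive. A sketch, using hypothetical rule keys:

```python
def compose_rules(*layers):
    """Merge rule dicts broad-to-narrow; later (more specific) layers
    override earlier ones on conflicting keys; non-conflicting keys
    simply accumulate."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged

constitution = {"api_style": "REST", "format": "JSON"}
feature = {"api_style": "GraphQL"}  # documented exception for this feature
effective = compose_rules(constitution, feature)
```

Here `effective` uses GraphQL for this feature while inheriting JSON from the constitution, mirroring the example above.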

Memory Systems

Context has different lifetimes. Organize memory by duration.

Short-Term Memory

Duration: Single session (one conversation)

Contents: Conversation history, loaded files, current task state

Characteristics: Volatile. Lost when session ends. High relevance for current task.

Management:

  • Keep conversation focused; avoid tangents
  • Summarize long threads when approaching token limit
  • Use "checkpoint and restart" for very long sessions

Medium-Term Memory

Duration: Across sessions, within a feature or sprint

Contents: Checkpoint documents, session summaries, feature state

Characteristics: Persists in files. Loaded at session start. Bridges sessions.

Management:

  • Create checkpoint at end of significant sessions
  • Store in memory/checkpoints/ or specs/[feature]/checkpoint.md
  • Load checkpoint at start of continuation session

Example: memory/checkpoints/004-auth-session-2.md

# Checkpoint: Auth Feature — Session 2

## Previous Session Summary
- Implemented registration, login, JWT
- Decided: refresh tokens in Redis, 7-day TTL
- 12 tests passing

## This Session
- Goal: Password reset flow
- Context: spec.md, plan.md, contracts/reset.yaml
- Blockers: None

## Next Session
- Token refresh endpoint
- Session list (user can see active sessions)

Long-Term Memory

Duration: Permanent (project lifetime)

Contents: Constitution, ADRs, skills, PHRs, templates

Characteristics: Persists. Evolves slowly. High authority.

Management:

  • Keep constitution concise; update rarely
  • ADRs are append-only (add new, don't edit old)
  • Skills and PHRs evolve; version if significant

Location: memory/constitution.md, memory/adr/, agents/skills/, memory/phr/

Memory Summary

| Type | Duration | Example | Location |
| --- | --- | --- | --- |
| Short-term | Session | Conversation | In-memory |
| Medium-term | Sessions | Checkpoint | memory/checkpoints/ |
| Long-term | Permanent | Constitution, ADR | memory/, agents/ |

The Context Quality Formula

Context quality determines output quality. Use this formula to evaluate:

Quality = Relevance × Precision × Freshness

Relevance

Does the context apply to the current task?

High relevance: Spec for the feature you're implementing, contract for the endpoint you're building

Low relevance: Spec for a different feature, documentation for a different module

Action: Load only what's relevant. Exclude tangentially related content.

Precision

Is the context specific enough to constrain output?

High precision: "Return 201 with Location header and body matching schema" (exact)

Low precision: "Return success" (vague)

Action: Prefer concrete, testable instructions. Avoid vague guidance.

Freshness

Is the context current?

Fresh: Matches current codebase, reflects latest decisions

Stale: Documents outdated patterns, references removed code

Action: Update context when code changes. Prune stale docs. Date ADRs and checkpoints.

Applying the Formula

When AI produces poor output, diagnose:

  1. Relevance: Was the right spec loaded? Were irrelevant specs loaded (noise)?
  2. Precision: Were acceptance criteria specific enough? Were constraints clear?
  3. Freshness: Was the spec updated? Does the contract match the code?

Automated Context Loading Strategies

Manual context loading doesn't scale. Automate loading based on file paths, task content, and session state.

File-Scoped Rules (Glob Patterns)

Cursor rules (and similar) can activate based on file patterns.

Example: .cursor/rules/api-routes.mdc

---
description: API route implementation rules
globs: ["**/routes/**/*.ts", "**/api/**/*.ts"]
---

When implementing API routes:
- Follow contracts in specs/**/contracts/
- Use error format from memory/constitution.md
- Validate inputs at boundary
- No business logic in route handler

When you open a file matching routes/**/*.ts or api/**/*.ts, this rule activates. No manual loading.

Intelligent Activation from Task Analysis

Some tools analyze the task and load relevant context:

  1. Task detection: "User is implementing POST /users" → load user spec, contracts, data model
  2. Dependency detection: "Task touches auth" → load auth constraints, security spec
  3. Pattern matching: "Task is 'add integration test'" → load testing skill, existing test examples

Implementation: Custom scripts, agent logic, or MCP tools that infer context from task description.
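A custom script for task-based activation can be as simple as a keyword-to-files map. The map below is hypothetical; the paths and trigger words would come from your own project:

```python
# Hypothetical keyword → context-file map; adapt keys and paths to your project.
ACTIVATION_RULES = {
    "auth": ["specs/constraints/security.md"],
    "payment": ["agents/skills/payment-idempotency/SKILL.md"],
    "test": ["agents/skills/testing/SKILL.md"],
}

def infer_context(task_description):
    """Return context file paths whose trigger keyword appears
    anywhere in the task description."""
    task = task_description.lower()
    paths = []
    for keyword, files in ACTIVATION_RULES.items():
        if keyword in task:
            paths.extend(files)
    return paths
```

For example, `infer_context("Task touches auth middleware")` would select the security constraints, matching the dependency-detection case above.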

On-Demand Skills

Skills load when relevant, not always.

Example: A "payment idempotency" skill loads only when the task involves payments.

How:

  • Task description contains "payment" or "charge" → load payment skill
  • File path contains "payments/" → load payment skill
  • Spec references "idempotency" → load idempotency skill

Benefit: Saves tokens. Avoids loading irrelevant skills.
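The three trigger sources above (task description, file path, spec text) can share one predicate. A sketch, with the payment trigger words as illustrative assumptions:

```python
def should_load_skill(skill_keywords, task="", file_path="", spec_text=""):
    """Load a skill if any trigger keyword appears in the task
    description, the file path being edited, or the spec text."""
    haystack = " ".join([task, file_path, spec_text]).lower()
    return any(kw in haystack for kw in skill_keywords)

# Hypothetical triggers for a payment-idempotency skill
payment_triggers = ["payment", "charge", "idempotency"]
```

Editing `src/payments/refund.ts` would trigger the skill via the file path; an unrelated task like "add login form" would not.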

Context Loading Checklist

  • Constitution always loaded (system or first rule)
  • File-scoped rules for common paths (routes, tests, entities)
  • Feature spec loaded when implementing that feature
  • Contracts and data model loaded with spec
  • Skills loaded on-demand by task type
  • Checkpoints loaded when continuing a session

Context Debugging

When AI produces unexpected results, the cause is often context—not model capability.

Step 1: Verify What Was Loaded

Check: What files were open? What rules activated? What did the prompt include?

Common issues:

  • Wrong spec loaded (different feature)
  • Old spec loaded (stale)
  • No spec loaded (AI guessed)
  • Conflicting rules (two rules say opposite things)

Step 2: Check Relevance

Question: Was the loaded context relevant to the task?

Example: AI implemented REST when you wanted GraphQL. Check: Was the API convention loaded? Did it say REST? Was there a feature override?

Step 3: Check Precision

Question: Were the instructions specific enough?

Example: AI returned 200 instead of 201 for creation. Check: Did the contract say 201? Was it in the acceptance criteria?

Step 4: Check Freshness

Question: Is the context current?

Example: AI used deprecated API. Check: When was the spec last updated? Does the contract match the current code?

Step 5: Check Lost in the Middle

Question: Was critical information buried in the middle?

Example: AI missed a constraint. Check: Where was the constraint in the context? If it was in the middle of a long spec, move it to the top or summarize at the start.

Context Debugging Checklist

  • List all loaded context (files, rules, prompt)
  • Verify relevance (right feature? right task?)
  • Verify precision (specific enough?)
  • Verify freshness (up to date?)
  • Check position (lost in middle?)
  • Check conflicts (contradicting rules?)

Token Optimization Techniques

When context exceeds the token budget, optimize.

Summarization

Replace long documents with summaries:

Before: 5000-token spec

After: 500-token summary + "Full spec available at specs/004/spec.md"

Summary structure:

  • Problem (1–2 sentences)
  • Key requirements (bulleted)
  • Acceptance criteria (bulleted)
  • Critical constraints (bulleted)
  • Link to full spec

When to use: When the full spec is too long for the context window. Load summary for overview; load full spec for specific sections on demand.

Progressive Detail Loading and Context Isolation

Load an overview first; load detail on demand. Keeping the context window lean reduces the noise that leads agents to hallucinate.

Level 1: Project overview (200 tokens)

Level 2: Feature summary (300 tokens)

Level 3: Full spec (2000 tokens)

Level 4: Full spec + plan + contracts (5000 tokens)

Strategy: Start with Level 1–2. If AI needs more, load Level 3. If still insufficient, Level 4.

Implementation:

  • Use .cursorignore or .rooignore to hide generated files, build artifacts, and irrelevant directories from the agent's broad search.
  • Use explicit @ mentions (e.g., @src/auth/ or @spec.md) to pull exactly what is needed into the context window, leaving the rest out.
  • "Here's the overview. Implement the registration endpoint. If you need more detail, ask."
  • Or: Load summary in prompt; load full spec only when AI requests it (multi-turn).
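One way to sketch the escalation is a level-to-files table plus a loader. The file names are placeholders, and the `read` function is injected so the sketch runs without a real specs directory:

```python
# Hypothetical detail levels; replace the file names with your own.
LEVELS = {
    1: ["overview.md"],
    2: ["overview.md", "spec-summary.md"],
    3: ["overview.md", "spec.md"],
    4: ["overview.md", "spec.md", "plan.md", "contracts.md"],
}

def load_level(level, read=lambda p: f"<contents of {p}>"):
    """Concatenate the context files for a detail level.
    `read` stands in for real file I/O."""
    return "\n\n".join(read(p) for p in LEVELS[level])
```

Start a session at level 1 or 2 and call `load_level` with a higher level only when the AI asks for more.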

Context Pruning

Remove stale or irrelevant information.

Prune:

  • Old conversation turns (keep last N)
  • Outdated spec sections (marked deprecated)
  • Redundant context (same info in multiple places)
  • Irrelevant features (not related to current task)

Tool:

  • Manual: Trim conversation, close old tabs
  • Automated: Script that keeps only recent + relevant context
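An automated pruner might keep the last N turns plus anything that mentions a pinned keyword. This is a minimal sketch; the pinned keywords are an assumption you would tune per project:

```python
def prune_conversation(turns, keep_last=6, pinned=("constitution", "spec")):
    """Keep the most recent `keep_last` turns plus any turn that
    mentions a pinned keyword, preserving original order."""
    recent = set(range(max(0, len(turns) - keep_last), len(turns)))
    return [
        t for i, t in enumerate(turns)
        if i in recent or any(p in t.lower() for p in pinned)
    ]
```

This drops old tangents automatically while protecting turns that restate critical constraints.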

Token Budget Allocation

Allocate tokens deliberately:

| Category | Budget | Purpose |
| --- | --- | --- |
| System prompt | 5% | Identity, persona |
| Rules | 10% | Coding standards, conventions |
| Constitution | 2% | Project principles |
| Current spec | 15% | Feature context |
| Contracts + data model | 10% | API and data |
| Current task | 5% | Immediate work |
| Code examples | 15% | Patterns to follow |
| Conversation | 35% | History + response |
| Buffer | 3% | Slack |

Adjust based on project size. Large specs may need 20%; small specs 10%.
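Turning the percentages into absolute budgets for a given context window is a one-liner worth automating. A sketch using the shares from the table above (the 1e-9 tolerance guards against floating-point drift in the sum):

```python
BUDGET_SHARES = {
    "system_prompt": 0.05, "rules": 0.10, "constitution": 0.02,
    "current_spec": 0.15, "contracts_data_model": 0.10, "current_task": 0.05,
    "code_examples": 0.15, "conversation": 0.35, "buffer": 0.03,
}

def allocate(window_tokens):
    """Convert percentage shares into absolute token budgets."""
    assert abs(sum(BUDGET_SHARES.values()) - 1.0) < 1e-9, "shares must sum to 100%"
    return {k: round(window_tokens * share) for k, share in BUDGET_SHARES.items()}
```

For a 128k window, `allocate(128_000)` gives the conversation 44,800 tokens and the constitution 2,560; adjust the shares, not the arithmetic, when project size changes.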


Tutorial: Design a Context Architecture for the Running Project

This tutorial walks you through designing a context architecture for a project (use TaskFlow or your own).

Prerequisites

  • A project with specs, memory, and some implementation
  • Familiarity with .cursor/rules or similar

Step 1: Map Current Context Sources

List all context sources:

  1. Constitution: memory/constitution.md
  2. Specs: specs/features/*/spec.md, plan.md, etc.
  3. Constraints: specs/constraints/*
  4. Rules: .cursor/rules/*
  5. ADRs: memory/adr/*
  6. Skills: agents/skills/*
  7. Checkpoints: memory/checkpoints/*

For each, estimate token count (rough: 1 token ≈ 4 characters).
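The audit can be scripted with the same 4-characters-per-token heuristic. A minimal sketch (real token counts vary by tokenizer, so treat the numbers as estimates):

```python
import os

def estimate_tokens(path):
    """Rough token estimate for a file: ~1 token per 4 characters."""
    with open(path, encoding="utf-8") as f:
        return len(f.read()) // 4

def audit_sources(paths):
    """Map each existing context source to its estimated token count,
    silently skipping paths that don't exist."""
    return {p: estimate_tokens(p) for p in paths if os.path.exists(p)}
```

Run `audit_sources` over the list from this step to see where your token budget actually goes before assigning layers.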

Step 2: Define Context Layers

Assign each source to a layer:

| Layer | Sources | Load When |
| --- | --- | --- |
| 1. Constitution | constitution.md | Always |
| 2. Domain | constraints/security.md, api-conventions.md | When implementing in that domain |
| 3. Feature | spec.md, plan.md, contracts/, data-model.md | When implementing that feature |
| 4. Task | tasks.md (current task), related files | When implementing that task |

Step 3: Create File-Scoped Rules

Create rules that activate by path:

Example: .cursor/rules/api.mdc

---
description: API implementation
globs: ["src/**/routes/**", "src/**/api/**"]
---

- Load: specs/constraints/api-conventions.md
- Load: memory/constitution.md (error format)
- Follow: contracts in specs/**/contracts/

Example: .cursor/rules/tests.mdc

---
description: Test implementation
globs: ["tests/**"]
---

- Load: agents/skills/testing/SKILL.md (if exists)
- Follow: Arrange-Act-Assert
- Match: acceptance criteria from spec

Step 4: Create Loading Rules

Document when to load what:

  • Starting implementation: Load constitution + feature spec + plan + contracts
  • Implementing task: Load task + acceptance criterion + relevant files
  • Continuing session: Load checkpoint + current task
  • Debugging: Load spec + contract + recent code

Step 5: Test Context Quality

Pick an implementation task. Load context per your rules. Ask AI to implement. Evaluate:

  1. Relevance: Did AI use the right spec? (Yes/No)
  2. Precision: Did output match acceptance criteria? (Yes/No)
  3. Freshness: Did AI use current patterns? (Yes/No)

If any No, refine loading. Add missing context or improve precision of instructions.

Step 6: Create Summaries (If Needed)

If specs are long, create summaries:

  • specs/004/spec-summary.md — 300–500 token summary
  • Load summary by default; full spec on demand

Step 7: Document the Architecture

Create memory/context-architecture.md:

# Context Architecture

## Layers
1. Constitution: always
2. Domain: constraints by path
3. Feature: spec + plan + contracts when implementing
4. Task: task + acceptance + files when implementing

## File-Scoped Rules
- api.mdc: routes, api/
- tests.mdc: tests/

## Loading Rules
- Start: constitution + feature
- Task: task + acceptance + files
- Continue: checkpoint + task

## Summaries
- specs/004/spec-summary.md (when full spec too long)

Multi-Agent Context Sharing

When multiple agents work on related tasks, context must be shared effectively.

The Challenge

  • Agent A implements feature X
  • Agent B implements feature Y (depends on X)
  • Agent B needs to know: What did A implement? What contracts? What patterns?

Strategies

1. Shared Memory

All agents read from the same memory/ and specs/. No duplication. Agent B loads:

  • Agent A's feature spec (if Y depends on X)
  • Shared contracts
  • Constitution
  • ADRs

Benefit: Single source of truth. Risk: Agent B may load too much (noise).

2. Handoff Documents

Agent A produces a handoff document for Agent B:

# Handoff: Feature X → Feature Y

## What Agent A Implemented
- POST /users, GET /users/:id
- User entity, UserRepository
- Auth middleware

## Contracts
- See specs/001/users/contracts/users-api.yaml

## Patterns Used
- Validation at boundary
- Error format consistent with constitution

## What Agent B Needs

- Feature Y depends on User (existing)
- Use UserRepository for user lookup
- Follow same validation pattern

Agent B loads the handoff. Focused context. No need to read Agent A's full spec.

3. Checkpoint Chain

  • Agent A: Completes work → checkpoint
  • Agent B: Loads checkpoint → continues → checkpoint
  • Agent C: Loads checkpoint → continues

Each checkpoint summarizes state for the next agent. Reduces context size.

4. Contract as Interface

  • Agent A implements to contract
  • Agent B consumes via contract
  • Contract is the shared context. Agent B doesn't need Agent A's implementation details.

Best practice: Contracts are the primary handoff. Implementation details stay with Agent A.

Multi-Agent Context Checklist

  • Shared memory (specs, memory, constitution)
  • Handoff documents for complex dependencies
  • Checkpoints when passing between agents
  • Contracts as interface (minimal shared context)
  • Avoid loading full implementation of other agents

Advanced: Context Composition in Practice

Example: Implementing a New Endpoint

Context to load (in order):

  1. Constitution (always): Error format, no secrets, validation
  2. API conventions (constraints): REST patterns, pagination
  3. Feature spec (004): User registration requirements
  4. Contract (004): POST /register schema
  5. Data model (004): User entity
  6. Current task (T-003): "Implement POST /register"
  7. Existing pattern (optional): Similar endpoint (e.g., POST /login) for consistency

Total: ~3000–5000 tokens depending on spec size.

Composition: Constitution + conventions + feature + contract + task. No conflicts. Additive.

Example: Conflict Resolution

Constitution: "Use 4 spaces for indentation"

Feature spec: "This project uses 2 spaces (legacy)"

Resolution: Feature overrides constitution for this project. Document in spec or ADR. Add to rules: "This project uses 2 spaces (override: see spec)."


Try With AI

Prompt 1: Context Architecture Audit

"I'm implementing features in a project with specs/, memory/, and agents/. Help me design a context architecture: (1) What are the layers? (2) What should load when? (3) What file-scoped rules would help? (4) How do we handle token limits? Create a concrete plan for my project structure."

Prompt 2: Context Debugging

"AI produced [describe wrong output]. I had loaded [list what was loaded]. Help me diagnose: Was it relevance (wrong context?), precision (vague instructions?), or freshness (stale docs?)? What should I change for next time?"

Prompt 3: Create Spec Summary

"Take this spec [paste or link]. Create a 300–500 token summary that captures: problem, key requirements, acceptance criteria, critical constraints. Include a link to the full spec. Format for use as context when the full spec is too long."

Prompt 4: Multi-Agent Handoff

"Agent A implemented [feature X]. Agent B needs to implement [feature Y] which depends on X. What should the handoff document contain? Draft a handoff document that gives Agent B everything needed without loading Agent A's full context."


Practice Exercises

Exercise 1: Design Context Architecture

For your current project (or TaskFlow), design a context architecture. Create: (1) Layer map (what goes in each layer), (2) Load rules (when to load what), (3) File-scoped rules for at least 2 path patterns. Document in memory/context-architecture.md. Test with one implementation task: Does the AI get the right context?

Expected outcome: context-architecture.md and a brief test report.

Exercise 2: Context Debugging

Recall a time when AI produced wrong or unexpected output. Retrospectively apply the context debugging checklist: What was loaded? Relevance? Precision? Freshness? Lost in middle? Write a 1-page diagnosis and 3 concrete changes to prevent recurrence.

Expected outcome: Diagnosis document with 3 preventive actions.

Exercise 3: Token Optimization

Take a project with a large spec (or use a sample). Create a 300–500 token summary. Compare: (1) Implement a task with full spec loaded, (2) Implement same task with summary only. What was lost? What was sufficient? When would you use summary vs. full?

Expected outcome: Summary + comparison report.


Key Takeaways

  1. Layered context: Constitution → domain → feature → task. Layers inherit from above. More specific overrides more general. Design loading so each layer has a clear trigger.

  2. Memory systems: Short-term (session), medium-term (checkpoints), long-term (constitution, ADRs, skills). Each has different duration and management. Use checkpoints to bridge sessions.

  3. Context quality formula: Relevance × Precision × Freshness. When output is wrong, diagnose: Was the right context loaded? Was it specific enough? Was it current?

  4. Automated loading: File-scoped rules (globs), intelligent activation (task analysis), on-demand skills. Reduces manual context management and improves consistency.

  5. Token optimization: Summarization (replace long docs with summaries), progressive detail (overview first, detail on demand), context pruning (remove stale, redundant). Allocate token budget deliberately.

  6. Multi-agent context: Shared memory, handoff documents, checkpoint chains, contracts as interface. Minimize what each agent must load; use contracts as the primary handoff.


Chapter Quiz

  1. What are the four layers of the layered context architecture? What does each contain?

  2. What is context inheritance? How does it affect how you load context?

  3. What are the three memory systems (short-term, medium-term, long-term)? What is an example of each? Where do they live?

  4. What is the context quality formula? How do you use it when diagnosing poor AI output?

  5. What are three automated context loading strategies? Give an example of each.

  6. When AI produces unexpected results, what five steps should you take to debug context?

  7. What are three token optimization techniques? When would you use each?

  8. How can multiple agents share context effectively? What are two strategies and when would you use each?


Appendix: Context Engineering Anti-Patterns

Anti-Pattern 1: Loading Everything

Symptom: Every session loads the entire specs/ directory, all ADRs, and full conversation history.

Problem: Token overflow, lost in the middle, slow responses.

Fix: Load only what's relevant. Use file-scoped rules and on-demand loading.

Anti-Pattern 2: No Checkpoints

Symptom: Long sessions with no intermediate saves. Context lost on disconnect.

Problem: Work lost; must re-explain everything in new session.

Fix: Create checkpoints at natural break points (task complete, phase complete). Load checkpoint at session start.

Anti-Pattern 3: Stale Context

Symptom: Spec says one thing, code does another. AI follows old spec.

Problem: Wasted effort; rework; inconsistent output.

Fix: Update specs when code changes. Date documents. Prune deprecated content. Verify freshness before implementation.

Anti-Pattern 4: Vague Instructions

Symptom: "Implement the feature" or "Make it work."

Problem: AI guesses. Output varies. Acceptance unclear.

Fix: Use specific acceptance criteria. Reference contracts. Include examples. Apply the precision dimension of the quality formula.

Anti-Pattern 5: Ignoring Lost in the Middle

Symptom: Critical constraint buried in page 5 of a long spec. AI misses it.

Problem: Output violates constraint. Rework required.

Fix: Put critical info at start or end. Summarize key constraints at top. Use progressive detail: overview first, detail on demand.


Summary: Advanced Context Engineering at a Glance

| Topic | Key Idea |
| --- | --- |
| Layered context | Constitution → domain → feature → task. Load by relevance. |
| Memory systems | Short (session), medium (checkpoints), long (constitution, ADRs) |
| Quality formula | Relevance × Precision × Freshness |
| Automated loading | File-scoped rules, task-based activation, on-demand skills |
| Debugging | Verify loaded context; check relevance, precision, freshness; check lost in middle |
| Token optimization | Summarize, progressive detail, prune |
| Multi-agent | Shared memory, handoffs, checkpoints, contracts as interface |