
Chapter 15: AI Agent Architecture


Learning Objectives

By the end of this chapter, you will be able to:

  • Distinguish AI coding agents from autocomplete and explain why agents are autonomous implementers
  • Identify and apply the five agent types in SDD: Specification, Architecture, Coding, Testing, and Review
  • Trace the agent loop: Read context → Plan → Execute → Validate → Iterate
  • Compare single-agent vs. multi-agent workflows and choose the appropriate approach
  • Understand agent capabilities: file editing, terminal access, web search, tool use
  • Explain how Cursor, Claude Code, Gemini CLI, and Codex operate as agents
  • Set up a multi-step agent workflow that takes a spec through planning to implementation
  • Recognize agent limitations: context window, hallucination, implicit assumptions
  • Select the right agent type for each task in the SDD pipeline

What Is an AI Coding Agent?

An AI coding agent is not autocomplete. Autocomplete suggests the next token, the next line, the next function. It is reactive and local—it responds to what you type without understanding the broader task. An agent is different: it is an autonomous implementer that reads context, plans actions, executes them, and validates results.

The Autocomplete vs. Agent Distinction

| Aspect | Autocomplete | AI Coding Agent |
| --- | --- | --- |
| Scope | Next token, line, or block | Entire task or feature |
| Context | Immediate cursor position | Full project, specs, constraints |
| Actions | Insert text | Edit files, run commands, search, reason |
| Autonomy | None—waits for input | Plans and executes multi-step workflows |
| Validation | None | Can run tests, verify output |
| Iteration | Single response | Loops until task complete |

When you ask autocomplete to "add a login endpoint," it might suggest a function signature. When you ask an agent to "add a login endpoint per spec features/001-auth," it reads the specification, understands the requirements, creates the endpoint file, adds tests, runs them, and fixes failures. The agent owns the task from specification to working code.

Why This Matters for SDD

In Spec-Driven Development, the specification is the source of truth. The agent's job is to transform that specification into correct, compliant implementation. This requires:

  1. Reading the specification and related context (constraints, architecture, existing code)
  2. Planning the implementation (what files, what order, what approach)
  3. Executing the plan (creating and modifying files)
  4. Validating the result (running tests, checking constraints)
  5. Iterating when validation fails

An agent that only suggests code cannot complete this loop. It cannot run tests. It cannot verify that the implementation matches the spec. The autonomous agent is the engine that makes SDD practical.


Agent Types in Spec-Driven Development

SDD distinguishes five agent types, each specialized for a phase of the development pipeline. Understanding these types helps you configure agents appropriately and choose the right agent for each task.

1. Specification Agent

Role: Transforms user intent and requirements into structured specifications.

Inputs: User stories, feature descriptions, product requirements, clarifying questions.

Outputs: SDD-compliant specifications with Problem, User Journeys, Functional Requirements, Acceptance Criteria, Edge Cases, Constraints, Dependencies, Observability, Security.

Capabilities: Strong at structured writing, template adherence, requirement extraction. May ask clarifying questions. Does not write code.

When to use: Starting a new feature, refining a vague requirement, converting a PRD into a spec.

2. Architecture Agent

Role: Transforms specifications into implementation plans and architectural decisions.

Inputs: Specifications, project constitution, architecture constraints, existing codebase structure.

Outputs: Implementation plans, phase breakdowns, file structure, technology choices, dependency graphs.

Capabilities: Strong at system design, constraint compliance, layering. May research frameworks and patterns. Produces plans, not code.

When to use: Before implementation, when adding a complex feature, when refactoring architecture.

3. Coding Agent

Role: Transforms implementation plans and specifications into working code.

Inputs: Specifications, implementation plans, task lists, existing code, constraints.

Outputs: Source code, tests, configuration files, migrations.

Capabilities: File editing, terminal access (run tests, install packages), code generation. The workhorse of implementation.

When to use: Implementation phase, bug fixes, feature implementation.

4. Testing Agent

Role: Generates and maintains tests that validate specifications and implementation.

Inputs: Specifications, acceptance criteria, existing code, test framework conventions.

Outputs: Unit tests, integration tests, contract tests, E2E tests.

Capabilities: Test generation, test execution, coverage analysis. Understands Given/When/Then, test structure, mocking strategies.

When to use: Test-first implementation, adding tests for existing code, improving coverage.
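The Given/When/Then structure maps an acceptance criterion directly onto a test. As a sketch (the initials helper and the criterion are invented for illustration, not taken from a real codebase):

```python
# Sketch: one acceptance criterion rendered in Given/When/Then form.
# The initials helper is a hypothetical placeholder.

def initials(display_name: str) -> str:
    """Fallback avatar text: first letter of up to two name parts."""
    return "".join(part[0].upper() for part in display_name.split()[:2])

def test_initials_fallback():
    # Given a user without an avatar
    display_name = "John Doe"
    # When the dashboard renders the avatar slot
    shown = initials(display_name)
    # Then initials are displayed instead of an image
    assert shown == "JD"
```

A Testing Agent generates many such tests, one per acceptance criterion, and then runs them to close the loop.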

5. Review Agent

Role: Validates code and plans against specifications and constraints.

Inputs: Code, specifications, constraints, coding standards.

Outputs: Review reports, violation lists, suggested fixes.

Capabilities: Static analysis, spec compliance checking, constraint violation detection. Does not modify code unless asked.

When to use: Pre-merge review, compliance audits, quality gates.


How Agents Read Specifications and Produce Code

The path from specification to code is not a single step. It is a pipeline:

Specification → Implementation Plan → Task List → Code + Tests

The Specification as Input

A well-written specification provides the agent with:

  • Problem Statement: Why the feature exists—helps the agent prioritize and make trade-offs
  • Functional Requirements: What to build—the primary implementation targets
  • Acceptance Criteria: Testable conditions—drives test generation
  • Edge Cases: Boundary conditions—ensures error handling
  • Constraints: What NOT to do—prevents violations
  • Dependencies: What to integrate with—guides imports and APIs

The agent does not "interpret" the specification in the sense of guessing. It uses the specification as a structured input. Ambiguities should be marked with [NEEDS CLARIFICATION] rather than resolved by assumption.
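To make "structured input" concrete, here is a minimal sketch of a spec pre-flight check: it verifies that the required sections exist and counts unresolved ambiguity markers. The section names come from this chapter; the heading-parsing heuristic is an illustrative assumption, not a real agent's parser.

```python
import re

# Required sections per this chapter's spec template (subset).
REQUIRED_SECTIONS = [
    "Problem", "Functional Requirements", "Acceptance Criteria", "Constraints",
]

def check_spec(markdown: str) -> dict:
    """Report missing required sections and unresolved ambiguity markers."""
    headings = re.findall(r"^#{1,3}\s+(.+)$", markdown, flags=re.MULTILINE)
    missing = [s for s in REQUIRED_SECTIONS
               if not any(s in h for h in headings)]
    clarifications = markdown.count("[NEEDS CLARIFICATION]")
    return {"missing": missing, "needs_clarification": clarifications}

spec = """# Feature 001
## Problem
Users need profiles.
## Functional Requirements
FR-1: [NEEDS CLARIFICATION] which fields?
## Acceptance Criteria
AC-1: ...
"""
report = check_spec(spec)
# -> {'missing': ['Constraints'], 'needs_clarification': 1}
```

A check like this can gate the pipeline: the agent should not proceed to planning while the report shows gaps.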

From Specification to Implementation Plan

The Architecture Agent (or a Coding Agent in planning mode) produces an implementation plan. The plan typically includes:

  1. Phase breakdown: Ordered steps (e.g., Phase 0: Contracts, Phase 1: Repository, Phase 2: Service, Phase 3: API)
  2. File creation order: Which files to create first (often tests before implementation)
  3. Technology choices: Explicit decisions (e.g., "use Express middleware for auth")
  4. Constraint compliance: How the plan satisfies architecture and constitutional constraints
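One way to picture such a plan is as structured data the Coding Agent can walk in order. Everything below (the dataclass fields, file names, and choices) is an illustrative sketch, not a required format:

```python
from dataclasses import dataclass, field

@dataclass
class Phase:
    name: str
    files: list[str]  # creation order within the phase

@dataclass
class ImplementationPlan:
    feature: str
    phases: list[Phase]                                  # phase breakdown
    technology_choices: dict[str, str] = field(default_factory=dict)
    constraint_notes: list[str] = field(default_factory=list)

# Hypothetical plan for an auth feature; names are examples only.
plan = ImplementationPlan(
    feature="001-auth",
    phases=[
        Phase("contracts", ["tests/auth.contract.test.ts"]),  # tests first
        Phase("service", ["src/auth/service.ts"]),
        Phase("api", ["src/auth/routes.ts"]),
    ],
    technology_choices={"auth": "Express middleware"},
    constraint_notes=["no new database tables"],
)

# The Coding Agent receives files in phase order:
ordered_files = [f for phase in plan.phases for f in phase.files]
```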

From Plan to Code

The Coding Agent executes the plan. It creates files in the specified order, implements each component, and runs tests. When tests fail, it iterates—fixing the implementation until validation passes.

The Validation Loop

After producing code, the agent validates:

  1. Tests pass: Run the test suite
  2. Constraints satisfied: Check against architecture and security constraints
  3. Spec coverage: Verify all acceptance criteria have corresponding tests

If validation fails, the agent loops back: diagnose, fix, re-validate.


Single-Agent vs. Multi-Agent Workflows

Single-Agent Workflow

One agent handles the entire pipeline. You might say: "Implement feature X per spec features/001-x. Follow the constitution. Create the implementation plan first, then implement."

Pros:

  • Simpler—no handoff overhead
  • Consistent context—same agent sees everything
  • Fewer coordination failures

Cons:

  • Generalist—may be less expert at each phase
  • Context dilution—spec writing and coding require different focus
  • Single point of failure

When to use: Small features, straightforward implementations, when simplicity matters.

Multi-Agent Workflow

Different agents (or agent invocations with different configurations) handle different phases:

  1. Specification Agent produces the spec
  2. Architecture Agent produces the implementation plan
  3. Coding Agent implements
  4. Testing Agent adds or validates tests
  5. Review Agent validates against spec and constraints

Pros:

  • Specialization—each agent optimized for its phase
  • Clear separation of concerns
  • Easier to improve each phase independently

Cons:

  • Handoff overhead—context must be passed between agents
  • Potential for inconsistency—later agents may misinterpret earlier output
  • More complex orchestration

When to use: Large features, complex architecture, when quality gates matter.

Hybrid Approach

Many teams use a hybrid: one primary Coding Agent, with skills or rules that change its behavior by phase. For example, when working on a spec file, the agent applies spec-writing conventions; when working on source code, it applies coding conventions. The agent is the same, but its configuration shifts with context.
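A sketch of how that context-dependent configuration might work: match the active file's path against glob-style rules and load the first matching convention set. The patterns and rule names below are assumptions for illustration, not any platform's actual rule format.

```python
from fnmatch import fnmatch

# Hypothetical rule table: first matching glob wins.
RULES = [
    ("specs/**/*.md", "spec-writing conventions"),
    ("**/*.test.ts",  "testing conventions"),
    ("src/**/*.ts",   "coding conventions"),
]

def conventions_for(path: str) -> str:
    """Pick the convention set for the file the agent is working on."""
    for pattern, conventions in RULES:
        if fnmatch(path, pattern):
            return conventions
    return "default conventions"
```

The agent itself never changes; only the guidance loaded into its context shifts with the file it touches.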


Agent Capabilities

Modern AI coding agents support a range of capabilities beyond text generation.

File Editing

Agents can create, read, and modify files. They use tools such as read_file, search_replace, and write to manipulate the codebase. They can work across multiple files in a single task—e.g., add an API endpoint, its service layer, and its tests.

Best practice: Ensure the agent has access to the files it needs. Use .cursorignore or equivalent to exclude large generated files, but include specification and constraint documents.

Terminal Access

Agents can run shell commands. This enables:

  • Running tests (npm test, pytest)
  • Installing dependencies (npm install, pip install)
  • Running linters (eslint, ruff)
  • Executing build scripts
  • Running the application

Best practice: Restrict terminal access in sensitive environments. Prefer read-only commands when possible. Use sandboxed execution for untrusted code.
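One common guard is an allowlist wrapper around command execution. This is a sketch: the allowed set, timeout, and error policy are assumptions to adapt to your environment, not a complete sandbox.

```python
import shlex
import subprocess

# Hypothetical read-mostly allowlist; expand per project policy.
ALLOWED = {"pytest", "npm", "eslint", "ruff", "ls", "cat", "echo"}

def run_checked(command: str, timeout: int = 120) -> str:
    """Run a command only if its program is on the allowlist."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        raise PermissionError(f"command not allowed: {command!r}")
    result = subprocess.run(argv, capture_output=True, text=True,
                            timeout=timeout)
    return result.stdout + result.stderr
```

An allowlist is only one layer; for genuinely untrusted code, container or VM sandboxing is still the stronger control.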

Web Search

Agents can search the web for documentation, error messages, and best practices. This helps when:

  • Using an unfamiliar library
  • Debugging an obscure error
  • Finding current API documentation
  • Researching implementation approaches

Best practice: Verify that search results are from authoritative sources. Agents may surface outdated or incorrect information.

Tool Use

Agents can invoke external tools: databases, APIs, linters, formatters. Some agents support custom tools defined by the user.

Best practice: Document available tools in AGENTS.md or skill files. Specify when each tool should be used.


How Major Platforms Operate as Agents

Cursor

Cursor integrates AI into the editor. The agent has access to:

  • Open files and project structure
  • Terminal (when enabled)
  • Web search (when enabled)
  • Rules from .cursor/rules/
  • Skills from .cursor/skills/ and ~/.cursor/skills/
  • AGENTS.md at project root

The agent operates in chat and in "Agent" mode, where it can autonomously execute multi-step tasks. Composer mode allows long-running implementation sessions.

Claude Code (Anthropic)

Claude Code runs in the terminal. It has:

  • File system access
  • Terminal execution
  • Web search
  • Skills from .claude/skills/
  • Project-specific configuration

It is designed for command-line workflows and can be invoked for specific tasks or run interactively.

Gemini CLI (Google)

Gemini CLI provides similar capabilities with Google's models. It supports:

  • File operations
  • Terminal commands
  • Skills and project configuration

Roo Code

Roo Code (an open-source fork of Cline, formerly named Roo Cline) is a popular autonomous IDE agent. It operates within the editor and relies heavily on a "Memory Bank" pattern. It has access to:

  • File system and workspace structure
  • Terminal execution
  • Browser automation for testing
  • .roo/ directory for Memory Banks and conventions

Roo Code's Memory Banks function similarly to project constitutions and constraints, maintaining persistent knowledge across sessions.

Codex

Codex and similar tools (e.g., GitHub Copilot Workspace) provide agent-like capabilities within development environments. Configuration varies by platform.

Common pattern: All major platforms support some form of skills, rules, or project configuration. The Agent Skills Standard (covered in Chapter 16) provides a portable format.


The Agent Loop: Read → Plan → Execute → Validate → Iterate

The agent loop is the core cycle of autonomous implementation:

┌─────────────────────────────────────────────────────────────┐
│                         AGENT LOOP                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐              │
│   │   READ   │───▶│   PLAN   │───▶│ EXECUTE  │              │
│   │ Context  │    │ Actions  │    │ Actions  │              │
│   └──────────┘    └──────────┘    └────┬─────┘              │
│        ▲                               │                    │
│        │                               ▼                    │
│        │                          ┌──────────┐              │
│        │                          │ VALIDATE │              │
│        │                          │ Results  │              │
│        │                          └────┬─────┘              │
│        │                               │                    │
│        │                       ┌───────┴───────┐            │
│        │                       │               │            │
│        │                       ▼               ▼            │
│        │                 ┌──────────┐    ┌──────────┐       │
│        └─────────────────│ ITERATE  │    │   DONE   │       │
│                          │  (fix)   │    │          │       │
│                          └──────────┘    └──────────┘       │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Read

The agent reads:

  • User prompt or task description
  • Relevant files (specifications, constraints, existing code)
  • Project structure
  • Loaded rules and skills

The quality of the read phase determines everything that follows. Incomplete context leads to wrong plans.

Plan

The agent produces a plan: what files to create or modify, in what order, with what approach. For complex tasks, the plan may be explicit (written out). For simpler tasks, it may be implicit (the agent "thinks" then acts).

Execute

The agent performs the plan: creates files, edits code, runs commands. Execution may be multi-step—create file A, then B, then run tests.

Validate

The agent checks the result: Do tests pass? Does the code match the spec? Are constraints satisfied?

Iterate

If validation fails, the agent diagnoses, fixes, and re-validates. The loop continues until the task is complete or the agent cannot proceed (at which point it may ask for human help).
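The loop can be sketched as a bounded retry cycle. Here execute_fn and validate_fn stand in for real agent actions, and the retry budget is an illustrative policy rather than a fixed rule:

```python
# Sketch of Read -> Plan -> Execute -> Validate -> Iterate with a retry budget.

def agent_loop(task, execute_fn, validate_fn, max_iterations: int = 3):
    """Run execute/validate until validation passes or the budget is spent."""
    for attempt in range(1, max_iterations + 1):
        result = execute_fn(task, attempt)       # EXECUTE (includes planning)
        if validate_fn(result):                  # VALIDATE
            return {"status": "done", "attempts": attempt}
    # Could not converge: escalate to a human instead of looping forever.
    return {"status": "needs_human", "attempts": max_iterations}

# Toy example: validation passes on the second attempt.
outcome = agent_loop(
    task="add login endpoint",
    execute_fn=lambda task, attempt: {"tests_pass": attempt >= 2},
    validate_fn=lambda result: result["tests_pass"],
)
# outcome == {"status": "done", "attempts": 2}
```

The explicit budget matters: an unbounded loop can burn time and tokens on a task the agent cannot actually solve.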


Tutorial: Multi-Step Agent Workflow from Spec to Implementation

This tutorial walks you through setting up and running a multi-step agent workflow. You will take a specification through planning to implementation.

Prerequisites

  • Cursor (or another AI coding agent platform)
  • A project with a specification (or use the example below)
  • Terminal access enabled for the agent

Step 1: Create a Minimal Specification

Create specs/features/001-user-profile.md:

# Feature 001: User Profile Display

## Problem

Users need to see their profile information after logging in. Currently, the dashboard
shows generic content. Users want to see their name, email, and avatar.

- **Who**: Authenticated users
- **Pain**: No personalized experience
- **Impact**: Reduced engagement, support requests

## Functional Requirements

### FR-1: Display Profile Data
System displays user's display name, email, and profile avatar on the dashboard.
Data comes from the authenticated user's session.

### FR-2: Avatar Fallback
If no avatar URL is set, display initials (e.g., "JD" for John Doe) in a circle.

## Acceptance Criteria

- **AC-1**: Given an authenticated user, When they view the dashboard, Then they see their
display name and email
- **AC-2**: Given a user with an avatar, When they view the dashboard, Then they see their
avatar image
- **AC-3**: Given a user without an avatar, When they view the dashboard, Then they see
initials in a styled circle

## Constraints

- Use existing auth middleware
- No new API endpoints (use session data)
- Maximum one new component

Step 2: Create AGENTS.md (If Not Present)

Create or update AGENTS.md at project root:

# Project AI Guidelines

## Conventions

- Follow existing code structure
- Use TypeScript for frontend
- Components in `src/components/`
- Tests colocated with source

## Implementation

- Write tests before implementation (TDD)
- Keep components under 200 lines
- Use existing design system components

Step 3: Prompt the Agent for Implementation Plan

Open the agent and provide:

"Read the specification at specs/features/001-user-profile.md. Create an implementation plan before writing any code. The plan should include: (1) files to create or modify, (2) order of implementation, (3) how you will satisfy each acceptance criterion. Output the plan as markdown."

Expected output: A structured plan with phases, file list, and test strategy.

Step 4: Prompt the Agent for Implementation

After reviewing the plan, prompt:

"Implement the plan you created. Create the UserProfile component and any tests. Use the existing auth context for user data. Run tests after implementation and fix any failures."

Expected behavior: The agent creates the component, writes tests, runs them, and iterates until they pass.

Step 5: Validate Against Specification

Prompt:

"Review the implementation against specs/features/001-user-profile.md. Verify each acceptance criterion is satisfied. List any gaps."

Expected output: A checklist showing AC-1, AC-2, AC-3 with pass/fail and any gaps.
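A crude, scriptable version of this check is to trace acceptance-criterion IDs from the spec into the test files. Matching on IDs is an assumption: it checks traceability, not semantic correctness, so it complements rather than replaces the agent's review.

```python
import re

def ac_coverage(spec_text: str, test_text: str) -> dict:
    """Map each acceptance-criterion ID found in the spec to whether
    any test mentions it."""
    ac_ids = re.findall(r"\bAC-\d+\b", spec_text)
    return {ac: ac in test_text for ac in dict.fromkeys(ac_ids)}

spec = "AC-1: shows name. AC-2: shows avatar. AC-3: shows initials."
tests = "// covers AC-1 and AC-3"
coverage = ac_coverage(spec, tests)
# -> {"AC-1": True, "AC-2": False, "AC-3": True}
```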

What You Learned

  • The agent can work in phases: plan first, then implement
  • Specifications provide structured input for planning
  • The validation step ensures spec compliance
  • AGENTS.md shapes implementation conventions

Agent Limitations

Agents are powerful but not perfect. Understanding limitations helps you configure them effectively and avoid frustration.

Context Window

Agents have a finite context window (commonly in the low hundreds of thousands of tokens, though limits vary widely by model). When the context exceeds the limit, earlier content is dropped or compressed. This means:

  • Very large codebases may not fit entirely
  • Long conversations may "forget" early decisions
  • Prioritize what the agent sees: specs, relevant files, constraints

Mitigation: Use progressive loading. Put essential context in AGENTS.md and rules. Reference files by path rather than inlining large blocks. Use skills to load task-specific context on demand.
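Progressive loading can be pictured as priority-ordered packing under a token budget. The four-characters-per-token estimate and the priority order below are illustrative assumptions, not a real tokenizer or loader:

```python
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic: ~4 chars per token

def pack_context(sources: list[tuple[str, str]], budget: int) -> list[str]:
    """Include sources in priority order until the budget is spent."""
    included, used = [], 0
    for name, text in sources:  # highest priority first
        cost = estimate_tokens(text)
        if used + cost <= budget:
            included.append(name)
            used += cost
    return included

sources = [
    ("spec", "x" * 400),             # ~100 tokens, always first
    ("constraints", "x" * 200),      # ~50 tokens
    ("full codebase", "x" * 40000),  # ~10000 tokens, dropped if over budget
]
# pack_context(sources, budget=500) -> ["spec", "constraints"]
```

The point of the sketch: when something must be dropped, it should be the bulk code, never the spec or constraints.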

Hallucination

Agents may generate plausible but incorrect content. They might:

  • Invent API methods that don't exist
  • Use wrong parameter names
  • Assume library behavior that differs from reality
  • Create files that conflict with existing code

Mitigation: Validate agent output. Run tests. Use Review Agent for compliance checks. Provide authoritative references (docs, existing code) in context.

Implicit Assumptions

Agents fill gaps with assumptions. "Add a search feature" might assume Elasticsearch when you use Postgres full-text. "Implement auth" might assume JWT when you use sessions.

Mitigation: Write explicit specifications. Use [NEEDS CLARIFICATION] for ambiguities. Include constraints that rule out wrong assumptions. Validate plans before implementation.
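Constraints that rule out wrong assumptions can even be checked mechanically before implementation begins. The forbidden terms and messages below are hypothetical, and keyword matching is a crude illustration rather than a real compliance checker:

```python
# Hypothetical constraint table: forbidden choice -> explanation.
FORBIDDEN = {
    "Elasticsearch": "search must use Postgres full-text (see Constraints)",
    "JWT": "auth must use server-side sessions (see Constraints)",
}

def check_plan(plan_text: str) -> list[str]:
    """Return a violation message for each forbidden choice in the plan."""
    return [msg for term, msg in FORBIDDEN.items() if term in plan_text]

plan = "Add search via Elasticsearch; issue a JWT on login."
violations = check_plan(plan)
# two violations: Elasticsearch and JWT
```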

Tool Limitations

Agents may not have access to all tools (e.g., production databases, internal APIs). They may misuse tools (wrong command, wrong path).

Mitigation: Document available tools. Restrict dangerous operations. Use sandboxing when appropriate.


When to Use Which Agent for Which Task

| Task | Recommended Agent | Notes |
| --- | --- | --- |
| Write specification from user story | Specification Agent | Use spec-writing skill |
| Create implementation plan | Architecture Agent | Include constitution and constraints |
| Implement feature | Coding Agent | Provide spec + plan |
| Generate tests | Testing Agent | Provide spec + acceptance criteria |
| Review PR for spec compliance | Review Agent | Provide spec + code |
| Fix bug | Coding Agent | Provide reproduction steps |
| Refactor architecture | Architecture Agent, then Coding | Plan first, then execute |
| Add observability | Coding Agent | Provide observability spec section |
| Security audit | Review Agent | Provide security constraints |

Agent Tool Use Patterns

Understanding how agents use tools helps you configure them effectively and debug failures.

File Operations

Agents typically use a small set of file operations:

  • Read: Load file contents into context. Large files may be truncated.
  • Search/Replace: Modify specific regions. Prefer over full-file writes when possible.
  • Write: Create or overwrite files. Use for new files or complete rewrites.

Best practice: Ensure specification and constraint files are readable. Use clear paths. Avoid binary or generated files in context.

Terminal Execution

When the agent runs commands:

  1. Working directory: Usually project root. Verify paths in commands.
  2. Environment: May not have full shell profile. Explicit paths (e.g., npx) often work better than global tools.
  3. Output capture: Agent reads stdout/stderr. Long output may be truncated.
  4. Error handling: Non-zero exit codes signal failure. Agent may retry or report.

Best practice: Prefer idempotent commands. Document required environment (Node version, Python version). Use --no-interactive or equivalent for tools that prompt.

Search and Research

Web search helps agents find:

  • API documentation
  • Error message solutions
  • Best practices
  • Library usage examples

Limitation: Search results can be outdated or from non-authoritative sources. For critical decisions, prefer project documentation or explicit constraints.


Integrating Agents into the SDD Pipeline

The SDD pipeline has distinct phases. Agents integrate at each:

| Phase | Agent Role | Input | Output |
| --- | --- | --- | --- |
| Specification | Specification Agent | User story, requirements | Structured spec |
| Planning | Architecture Agent | Spec, constitution, constraints | Implementation plan |
| Implementation | Coding Agent | Spec, plan, tasks | Code, tests |
| Validation | Testing Agent | Spec, code | Tests, coverage |
| Review | Review Agent | Spec, code, constraints | Review report |

Pipeline Invariants

  • Specification before plan: Never generate a plan without a spec.
  • Plan before code: For complex features, plan first.
  • Tests with code: TDD means tests before or alongside implementation.
  • Review before merge: Validate spec compliance and constraints.

Human Checkpoints

Even with autonomous agents, human checkpoints matter:

  1. After spec: Human reviews for completeness and correctness.
  2. After plan: Human approves architecture and approach.
  3. After implementation: Human runs final validation and deploys.

Agents accelerate each phase; humans ensure quality and alignment with business intent.


Agent Configuration Checklist

Before deploying agents in your SDD pipeline, verify:

Context

  • AGENTS.md exists and is concise
  • Specification and constraint files are in discoverable locations
  • .cursorignore excludes large/generated files but includes specs

Capabilities

  • Terminal access enabled if agent needs to run tests
  • Web search enabled if agent needs to research
  • File access includes all relevant directories

Skills and Rules

  • Spec-writing skill (or equivalent) available
  • Code-review skill available for validation
  • Rules for main file types (TypeScript, tests, API)
  • No conflicting guidance between AGENTS.md and rules

Validation

  • Agent can run tests and interpret results
  • Agent has access to constraint documents
  • Human checkpoints defined for spec, plan, and implementation

Case Study: Agent Workflow for a New Feature

Scenario: Add "Export to CSV" to an existing task management API.

Step 1 — Specification Agent (or human with spec skill):

  • Input: "Users need to export their tasks to CSV"
  • Output: specs/features/003-export-csv.md with Problem, FR, AC, Edge Cases, etc.

Step 2 — Architecture Agent:

  • Input: Spec + constitution + constraints
  • Output: Implementation plan with phases (contract, repository, service, controller, tests)

Step 3 — Coding Agent:

  • Input: Spec + plan
  • Output: Code for export endpoint, service, repository method, tests
  • Validation: Run tests, fix failures

Step 4 — Review Agent:

  • Input: Spec + code
  • Output: Review report (AC compliance, constraint check, suggestions)

Total time: with well-configured agents, a feature like this can go from idea to reviewed implementation in roughly 30–60 minutes; done manually, it often takes several hours. The specification is the constant; agents accelerate each phase.


Try With AI

Prompt 1: Agent Type Selection

"I have this task: [describe task]. Based on the five SDD agent types (Specification, Architecture, Coding, Testing, Review), which agent type(s) should handle this? In what order? Explain your reasoning."

Prompt 2: Implementation Plan Review

"I'm about to implement [feature]. Here's my specification: [paste spec]. Before I write code, generate an implementation plan. Include: (1) files to create/modify, (2) creation order, (3) how each acceptance criterion will be satisfied, (4) any risks or assumptions. Do not write code yet."

Prompt 3: Agent Loop Trace

"After implementing [feature], trace back through the agent loop. What did you read? What was your plan? What did you execute? What did you validate? Did you iterate? If so, what failed and how did you fix it?"

Prompt 4: Limitation Check

"Review this agent-generated code: [paste code]. Identify potential issues due to: (1) hallucination (invented APIs, wrong assumptions), (2) implicit assumptions (what might the agent have assumed that isn't in the spec?), (3) context limits (what might the agent have missed?). Suggest how to prevent each."


Practice Exercises

Exercise 1: Agent Type Mapping

Take three features from your project (or invent them). For each, identify which agent type(s) would handle it and in what sequence. Document the handoffs. Example: "Feature: Add export to CSV. Specification Agent writes spec → Architecture Agent plans (CSV library choice, file structure) → Coding Agent implements → Testing Agent adds tests → Review Agent validates."

Expected outcome: A mapping document showing agent types and sequencing for each feature.

Exercise 2: Single vs. Multi-Agent Decision

For a feature of your choice, decide: single-agent or multi-agent? Document your reasoning. Consider: feature complexity, team size, quality requirements, iteration speed. Then run the feature through your chosen approach and note what worked and what didn't.

Expected outcome: A decision record and retrospective on the chosen approach.

Exercise 3: Limitation Audit

Take an agent-generated implementation from a recent task. Audit it for: (1) hallucination (verify all APIs and libraries exist and are used correctly), (2) implicit assumptions (list what the agent assumed that wasn't in the spec), (3) spec compliance (check each acceptance criterion). Document findings and suggest improvements to spec or agent configuration.

Expected outcome: An audit report with specific findings and recommendations.


Key Takeaways

  1. AI coding agents are autonomous implementers, not autocomplete. They read context, plan, execute, validate, and iterate.

  2. Five agent types serve SDD: Specification (specs), Architecture (plans), Coding (implementation), Testing (tests), Review (compliance).

  3. The agent loop is Read → Plan → Execute → Validate → Iterate. Validation drives iteration until the task is complete.

  4. Single-agent workflows are simpler; multi-agent workflows allow specialization. Choose based on feature complexity and quality needs.

  5. Agent capabilities include file editing, terminal access, web search, and tool use. Configure access appropriately for your environment.

  6. Agent limitations—context window, hallucination, implicit assumptions—require mitigation: explicit specs, validation, and authoritative references.

  7. Match the agent type to the task: Specification Agent for specs, Architecture Agent for plans, Coding Agent for implementation, Testing Agent for tests, Review Agent for compliance.


Chapter Quiz

  1. What is the key difference between autocomplete and an AI coding agent? Why does this matter for SDD?

  2. Name the five agent types in SDD and describe the primary output of each.

  3. Trace the agent loop. What happens in each phase? When does the agent iterate vs. finish?

  4. When would you choose a multi-agent workflow over a single-agent workflow? What are the trade-offs?

  5. What agent capabilities (beyond text generation) are typically available? Give one example of how each supports SDD.

  6. How do context window limits affect agent behavior? What strategies can mitigate this?

  7. What is "hallucination" in the context of AI agents? Give two examples and how to prevent them.

  8. For the task "Add rate limiting to the API per the security constraints," which agent type(s) would you use and in what order? Explain.