Chapter 26: Spec-Driven Testing
Learning Objectives
By the end of this chapter, you will be able to:
- Explain the fundamental principle: tests derive from specifications, not from implementation
- Identify and apply five test types in SDD: unit, integration, contract, end-to-end, and agentic fuzzing
- Map specification elements to test types: acceptance criteria → integration tests, edge cases → unit tests, API contracts → contract tests, user journeys → e2e tests
- Apply test-first in SDD (Article III of the constitution): write tests before implementation
- Understand the testing pyramid in spec-driven systems
- Generate a complete test suite from a feature specification through a hands-on tutorial
- Establish traceability: every test links to a specification requirement
- Measure test coverage as specification coverage, not code coverage
- Use tools such as Vitest, Playwright, Jest, Pytest, and Cucumber effectively
The Fundamental Principle: Tests Derive from Specifications
In traditional development, tests often follow implementation. You write code, then you write tests to verify the code. The tests become coupled to the implementation—refactor the code, and the tests break. The tests answer: "Does this code do what this code does?"
In Spec-Driven Development, tests derive from specifications. You write a spec, then you write tests that verify the spec, then you implement until the tests pass. The tests answer: "Does this implementation satisfy the specification?" Implementation can change; the spec and tests remain the anchor.
Why This Matters
| Traditional Testing | Spec-Driven Testing |
|---|---|
| Tests verify code behavior | Tests verify specification compliance |
| Refactoring breaks tests | Refactoring preserves tests (if spec unchanged) |
| Coverage = lines of code exercised | Coverage = requirements exercised |
| Tests coupled to implementation | Tests coupled to specification |
| "Does the code work?" | "Did we build what we specified?" |
When AI generates code, you cannot manually inspect every line. You need automated verification that the output satisfies the spec. Spec-driven testing provides that verification. The spec is the oracle; the tests are the automated oracle-checker.
Five Test Types in SDD
Spec-Driven Development organizes tests into five types, each mapped to a different layer of the specification.
1. Unit Tests
Source: Edge cases, data model constraints, pure logic
Purpose: Verify that individual functions, modules, or components behave correctly for specific inputs. Unit tests are fast, isolated, and numerous.
Spec mapping: Constraints ("email must be valid format"), edge cases ("empty string returns error"), data model invariants ("IDs are non-negative").
Example spec → test:
- Spec: "FR-003: User IDs are always UUID v4 format"
- Test: validateUserId("invalid") throws; validateUserId(validUuid) returns true
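A minimal sketch of the validator this test targets. The `validateUserId` name comes from the example above; the regex and error message are illustrative assumptions, not from a real codebase:

```typescript
// Hypothetical validator derived from FR-003: "User IDs are always UUID v4 format".
// UUID v4: version nibble is 4, variant nibble is 8, 9, a, or b.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

function validateUserId(id: string): boolean {
  if (!UUID_V4.test(id)) {
    throw new Error("User ID must be a UUID v4");
  }
  return true;
}
```

Note that the unit tests assert the spec's contract (throws for invalid, true for valid), not the regex internals, so the validator can be rewritten freely.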
2. Integration Tests
Source: Acceptance criteria, API behavior, component interactions
Purpose: Verify that multiple components work together correctly. Integration tests hit real databases, APIs, or services (or close facsimiles).
Spec mapping: Acceptance criteria ("Given X, when Y, then Z"), API flows ("POST creates resource, GET returns it"), component interactions ("auth service validates token before data access").
Example spec → test:
- Spec: "AC-001: Given a user with valid credentials, when they log in, then they receive a session token"
- Test: Call login API with valid credentials → assert 200, assert token in response
3. Contract Tests
Source: API contracts (OpenAPI, AsyncAPI), interface definitions
Purpose: Verify that API responses match the contract schema. Contract tests validate structure and types, not business logic.
Spec mapping: OpenAPI schemas, request/response definitions, event payloads.
Example spec → test:
- Spec: GET /users/{id} returns { id: string, email: string, createdAt: string }
- Test: Call endpoint → assert response matches JSON schema
4. End-to-End (E2E) Tests
Source: User journeys, critical flows, cross-system behavior
Purpose: Verify complete user flows through the system as a real user would experience them. E2E tests are slow, brittle if overused, but essential for critical paths.
Spec mapping: User stories ("As a user, I want to..."), user journeys ("User signs up → verifies email → logs in → completes onboarding").
Example spec → test:
- Spec: "User journey: Sign up → Verify email → Log in → View dashboard"
- Test: Playwright/Cypress simulates full flow in browser
5. Agentic Fuzzing
Source: Edge cases, constraints, security requirements
Purpose: Actively try to break the code by generating malicious or edge-case inputs. Instead of writing static test cases, you instruct a Testing Agent to attack the implementation.
Spec mapping: Boundary conditions, constraint violations ("Must not allow token reuse"), error handling.
Example spec → test:
- Spec: "FR-004: Rate Limiting: 3 requests per email per 15 min."
- Test: A testing agent runs a loop firing random concurrent requests trying to bypass the rate limit, using variations of email casing and whitespace.
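A sketch of what such a harness might probe, with an in-memory RateLimiter standing in for the real API (class, function, and variant names are illustrative assumptions; a real testing agent would fire requests at the live endpoint):

```typescript
// Stand-in for the system under test: a rate limiter keyed by normalized email.
// Normalizing before counting is the behavior the fuzzer tries to disprove.
class RateLimiter {
  private counts = new Map<string, number>();
  constructor(private limit = 3) {}
  allow(email: string): boolean {
    const key = email.trim().toLowerCase();
    const n = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, n);
    return n <= this.limit;
  }
}

// Adversarial variants of one email: casing flips and whitespace padding.
function variants(email: string): string[] {
  return [email, email.toUpperCase(), ` ${email} `, email.replace("a", "A"), `\t${email}\n`];
}

// Returns how many requests were allowed; anything above the limit is a bypass.
function fuzzRateLimit(email: string): number {
  const limiter = new RateLimiter(3);
  let allowed = 0;
  for (const v of variants(email)) {
    if (limiter.allow(v)) allowed++;
  }
  return allowed;
}
```

If any variant slips past normalization, the count exceeds 3 and the spec (FR-004) is violated.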
How Specifications Generate Test Cases
The mapping from spec to test is systematic. Each specification element has a corresponding test type and derivation rule.
Acceptance Criteria → Integration Tests
Acceptance criteria are written in Given/When/Then format. Each criterion becomes one or more integration tests.
Spec:
```markdown
## Acceptance Criteria
- AC-001: Given a user exists, when they request their profile via GET /users/me, then the response contains id, email, and displayName
- AC-002: Given an unauthenticated request, when they call GET /users/me, then the response is 401 Unauthorized
```
Generated integration tests (pseudocode):
```typescript
describe('GET /users/me', () => {
  it('AC-001: returns profile for authenticated user', async () => {
    const user = await createTestUser();
    const res = await api.get('/users/me', { headers: { Authorization: `Bearer ${user.token}` } });
    expect(res.status).toBe(200);
    expect(res.body).toHaveProperty('id');
    expect(res.body).toHaveProperty('email');
    expect(res.body).toHaveProperty('displayName');
  });

  it('AC-002: returns 401 for unauthenticated request', async () => {
    const res = await api.get('/users/me');
    expect(res.status).toBe(401);
  });
});
```
Edge Cases → Unit Tests
Edge cases describe boundary conditions and error paths. They become unit tests for the logic that handles those cases.
Spec:
```markdown
## Edge Cases
- Empty message: Reject with validation error
- Message exceeds 10,000 characters: Reject with validation error
- Message with only whitespace: Treat as empty, reject
```
Generated unit tests:
```typescript
describe('validateMessage', () => {
  it('rejects empty message', () => {
    expect(() => validateMessage('')).toThrow('Message cannot be empty');
  });

  it('rejects message exceeding 10,000 characters', () => {
    const long = 'a'.repeat(10001);
    expect(() => validateMessage(long)).toThrow('Message exceeds maximum length');
  });

  it('rejects message with only whitespace', () => {
    expect(() => validateMessage('  \n\t  ')).toThrow('Message cannot be empty');
  });
});
```
API Contracts → Contract Tests
API contracts (OpenAPI, etc.) define request/response schemas. Contract tests validate that actual responses conform.
Spec (OpenAPI snippet):
```yaml
paths:
  /bookmarks:
    get:
      responses:
        '200':
          content:
            application/json:
              schema:
                type: object
                properties:
                  items:
                    type: array
                    items:
                      $ref: '#/components/schemas/Bookmark'
                  total:
                    type: integer
                required: [items, total]
```
Contract test (using Dredd or similar):
- Request: GET /bookmarks
- Assert: Response body matches schema (items array, total integer)
User Journeys → E2E Tests
User journeys describe complete flows. Each journey becomes an E2E test scenario.
Spec:
```markdown
## User Journey: Create Bookmark
1. User logs in
2. User navigates to "Add Bookmark"
3. User enters URL and optional title
4. User submits
5. User sees bookmark in list with "Created" confirmation
```
E2E test (Playwright):
```typescript
test('User journey: Create Bookmark', async ({ page }) => {
  await page.goto('/login');
  await page.fill('[name=email]', 'test@example.com');
  await page.fill('[name=password]', 'password123');
  await page.click('button[type=submit]');
  await page.waitForURL('**/dashboard');

  await page.click('text=Add Bookmark');
  await page.fill('[name=url]', 'https://example.com/article');
  await page.fill('[name=title]', 'Example Article');
  await page.click('button[type=submit]');

  await expect(page.locator('text=Created')).toBeVisible();
  await expect(page.locator('text=Example Article')).toBeVisible();
});
```
Test-First in SDD: Article III of the Constitution
The SDD constitution (introduced in Chapter 14) includes Article III: Tests are written from specifications before implementation. This is not optional—it is the mechanism that closes the loop.
The Red-Green-Refactor Loop in SDD
- Red: Write a test from the spec. Run it. It fails (no implementation yet).
- Green: Implement until the test passes. Minimal implementation; no gold-plating.
- Refactor: Improve implementation quality. Tests stay green.
The key difference from traditional TDD: the test is derived from the spec, not from your intuition about what the code should do. The spec is the source of truth. If the spec says "email must be valid," you write a test for that before writing validation logic.
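For example, from the spec line "email must be valid," the test exists before the validator does. A hedged sketch showing both phases in one place (function name, regex, and error text are illustrative assumptions):

```typescript
// Green phase: minimal implementation, written only after the spec-derived test
// failed in the Red phase. No gold-plating beyond what the spec demands.
function validateEmail(email: string): boolean {
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
    throw new Error("Invalid email format");
  }
  return true;
}

// The Red-phase test, derived from the spec rather than from this implementation:
//   expects validateEmail("user@example.com") to return true
//   expects validateEmail("not-an-email") to throw
```

The test would keep passing if the regex were later replaced by a stricter parser, because it asserts the spec's behavior, not the mechanism.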
Why Test-First Matters for AI-Generated Code
When AI generates implementation:
- Without test-first: You get code. You hope it works. You manually test. You find bugs. You fix. Repeat. No automated regression safety.
- With test-first: You have tests from the spec. AI generates code. You run tests. Failures tell you exactly what's wrong. You fix (or regenerate). Tests pass. You have regression safety.
Test-first turns AI generation into a verifiable pipeline. The tests are the acceptance criteria for the AI's output.
The Testing Pyramid in Spec-Driven Systems
The classic testing pyramid applies, but with a spec-driven twist:
```
        /\
       /  \        E2E (few, critical journeys)
      /----\
     /      \      Integration (acceptance criteria, API flows)
    /--------\
   /          \    Unit (edge cases, constraints)
  /------------\
 /   Contract   \  Contract (API schema validation)
/________________\
```
Base: Unit tests and contract tests. Many, fast, derived from constraints and schemas.
Middle: Integration tests. Moderate count, moderate speed, derived from acceptance criteria.
Top: E2E tests. Few, slow, derived from user journeys.
Spec coverage: When measuring coverage, the pyramid's shape is beside the point. You want 100% of specification requirements covered by tests. That might mean 50 unit tests, 20 integration tests, and 5 E2E tests, depending on spec size. The metric is "requirements covered," not "lines covered."
Tutorial: Generate a Complete Test Suite from a Feature Specification
This tutorial walks you through generating a test suite for a "Bookmarks" feature. You will parse the specification, derive tests, run them (expecting failures), implement until green, and establish traceability.
Step 0: The Feature Specification
Assume you have the following specification at specs/005-bookmarks/spec.md:
```markdown
# Feature 005: Bookmarks

## Problem
Users need to save and organize URLs for later reference. Without bookmarks, users must re-search or remember links.

## Functional Requirements
- FR-001: Users can create a bookmark with URL (required) and optional title
- FR-002: Users can list their bookmarks, paginated (default 20 per page)
- FR-003: Users can delete a bookmark by ID
- FR-004: Bookmark URLs must be valid (http/https)
- FR-005: Bookmark titles are max 200 characters

## Acceptance Criteria
- AC-001: Given an authenticated user, when they POST /bookmarks with valid URL, then a bookmark is created and returns 201 with bookmark object
- AC-002: Given an authenticated user, when they GET /bookmarks, then they receive their bookmarks only (user-scoped)
- AC-003: Given an authenticated user, when they DELETE /bookmarks/:id for their bookmark, then the bookmark is deleted and returns 204
- AC-004: Given invalid URL format, when they POST /bookmarks, then returns 400 with validation error

## Edge Cases
- Empty URL: 400
- URL without protocol: 400
- Title exceeding 200 chars: 400
- Delete non-existent bookmark: 404
- Delete another user's bookmark: 403

## API Contract (summary)
POST /bookmarks: { url: string, title?: string } → 201 { id, url, title, createdAt }
GET /bookmarks: ?page=1&limit=20 → 200 { items: Bookmark[], total: number }
DELETE /bookmarks/:id → 204
```
Step 1: Parse Specification for Testable Criteria
Extract the testable elements:
| Spec Element | Type | Test Type |
|---|---|---|
| FR-004, FR-005, Edge cases (validation) | Constraint | Unit |
| AC-001 to AC-004 | Acceptance | Integration |
| API contract | Schema | Contract |
| User journey: Create → List → Delete | Journey | E2E |
Step 2: Write Contract Tests from API Spec
Create tests/contract/bookmarks.openapi.test.ts (using a contract testing tool or manual schema validation):
```typescript
import { describe, it, expect, beforeAll } from 'vitest';
import Ajv from 'ajv';
import bookmarkSchema from '../schemas/bookmark.json';
import { createTestUser } from '../helpers';

const ajv = new Ajv();
// Base URL of the running API under test (assumed environment variable).
const baseUrl = process.env.API_URL ?? 'http://localhost:3000';

describe('Bookmarks API Contract', () => {
  const validateBookmark = ajv.compile(bookmarkSchema);
  let token: string;

  beforeAll(async () => {
    token = (await createTestUser()).token;
  });

  it('POST /bookmarks response matches schema', async () => {
    const res = await fetch(`${baseUrl}/api/bookmarks`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
      body: JSON.stringify({ url: 'https://example.com' }),
    });
    const data = await res.json();
    expect(validateBookmark(data)).toBe(true);
    expect(data).toHaveProperty('id');
    expect(data).toHaveProperty('url');
    expect(data).toHaveProperty('createdAt');
  });

  it('GET /bookmarks response matches schema', async () => {
    const res = await fetch(`${baseUrl}/api/bookmarks`, {
      headers: { Authorization: `Bearer ${token}` },
    });
    const data = await res.json();
    expect(data).toHaveProperty('items');
    expect(data).toHaveProperty('total');
    expect(Array.isArray(data.items)).toBe(true);
  });
});
```
Step 3: Write Integration Tests from Acceptance Criteria
Create tests/integration/bookmarks.test.ts:
```typescript
import { describe, it, expect, beforeAll } from 'vitest';
import { createTestUser, api } from '../helpers';

describe('Bookmarks API', () => {
  let authToken: string;

  beforeAll(async () => {
    const user = await createTestUser();
    authToken = user.token;
  });

  it('AC-001: POST /bookmarks creates bookmark and returns 201', async () => {
    const res = await api.post('/bookmarks', { url: 'https://example.com' }, { headers: { Authorization: `Bearer ${authToken}` } });
    expect(res.status).toBe(201);
    expect(res.data).toMatchObject({ url: 'https://example.com' });
  });

  it('AC-002: GET /bookmarks returns user-scoped bookmarks', async () => {
    const res = await api.get('/bookmarks', { headers: { Authorization: `Bearer ${authToken}` } });
    expect(res.status).toBe(200);
    expect(res.data.items).toBeDefined();
    expect(res.data.total).toBeGreaterThanOrEqual(0);
  });

  it('AC-003: DELETE /bookmarks/:id returns 204', async () => {
    const createRes = await api.post('/bookmarks', { url: 'https://delete-me.com' }, { headers: { Authorization: `Bearer ${authToken}` } });
    const id = createRes.data.id;
    const deleteRes = await api.delete(`/bookmarks/${id}`, { headers: { Authorization: `Bearer ${authToken}` } });
    expect(deleteRes.status).toBe(204);
  });

  it('AC-004: POST with invalid URL returns 400', async () => {
    const res = await api.post('/bookmarks', { url: 'not-a-url' }, { headers: { Authorization: `Bearer ${authToken}` } });
    expect(res.status).toBe(400);
    expect(res.data.error).toBeDefined();
  });
});
```
Step 4: Write Unit Tests from Edge Cases
Create tests/unit/bookmark-validation.test.ts:
```typescript
import { describe, it, expect } from 'vitest';
import { validateBookmarkInput } from '../../src/bookmarks/validation';

describe('validateBookmarkInput', () => {
  it('rejects empty URL', () => {
    expect(() => validateBookmarkInput({ url: '' })).toThrow();
  });

  it('rejects URL without protocol', () => {
    expect(() => validateBookmarkInput({ url: 'example.com' })).toThrow();
  });

  it('rejects title exceeding 200 chars', () => {
    expect(() => validateBookmarkInput({ url: 'https://example.com', title: 'a'.repeat(201) })).toThrow();
  });

  it('accepts valid URL and optional title', () => {
    expect(validateBookmarkInput({ url: 'https://example.com' })).toBeDefined();
    expect(validateBookmarkInput({ url: 'https://example.com', title: 'Example' })).toBeDefined();
  });
});
```
Step 5: Write E2E Tests from User Journeys
Create tests/e2e/bookmarks.spec.ts (Playwright):
```typescript
import { test, expect } from '@playwright/test';

test.describe('Bookmarks User Journey', () => {
  test('Create, list, and delete bookmark', async ({ page }) => {
    await page.goto('/login');
    await page.fill('[name=email]', 'e2e@example.com');
    await page.fill('[name=password]', 'password123');
    await page.click('button[type=submit]');
    await page.waitForURL('**/dashboard');

    await page.click('text=Add Bookmark');
    await page.fill('[name=url]', 'https://example.com/article');
    await page.fill('[name=title]', 'Test Bookmark');
    await page.click('button[type=submit]');
    await expect(page.locator('text=Test Bookmark')).toBeVisible();

    await page.click('text=Test Bookmark');
    await page.click('button:has-text("Delete")');
    await expect(page.locator('text=Test Bookmark')).not.toBeVisible();
  });
});
```
Step 6: Run Tests (Red Phase)
Run the test suite:
```shell
npm run test:unit
npm run test:integration
npm run test:contract
npm run test:e2e
```
All tests fail—implementation does not exist yet. This is expected. The Red phase confirms that your tests correctly detect missing or incorrect behavior.
Step 7: Implement Until Green
Implement the bookmarks feature:
- Create validation module (validateBookmarkInput)
- Create API routes (POST, GET, DELETE)
- Create database layer
- Wire up authentication
Run tests after each step. When all pass, you have achieved Green.
Step 8: Establish Traceability
Add traceability comments to each test:
```typescript
it('AC-001: POST /bookmarks creates bookmark and returns 201', async () => {
  // Traceability: specs/005-bookmarks/spec.md AC-001
  ...
});
```
Maintain a traceability matrix (in specs/005-bookmarks/traceability.md or similar):
| Requirement | Test File | Test Name |
|---|---|---|
| AC-001 | bookmarks.test.ts | AC-001: POST /bookmarks creates bookmark |
| AC-002 | bookmarks.test.ts | AC-002: GET /bookmarks returns user-scoped |
| FR-004 | bookmark-validation.test.ts | rejects URL without protocol |
| ... | ... | ... |
When the spec changes, the matrix tells you exactly which tests to update.
Traceability: Every Test Links to a Specification Requirement
Traceability means: for every requirement in the spec, there is at least one test; for every test, there is a requirement it verifies. No orphan tests. No untested requirements.
Traceability Format
In test code:
```typescript
// @spec specs/005-bookmarks/spec.md
// @requirement AC-001
it('returns 201 when creating bookmark', ...);
```
In spec:
- AC-001: Given authenticated user, when POST /bookmarks with valid URL, then 201 with bookmark
- Test: tests/integration/bookmarks.test.ts::AC-001
Benefits
- Change impact analysis: Update AC-002 → run tests for AC-002 → know if you broke something
- Coverage reporting: "87% of requirements have passing tests" (not "87% line coverage")
- Audit trail: Compliance and certification often require requirement-to-test mapping
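The tag convention above can be checked mechanically. A sketch operating on test source text (file reading is omitted; in practice you would read and concatenate each test file, and all function names here are illustrative):

```typescript
// Collect every requirement ID tagged with @requirement in the given source text.
function extractRequirements(testSource: string): Set<string> {
  const found = new Set<string>();
  for (const match of testSource.matchAll(/@requirement\s+([A-Z]+-\d+)/g)) {
    found.add(match[1]);
  }
  return found;
}

// Requirements from the spec that no test traces to: the untested gaps.
function uncovered(specRequirements: string[], testSource: string): string[] {
  const tested = extractRequirements(testSource);
  return specRequirements.filter((r) => !tested.has(r));
}

const source = `
  // @requirement AC-001
  it('returns 201 when creating bookmark', () => {});
`;
// uncovered(["AC-001", "AC-002"], source) → ["AC-002"]
```

The same scan run in reverse (tags with no matching spec entry) flags orphan tests.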
Test Coverage as Specification Coverage
Traditional coverage measures code: lines, branches, functions. Spec-driven coverage measures requirements: how many acceptance criteria, edge cases, and user journeys have passing tests.
Specification Coverage Formula
Specification Coverage = (Requirements with ≥1 passing test) / (Total requirements) × 100%
Example
- Total requirements: 15 (5 FR, 4 AC, 5 edge cases, 1 journey)
- Requirements with passing tests: 14
- Specification coverage: 93.3%
A requirement is "covered" if at least one test that traces to it passes. You can have 100% spec coverage with 50% code coverage—if the critical paths are tested. Or 80% code coverage with 60% spec coverage—if you're testing the wrong things.
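The formula can be sketched as a small helper; the shape of the status map (requirement ID mapped to "has at least one passing test") is an assumption, not a standard tool:

```typescript
// Specification coverage = covered requirements / total requirements × 100.
function specCoverage(status: Record<string, boolean>): number {
  const ids = Object.keys(status);
  if (ids.length === 0) return 0;
  const covered = ids.filter((id) => status[id]).length;
  return (covered / ids.length) * 100;
}

// The worked example above: 14 of 15 requirements covered → 93.3%.
```

Feeding this from the traceability matrix plus the test runner's pass/fail report yields the coverage number automatically.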
Goal: 100% specification coverage for the features you ship.
Tools: Vitest, Playwright, Jest, Pytest, Cucumber
Vitest (JavaScript/TypeScript)
- Fast, Vite-native test runner
- Compatible with Jest API
- Good for unit and integration tests
- Use with @vitest/coverage-v8 for coverage
Playwright (E2E)
- Cross-browser E2E testing
- Auto-wait, trace viewer, parallel execution
- Use for user journey tests
- Integrates with BDD (Gherkin) via plugins
Jest (JavaScript/TypeScript)
- Mature, widely used
- Snapshot testing, mocking
- Alternative to Vitest for teams already on Jest
Pytest (Python)
- De facto standard for Python testing
- Fixtures, parametrize, markers
- Use for unit and integration tests in Python projects
Cucumber (BDD)
- Gherkin syntax: Feature, Scenario, Given/When/Then
- Executable specifications—feature files are both docs and tests
- Use when specs are written in Gherkin
- Integrates with many languages (Ruby, Java, JS, Python)
Common Pitfalls in Spec-Driven Testing
Pitfall 1: Testing Implementation Instead of Behavior
Wrong: Testing that BookmarkRepository.save() is called with specific arguments.
Right: Testing that after POST /bookmarks, GET /bookmarks returns the created bookmark.
Behavior survives refactoring. Implementation details do not.
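A minimal illustration, using a hypothetical in-memory BookmarkService: the behavior-level check observes only the public API, so replacing the array with a database call would leave it green:

```typescript
type Bookmark = { url: string };

// Hypothetical service; the private array is an implementation detail.
class BookmarkService {
  private store: Bookmark[] = [];
  create(url: string): void {
    this.store.push({ url });
  }
  list(): Bookmark[] {
    return [...this.store];
  }
}

// Behavior-level check: create, then observe through the public API only.
// It never inspects `store`, mocks `create`, or asserts call arguments.
function behaviorTest(svc: BookmarkService): boolean {
  svc.create("https://example.com");
  return svc.list().some((b) => b.url === "https://example.com");
}
```

An implementation-coupled test would instead assert that a `save` method was called with specific arguments, and would break on any internal refactor.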
Pitfall 2: Orphan Tests
Wrong: Writing tests because "we should test this" without linking to a requirement.
Right: Every test traces to a spec element. If you cannot trace it, either add the requirement to the spec or remove the test.
Pitfall 3: Vague Acceptance Criteria
Wrong: "User can create a bookmark" (not testable—how do you verify?)
Right: "Given authenticated user, when POST /bookmarks with valid URL, then 201 with { id, url, createdAt }"
Vague criteria produce vague tests. Strengthen the spec first.
Pitfall 4: Over-Reliance on E2E
Wrong: 50 E2E tests covering every scenario. Slow, brittle, hard to maintain.
Right: E2E for critical user journeys only. Use integration tests for API flows, unit tests for logic.
Pitfall 5: Ignoring Contract Tests
Wrong: "Our integration tests hit the API, so we're good."
Right: Integration tests verify behavior; contract tests verify structure. A provider can return wrong types (e.g., string instead of number) and still "work" in a happy-path integration test. Contract tests catch schema drift.
Try With AI
Prompt 1: Test Derivation
"I have a feature specification at specs/005-bookmarks/spec.md. Parse it and generate a test plan: for each acceptance criterion, edge case, and user journey, specify (1) the test type (unit/integration/contract/e2e), (2) the test name, (3) the key assertion. Output as a markdown table."
Prompt 2: Test Implementation
"Using the test plan from [previous output], implement the integration tests for the acceptance criteria. Use Vitest and a test helper that creates an authenticated user. Include traceability comments linking each test to the spec requirement (AC-001, AC-002, etc.)."
Prompt 3: Traceability Matrix
"I have tests in tests/integration/bookmarks.test.ts and tests/unit/bookmark-validation.test.ts. Extract all @requirement or @spec comments (or similar traceability). Generate a traceability matrix: Requirement ID | Test File | Test Name. Identify any requirements in specs/005-bookmarks/spec.md that have no linked tests."
Prompt 4: Red-Green Workflow
"I'm in the Red phase: I have failing tests for the bookmarks feature. The spec is at specs/005-bookmarks/spec.md. Generate the minimal implementation to make the integration tests pass. Start with validation, then the API routes. Do not add features not in the spec."
Practice Exercises
Exercise 1: Derive Tests from a Spec
Take the "Export data as CSV" specification from Chapter 19 (or create a minimal version). Derive a test plan: list each testable requirement, its test type, and the test name. Do not write code—only the plan. Ensure every requirement has at least one test.
Expected outcome: A test plan table with 10+ test cases covering all requirements.
Exercise 2: Implement Spec-Driven Tests
Choose one acceptance criterion from a spec you have. Write the integration test first (Red). Then implement the minimal code to pass (Green). Document the traceability. Reflect: did test-first change how you implemented?
Expected outcome: A passing test with traceability, and a brief reflection (1 paragraph).
Exercise 3: Specification Coverage Report
For a small project (or a subset of a larger one), create a specification coverage report. List all requirements. For each, identify the test(s) that cover it. Calculate coverage %. Identify gaps. Propose tests to close the gaps.
Expected outcome: A coverage report with percentage and a list of proposed tests for uncovered requirements.
Key Takeaways
- Tests derive from specifications, not implementation. The spec is the oracle. Tests verify that implementation satisfies the spec. This decouples tests from implementation and makes refactoring safer.
- Five test types map to spec elements: unit (edge cases, constraints), integration (acceptance criteria), contract (API schemas), e2e (user journeys), and agentic fuzzing (adversarial and boundary inputs). Use the right type for each requirement.
- Test-first in SDD (Article III): Write tests from the spec before implementation. Red → Green → Refactor. For AI-generated code, tests are the acceptance criteria.
- Traceability links every test to a requirement and every requirement to tests. Use comments, matrices, or tooling. Traceability enables change impact analysis and specification coverage reporting.
- Specification coverage measures requirements with passing tests, not lines of code. Aim for 100% spec coverage for shipped features.
- The testing pyramid applies: many unit/contract tests, moderate integration tests, few e2e tests. All derived from the specification.
Chapter Quiz
- What is the fundamental principle of spec-driven testing? How does it differ from testing that follows implementation?
- For each of these spec elements, which test type would you use: (a) "Email must be valid format," (b) "Given user logs in, when they request profile, then 200 with user data," (c) "GET /users returns array of User objects," (d) "User completes signup → verification → login"?
- What is Article III of the SDD constitution, and why does it matter for AI-generated code?
- In the Red-Green-Refactor loop, what is the source of truth for the test? (The implementation, the spec, or the developer's intuition?)
- What is specification coverage, and how does it differ from code coverage? Which should you prioritize in SDD?
- What is traceability in the context of spec-driven testing? Name two benefits.
- You have a spec with 10 acceptance criteria, 5 edge cases, and 1 user journey. How would you distribute tests across unit, integration, contract, and e2e? (Approximate counts.)
- Name three tools for spec-driven testing (from different layers: unit, integration, e2e) and when you would use each.