
Chapter 26: Spec-Driven Testing


Learning Objectives

By the end of this chapter, you will be able to:

  • Explain the fundamental principle: tests derive from specifications, not from implementation
  • Identify and apply five test types in SDD: unit, integration, contract, end-to-end, and agentic fuzzing
  • Map specification elements to test types: acceptance criteria → integration tests, edge cases → unit tests, API contracts → contract tests, user journeys → e2e tests, security constraints → agentic fuzzing
  • Apply test-first in SDD (Article III of the constitution): write tests before implementation
  • Understand the testing pyramid in spec-driven systems
  • Generate a complete test suite from a feature specification through a hands-on tutorial
  • Establish traceability: every test links to a specification requirement
  • Measure test coverage as specification coverage, not code coverage
  • Use tools such as Vitest, Playwright, Jest, Pytest, and Cucumber effectively

The Fundamental Principle: Tests Derive from Specifications

In traditional development, tests often follow implementation. You write code, then you write tests to verify the code. The tests become coupled to the implementation—refactor the code, and the tests break. The tests answer: "Does this code do what this code does?"

In Spec-Driven Development, tests derive from specifications. You write a spec, then you write tests that verify the spec, then you implement until the tests pass. The tests answer: "Does this implementation satisfy the specification?" Implementation can change; the spec and tests remain the anchor.

Why This Matters

Traditional Testing | Spec-Driven Testing
--- | ---
Tests verify code behavior | Tests verify specification compliance
Refactoring breaks tests | Refactoring preserves tests (if spec unchanged)
Coverage = lines of code exercised | Coverage = requirements exercised
Tests coupled to implementation | Tests coupled to specification
"Does the code work?" | "Did we build what we specified?"

When AI generates code, you cannot manually inspect every line. You need automated verification that the output satisfies the spec. Spec-driven testing provides that verification. The spec is the oracle; the tests are the automated oracle-checker.


Five Test Types in SDD

Spec-Driven Development organizes tests into five types, each mapped to a different layer of the specification.

1. Unit Tests

Source: Edge cases, data model constraints, pure logic

Purpose: Verify that individual functions, modules, or components behave correctly for specific inputs. Unit tests are fast, isolated, and numerous.

Spec mapping: Constraints ("email must be valid format"), edge cases ("empty string returns error"), data model invariants ("IDs are non-negative").

Example spec → test:

  • Spec: "FR-003: User IDs are always UUID v4 format"
  • Test: validateUserId("invalid") throws; validateUserId(validUuid) returns true
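As a sketch, the derivation from FR-003 to a unit test can be made concrete. The `isUuidV4` helper below is a hypothetical stand-in for your project's validator; only the spec's constraint is fixed, not this implementation:

```typescript
// Hypothetical validator for FR-003. The regex encodes the UUID v4 shape:
// version nibble '4' in the third group, variant nibble 8-b in the fourth.
function isUuidV4(id: string): boolean {
  return /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i.test(id);
}

// Spec-derived checks (shown as plain assertions; in a suite these would be
// Vitest `it(...)` blocks traced to FR-003):
const valid = '3f2504e0-4f89-41d3-9a0c-0305e82c3301';
console.log(isUuidV4(valid));     // a well-formed v4 UUID passes
console.log(isUuidV4('invalid')); // anything else fails
```

The test asserts the spec's constraint ("always UUID v4 format"), so swapping the regex for a library call later leaves the test untouched.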

2. Integration Tests

Source: Acceptance criteria, API behavior, component interactions

Purpose: Verify that multiple components work together correctly. Integration tests hit real databases, APIs, or services (or close facsimiles).

Spec mapping: Acceptance criteria ("Given X, when Y, then Z"), API flows ("POST creates resource, GET returns it"), component interactions ("auth service validates token before data access").

Example spec → test:

  • Spec: "AC-001: Given a user with valid credentials, when they log in, then they receive a session token"
  • Test: Call login API with valid credentials → assert 200, assert token in response

3. Contract Tests

Source: API contracts (OpenAPI, AsyncAPI), interface definitions

Purpose: Verify that API responses match the contract schema. Contract tests validate structure and types, not business logic.

Spec mapping: OpenAPI schemas, request/response definitions, event payloads.

Example spec → test:

  • Spec: GET /users/{id} returns { id: string, email: string, createdAt: string }
  • Test: Call endpoint → assert response matches JSON schema

4. End-to-End (E2E) Tests

Source: User journeys, critical flows, cross-system behavior

Purpose: Verify complete user flows through the system as a real user would experience them. E2E tests are slow, brittle if overused, but essential for critical paths.

Spec mapping: User stories ("As a user, I want to..."), user journeys ("User signs up → verifies email → logs in → completes onboarding").

Example spec → test:

  • Spec: "User journey: Sign up → Verify email → Log in → View dashboard"
  • Test: Playwright/Cypress simulates full flow in browser

5. Agentic Fuzzing

Source: Edge cases, constraints, security requirements

Purpose: Actively try to break the code by generating malicious or edge-case inputs. Instead of writing static test cases, you instruct a Testing Agent to attack the implementation.

Spec mapping: Boundary conditions, constraint violations ("Must not allow token reuse"), error handling.

Example spec → test:

  • Spec: "FR-004: Rate Limiting: 3 requests per email per 15 min."
  • Test: A testing agent runs a loop firing random concurrent requests trying to bypass the rate limit, using variations of email casing and whitespace.
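In practice the attack loop is driven by an agent, but its core can be sketched deterministically. Everything below (the toy `RateLimiter`, the trim-and-lowercase normalization, the variant list) is an assumption for illustration, not a real implementation of FR-004:

```typescript
// Toy in-memory rate limiter: at most `limit` requests per normalized email.
// The normalization step is exactly what the fuzzer tries to defeat.
class RateLimiter {
  private counts = new Map<string, number>();
  constructor(private limit: number) {}
  allow(email: string): boolean {
    const key = email.trim().toLowerCase(); // normalization under test
    const n = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, n);
    return n <= this.limit;
  }
}

// Fuzz loop: fire requests using casing/whitespace variants of one email and
// count how many get through. Per the spec, at most 3 may be allowed.
function fuzzRateLimit(email: string, attempts: number): number {
  const limiter = new RateLimiter(3);
  const variants = [email, email.toUpperCase(), ` ${email} `, `\t${email}\n`];
  let allowed = 0;
  for (let i = 0; i < attempts; i++) {
    if (limiter.allow(variants[i % variants.length])) allowed++;
  }
  return allowed;
}
```

A real testing agent would add concurrency, random delays, and novel input mutations; the assertion stays the same: no variant may push the allowed count past the spec's limit.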

How Specifications Generate Test Cases

The mapping from spec to test is systematic. Each specification element has a corresponding test type and derivation rule.

Acceptance Criteria → Integration Tests

Acceptance criteria are written in Given/When/Then format. Each criterion becomes one or more integration tests.

Spec:

## Acceptance Criteria

- AC-001: Given a user exists, when they request their profile via GET /users/me, then the response contains id, email, and displayName
- AC-002: Given an unauthenticated request, when they call GET /users/me, then the response is 401 Unauthorized

Generated integration tests (pseudocode):

describe('GET /users/me', () => {
  it('AC-001: returns profile for authenticated user', async () => {
    const user = await createTestUser();
    const res = await api.get('/users/me', { headers: { Authorization: `Bearer ${user.token}` } });
    expect(res.status).toBe(200);
    expect(res.body).toHaveProperty('id');
    expect(res.body).toHaveProperty('email');
    expect(res.body).toHaveProperty('displayName');
  });

  it('AC-002: returns 401 for unauthenticated request', async () => {
    const res = await api.get('/users/me');
    expect(res.status).toBe(401);
  });
});

Edge Cases → Unit Tests

Edge cases describe boundary conditions and error paths. They become unit tests for the logic that handles those cases.

Spec:

## Edge Cases

- Empty message: Reject with validation error
- Message exceeds 10,000 characters: Reject with validation error
- Message with only whitespace: Treat as empty, reject

Generated unit tests:

describe('validateMessage', () => {
  it('rejects empty message', () => {
    expect(() => validateMessage('')).toThrow('Message cannot be empty');
  });

  it('rejects message exceeding 10,000 characters', () => {
    const long = 'a'.repeat(10001);
    expect(() => validateMessage(long)).toThrow('Message exceeds maximum length');
  });

  it('rejects message with only whitespace', () => {
    expect(() => validateMessage(' \n\t ')).toThrow('Message cannot be empty');
  });
});

API Contracts → Contract Tests

API contracts (OpenAPI, etc.) define request/response schemas. Contract tests validate that actual responses conform.

Spec (OpenAPI snippet):

paths:
  /bookmarks:
    get:
      responses:
        '200':
          content:
            application/json:
              schema:
                type: object
                properties:
                  items:
                    type: array
                    items:
                      $ref: '#/components/schemas/Bookmark'
                  total:
                    type: integer
                required: [items, total]

Contract test (using Dredd or similar):

  • Request: GET /bookmarks
  • Assert: Response body matches schema (items array, total integer)

User Journeys → E2E Tests

User journeys describe complete flows. Each journey becomes an E2E test scenario.

Spec:

## User Journey: Create Bookmark

1. User logs in
2. User navigates to "Add Bookmark"
3. User enters URL and optional title
4. User submits
5. User sees bookmark in list with "Created" confirmation

E2E test (Playwright):

test('User journey: Create Bookmark', async ({ page }) => {
  await page.goto('/login');
  await page.fill('[name=email]', 'test@example.com');
  await page.fill('[name=password]', 'password123');
  await page.click('button[type=submit]');
  await page.waitForURL('**/dashboard');

  await page.click('text=Add Bookmark');
  await page.fill('[name=url]', 'https://example.com/article');
  await page.fill('[name=title]', 'Example Article');
  await page.click('button[type=submit]');

  await expect(page.locator('text=Created')).toBeVisible();
  await expect(page.locator('text=Example Article')).toBeVisible();
});

Test-First in SDD: Article III of the Constitution

The SDD constitution (introduced in Chapter 14) includes Article III: Tests are written from specifications before implementation. This is not optional—it is the mechanism that closes the loop.

The Red-Green-Refactor Loop in SDD

  1. Red: Write a test from the spec. Run it. It fails (no implementation yet).
  2. Green: Implement until the test passes. Minimal implementation; no gold-plating.
  3. Refactor: Improve implementation quality. Tests stay green.

The key difference from traditional TDD: the test is derived from the spec, not from your intuition about what the code should do. The spec is the source of truth. If the spec says "email must be valid," you write a test for that before writing validation logic.

Why Test-First Matters for AI-Generated Code

When AI generates implementation:

  • Without test-first: You get code. You hope it works. You manually test. You find bugs. You fix. Repeat. No automated regression safety.
  • With test-first: You have tests from the spec. AI generates code. You run tests. Failures tell you exactly what's wrong. You fix (or regenerate). Tests pass. You have regression safety.

Test-first turns AI generation into a verifiable pipeline. The tests are the acceptance criteria for the AI's output.


The Testing Pyramid in Spec-Driven Systems

The classic testing pyramid applies, but with a spec-driven twist:

            /\
           /  \       E2E (few, critical journeys)
          /----\
         /      \     Integration (acceptance criteria, API flows)
        /--------\
       /          \   Unit (edge cases, constraints)
      /------------\
     /              \ Contract (API schema validation)
    /________________\

Base: Unit tests and contract tests. Many, fast, derived from constraints and schemas.

Middle: Integration tests. Moderate count, moderate speed, derived from acceptance criteria.

Top: E2E tests. Few, slow, derived from user journeys.

Spec coverage: The pyramid describes test counts, not the coverage target. Whatever the shape, you want 100% of specification requirements covered by tests. That might mean 50 unit tests, 20 integration tests, and 5 E2E tests, depending on spec size. The metric is "requirements covered," not "lines covered."


Tutorial: Generate a Complete Test Suite from a Feature Specification

This tutorial walks you through generating a test suite for a "Bookmarks" feature. You will parse the specification, derive tests, run them (expecting failures), implement until green, and establish traceability.

Step 0: The Feature Specification

Assume you have the following specification at specs/005-bookmarks/spec.md:

# Feature 005: Bookmarks

## Problem

Users need to save and organize URLs for later reference. Without bookmarks, users must re-search or remember links.

## Functional Requirements

- FR-001: Users can create a bookmark with URL (required) and optional title
- FR-002: Users can list their bookmarks, paginated (default 20 per page)
- FR-003: Users can delete a bookmark by ID
- FR-004: Bookmark URLs must be valid (http/https)
- FR-005: Bookmark titles are max 200 characters

## Acceptance Criteria

- AC-001: Given an authenticated user, when they POST /bookmarks with valid URL, then a bookmark is created and returns 201 with bookmark object
- AC-002: Given an authenticated user, when they GET /bookmarks, then they receive their bookmarks only (user-scoped)
- AC-003: Given an authenticated user, when they DELETE /bookmarks/:id for their bookmark, then the bookmark is deleted and returns 204
- AC-004: Given invalid URL format, when they POST /bookmarks, then returns 400 with validation error

## Edge Cases

- Empty URL: 400
- URL without protocol: 400
- Title exceeding 200 chars: 400
- Delete non-existent bookmark: 404
- Delete another user's bookmark: 403

## API Contract (summary)

POST /bookmarks: { url: string, title?: string } → 201 { id, url, title, createdAt }
GET /bookmarks: ?page=1&limit=20 → 200 { items: Bookmark[], total: number }
DELETE /bookmarks/:id → 204

Step 1: Parse Specification for Testable Criteria

Extract the testable elements:

Spec Element | Type | Test Type
--- | --- | ---
FR-004, FR-005, edge cases (validation) | Constraint | Unit
AC-001 to AC-004 | Acceptance | Integration
API contract | Schema | Contract
User journey: Create → List → Delete | Journey | E2E

Step 2: Write Contract Tests from API Spec

Create tests/contract/bookmarks.openapi.test.ts (using a contract testing tool or manual schema validation):

import { describe, it, expect, beforeAll } from 'vitest';
import Ajv from 'ajv';
import bookmarkSchema from '../schemas/bookmark.json';
import { createTestUser } from '../helpers';

const ajv = new Ajv();

describe('Bookmarks API Contract', () => {
  const validateBookmark = ajv.compile(bookmarkSchema);
  let token: string;

  beforeAll(async () => {
    const user = await createTestUser();
    token = user.token;
  });

  it('POST /bookmarks response matches schema', async () => {
    const res = await fetch('/api/bookmarks', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
      body: JSON.stringify({ url: 'https://example.com' }),
    });
    const data = await res.json();
    expect(validateBookmark(data)).toBe(true);
    expect(data).toHaveProperty('id');
    expect(data).toHaveProperty('url');
    expect(data).toHaveProperty('createdAt');
  });

  it('GET /bookmarks response matches schema', async () => {
    const res = await fetch('/api/bookmarks');
    const data = await res.json();
    expect(data).toHaveProperty('items');
    expect(data).toHaveProperty('total');
    expect(Array.isArray(data.items)).toBe(true);
  });
});

Step 3: Write Integration Tests from Acceptance Criteria

Create tests/integration/bookmarks.test.ts:

import { describe, it, expect, beforeAll } from 'vitest';
import { createTestUser, api } from '../helpers';

describe('Bookmarks API', () => {
  let authToken: string;

  beforeAll(async () => {
    const user = await createTestUser();
    authToken = user.token;
  });

  it('AC-001: POST /bookmarks creates bookmark and returns 201', async () => {
    const res = await api.post('/bookmarks', { url: 'https://example.com' }, { headers: { Authorization: `Bearer ${authToken}` } });
    expect(res.status).toBe(201);
    expect(res.data).toMatchObject({ url: 'https://example.com' });
  });

  it('AC-002: GET /bookmarks returns user-scoped bookmarks', async () => {
    const res = await api.get('/bookmarks', { headers: { Authorization: `Bearer ${authToken}` } });
    expect(res.status).toBe(200);
    expect(res.data.items).toBeDefined();
    expect(res.data.total).toBeGreaterThanOrEqual(0);
  });

  it('AC-003: DELETE /bookmarks/:id returns 204', async () => {
    const createRes = await api.post('/bookmarks', { url: 'https://delete-me.com' }, { headers: { Authorization: `Bearer ${authToken}` } });
    const id = createRes.data.id;
    const deleteRes = await api.delete(`/bookmarks/${id}`, { headers: { Authorization: `Bearer ${authToken}` } });
    expect(deleteRes.status).toBe(204);
  });

  it('AC-004: POST with invalid URL returns 400', async () => {
    const res = await api.post('/bookmarks', { url: 'not-a-url' }, { headers: { Authorization: `Bearer ${authToken}` } });
    expect(res.status).toBe(400);
    expect(res.data.error).toBeDefined();
  });
});

Step 4: Write Unit Tests from Edge Cases

Create tests/unit/bookmark-validation.test.ts:

import { describe, it, expect } from 'vitest';
import { validateBookmarkInput } from '../../src/bookmarks/validation';

describe('validateBookmarkInput', () => {
  it('rejects empty URL', () => {
    expect(() => validateBookmarkInput({ url: '' })).toThrow();
  });

  it('rejects URL without protocol', () => {
    expect(() => validateBookmarkInput({ url: 'example.com' })).toThrow();
  });

  it('rejects title exceeding 200 chars', () => {
    expect(() => validateBookmarkInput({ url: 'https://example.com', title: 'a'.repeat(201) })).toThrow();
  });

  it('accepts valid URL and optional title', () => {
    expect(validateBookmarkInput({ url: 'https://example.com' })).toBeDefined();
    expect(validateBookmarkInput({ url: 'https://example.com', title: 'Example' })).toBeDefined();
  });
});

Step 5: Write E2E Tests from User Journeys

Create tests/e2e/bookmarks.spec.ts (Playwright):

import { test, expect } from '@playwright/test';

test.describe('Bookmarks User Journey', () => {
  test('Create, list, and delete bookmark', async ({ page }) => {
    await page.goto('/login');
    await page.fill('[name=email]', 'e2e@example.com');
    await page.fill('[name=password]', 'password123');
    await page.click('button[type=submit]');
    await page.waitForURL('**/dashboard');

    await page.click('text=Add Bookmark');
    await page.fill('[name=url]', 'https://example.com/article');
    await page.fill('[name=title]', 'Test Bookmark');
    await page.click('button[type=submit]');

    await expect(page.locator('text=Test Bookmark')).toBeVisible();

    await page.click('text=Test Bookmark');
    await page.click('button:has-text("Delete")');
    await expect(page.locator('text=Test Bookmark')).not.toBeVisible();
  });
});

Step 6: Run Tests (Red Phase)

Run the test suite:

npm run test:unit
npm run test:integration
npm run test:contract
npm run test:e2e

All tests fail—implementation does not exist yet. This is expected. The Red phase confirms that your tests correctly detect missing or incorrect behavior.

Step 7: Implement Until Green

Implement the bookmarks feature:

  1. Create validation module (validateBookmarkInput)
  2. Create API routes (POST, GET, DELETE)
  3. Create database layer
  4. Wire up authentication

Run tests after each step. When all pass, you have achieved Green.
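As a sketch of step 1, a minimal validation module that satisfies the Step 4 unit tests might look like the following. The error messages and module shape are assumptions, not the canonical implementation; the spec only fixes the behavior (FR-004, FR-005):

```typescript
export interface BookmarkInput {
  url: string;
  title?: string;
}

// Minimal implementation for FR-004 (valid http/https URL) and
// FR-005 (title max 200 chars). Throws on violation, returns input on success.
export function validateBookmarkInput(input: BookmarkInput): BookmarkInput {
  let parsed: URL;
  try {
    parsed = new URL(input.url); // rejects empty and protocol-less URLs
  } catch {
    throw new Error('URL must be a valid http/https URL');
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error('URL must use http or https');
  }
  if (input.title !== undefined && input.title.length > 200) {
    throw new Error('Title exceeds maximum length of 200 characters');
  }
  return input;
}
```

Note the discipline: this is the minimum that makes the Red tests pass; no URL reachability checks, no title trimming, nothing the spec does not require.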

Step 8: Establish Traceability

Add traceability comments to each test:

it('AC-001: POST /bookmarks creates bookmark and returns 201', async () => {
  // Traceability: specs/005-bookmarks/spec.md AC-001
  ...
});

Maintain a traceability matrix (in specs/005-bookmarks/traceability.md or similar):

Requirement | Test File | Test Name
--- | --- | ---
AC-001 | bookmarks.test.ts | AC-001: POST /bookmarks creates bookmark
AC-002 | bookmarks.test.ts | AC-002: GET /bookmarks returns user-scoped
FR-004 | bookmark-validation.test.ts | rejects URL without protocol
... | ... | ...

When the spec changes, the matrix tells you exactly which tests to update.


Traceability

Traceability means: for every requirement in the spec, there is at least one test; for every test, there is a requirement it verifies. No orphan tests. No untested requirements.

Traceability Format

In test code:

// @spec specs/005-bookmarks/spec.md
// @requirement AC-001
it('returns 201 when creating bookmark', ...);

In spec:

- AC-001: Given authenticated user, when POST /bookmarks with valid URL, then 201 with bookmark
- Test: tests/integration/bookmarks.test.ts::AC-001

Benefits

  • Change impact analysis: Update AC-002 → run tests for AC-002 → know if you broke something
  • Coverage reporting: "87% of requirements have passing tests" (not "87% line coverage")
  • Audit trail: Compliance and certification often require requirement-to-test mapping
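A traceability matrix can be generated rather than hand-maintained. The sketch below extracts `@requirement` annotations from test source text and pairs each with the name of the following `it(...)` block; the pairing rule and the `TraceEntry` shape are assumptions, and reading the files (e.g. globbing `tests/**/*.test.ts`) is omitted:

```typescript
interface TraceEntry {
  requirement: string; // e.g. "AC-001"
  testName: string;    // first argument of the it(...) that follows
}

// Scan source text for "// @requirement <ID>" and capture the name of the
// next it('...') call. A lazy [\s\S]*? bridges the lines in between.
function extractTraceability(source: string): TraceEntry[] {
  const entries: TraceEntry[] = [];
  const re = /\/\/ @requirement (\S+)[\s\S]*?it\((['"`])(.*?)\2/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    entries.push({ requirement: m[1], testName: m[3] });
  }
  return entries;
}
```

Run over every test file, the result is exactly the Requirement | Test File | Test Name matrix; diffing it against the requirement IDs in the spec surfaces orphan tests and untested requirements.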

Test Coverage as Specification Coverage

Traditional coverage measures code: lines, branches, functions. Spec-driven coverage measures requirements: how many acceptance criteria, edge cases, and user journeys have passing tests.

Specification Coverage Formula

Specification Coverage = (Requirements with ≥1 passing test) / (Total requirements) × 100%

Example

  • Total requirements: 15 (5 FR, 4 AC, 5 edge cases, 1 journey)
  • Requirements with passing tests: 14
  • Specification coverage: 93.3%

A requirement is "covered" if at least one test that traces to it passes. You can have 100% spec coverage with 50% code coverage—if the critical paths are tested. Or 80% code coverage with 60% spec coverage—if you're testing the wrong things.

Goal: 100% specification coverage for the features you ship.
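The formula is mechanical enough to automate on top of a traceability matrix. A sketch, assuming each requirement row records how many of its traced tests currently pass (the `RequirementStatus` shape is an assumption):

```typescript
interface RequirementStatus {
  id: string;           // e.g. "AC-001", "FR-004"
  passingTests: number; // tests tracing to this requirement that pass
}

// Specification coverage: share of requirements with >= 1 passing test.
function specCoverage(requirements: RequirementStatus[]): number {
  if (requirements.length === 0) return 100;
  const covered = requirements.filter((r) => r.passingTests > 0).length;
  return (covered / requirements.length) * 100;
}
```

With the chapter's example (15 requirements, 14 covered), `specCoverage` returns roughly 93.3, and the one requirement with `passingTests === 0` is the gap to close.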


Tools: Vitest, Playwright, Jest, Pytest, Cucumber

Vitest (JavaScript/TypeScript)

  • Fast, Vite-native test runner
  • Compatible with Jest API
  • Good for unit and integration tests
  • Use with @vitest/coverage-v8 for coverage

Playwright (E2E)

  • Cross-browser E2E testing
  • Auto-wait, trace viewer, parallel execution
  • Use for user journey tests
  • Integrates with BDD (Gherkin) via plugins

Jest (JavaScript/TypeScript)

  • Mature, widely used
  • Snapshot testing, mocking
  • Alternative to Vitest for teams already on Jest

Pytest (Python)

  • De facto standard for Python testing
  • Fixtures, parametrize, markers
  • Use for unit and integration tests in Python projects

Cucumber (BDD)

  • Gherkin syntax: Feature, Scenario, Given/When/Then
  • Executable specifications—feature files are both docs and tests
  • Use when specs are written in Gherkin
  • Integrates with many languages (Ruby, Java, JS, Python)

Common Pitfalls in Spec-Driven Testing

Pitfall 1: Testing Implementation Instead of Behavior

Wrong: Testing that BookmarkRepository.save() is called with specific arguments.

Right: Testing that after POST /bookmarks, GET /bookmarks returns the created bookmark.

Behavior survives refactoring. Implementation details do not.

Pitfall 2: Orphan Tests

Wrong: Writing tests because "we should test this" without linking to a requirement.

Right: Every test traces to a spec element. If you cannot trace it, either add the requirement to the spec or remove the test.

Pitfall 3: Vague Acceptance Criteria

Wrong: "User can create a bookmark" (not testable—how do you verify?)

Right: "Given authenticated user, when POST /bookmarks with valid URL, then 201 with { id, url, createdAt }"

Vague criteria produce vague tests. Strengthen the spec first.

Pitfall 4: Over-Reliance on E2E

Wrong: 50 E2E tests covering every scenario. Slow, brittle, hard to maintain.

Right: E2E for critical user journeys only. Use integration tests for API flows, unit tests for logic.

Pitfall 5: Ignoring Contract Tests

Wrong: "Our integration tests hit the API, so we're good."

Right: Integration tests verify behavior; contract tests verify structure. A provider can return wrong types (e.g., string instead of number) and still "work" in a happy-path integration test. Contract tests catch schema drift.
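To see why happy-path assertions miss drift, a minimal sketch: a response whose `total` drifted from number to string still passes a presence check but fails a type check. The shapes are illustrative, not taken from a real API:

```typescript
// A provider drifts: "total" comes back as the string '3', not the number 3.
const driftedResponse = { items: [] as unknown[], total: '3' as unknown };

// Integration-style happy-path assertion ("is total present?"): passes.
const looksFine = driftedResponse.total !== undefined;

// Contract-style assertion ("is total actually an integer?"): fails,
// which is exactly the schema drift a contract test exists to catch.
const matchesContract = typeof driftedResponse.total === 'number';
```

A schema validator such as Ajv performs the second kind of check for every field at once, which is why the contract layer stays cheap to maintain even as the API grows.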


Try With AI

Prompt 1: Test Derivation

"I have a feature specification at specs/005-bookmarks/spec.md. Parse it and generate a test plan: for each acceptance criterion, edge case, and user journey, specify (1) the test type (unit/integration/contract/e2e), (2) the test name, (3) the key assertion. Output as a markdown table."

Prompt 2: Test Implementation

"Using the test plan from [previous output], implement the integration tests for the acceptance criteria. Use Vitest and a test helper that creates an authenticated user. Include traceability comments linking each test to the spec requirement (AC-001, AC-002, etc.)."

Prompt 3: Traceability Matrix

"I have tests in tests/integration/bookmarks.test.ts and tests/unit/bookmark-validation.test.ts. Extract all @requirement or @spec comments (or similar traceability). Generate a traceability matrix: Requirement ID | Test File | Test Name. Identify any requirements in specs/005-bookmarks/spec.md that have no linked tests."

Prompt 4: Red-Green Workflow

"I'm in the Red phase: I have failing tests for the bookmarks feature. The spec is at specs/005-bookmarks/spec.md. Generate the minimal implementation to make the integration tests pass. Start with validation, then the API routes. Do not add features not in the spec."


Practice Exercises

Exercise 1: Derive Tests from a Spec

Take the "Export data as CSV" specification from Chapter 19 (or create a minimal version). Derive a test plan: list each testable requirement, its test type, and the test name. Do not write code—only the plan. Ensure every requirement has at least one test.

Expected outcome: A test plan table with 10+ test cases covering all requirements.

Exercise 2: Implement Spec-Driven Tests

Choose one acceptance criterion from a spec you have. Write the integration test first (Red). Then implement the minimal code to pass (Green). Document the traceability. Reflect: did test-first change how you implemented?

Expected outcome: A passing test with traceability, and a brief reflection (1 paragraph).

Exercise 3: Specification Coverage Report

For a small project (or a subset of a larger one), create a specification coverage report. List all requirements. For each, identify the test(s) that cover it. Calculate coverage %. Identify gaps. Propose tests to close the gaps.

Expected outcome: A coverage report with percentage and a list of proposed tests for uncovered requirements.


Key Takeaways

  1. Tests derive from specifications, not implementation. The spec is the oracle. Tests verify that implementation satisfies the spec. This decouples tests from implementation and makes refactoring safer.

  2. Five test types map to spec elements: unit (edge cases, constraints), integration (acceptance criteria), contract (API schemas), e2e (user journeys), agentic fuzzing (security constraints and boundary attacks). Use the right type for each requirement.

  3. Test-first in SDD (Article III): Write tests from the spec before implementation. Red → Green → Refactor. For AI-generated code, tests are the acceptance criteria.

  4. Traceability links every test to a requirement and every requirement to tests. Use comments, matrices, or tooling. Traceability enables change impact analysis and specification coverage reporting.

  5. Specification coverage measures requirements with passing tests, not lines of code. Aim for 100% spec coverage for shipped features.

  6. The testing pyramid applies: many unit/contract tests, moderate integration tests, few e2e tests. All derived from the specification.


Chapter Quiz

  1. What is the fundamental principle of spec-driven testing? How does it differ from testing that follows implementation?

  2. For each of these spec elements, which test type would you use: (a) "Email must be valid format," (b) "Given user logs in, when they request profile, then 200 with user data," (c) "GET /users returns array of User objects," (d) "User completes signup → verification → login"?

  3. What is Article III of the SDD constitution, and why does it matter for AI-generated code?

  4. In the Red-Green-Refactor loop, what is the source of truth for the test? (The implementation, the spec, or the developer's intuition?)

  5. What is specification coverage, and how does it differ from code coverage? Which should you prioritize in SDD?

  6. What is traceability in the context of spec-driven testing? Name two benefits.

  7. You have a spec with 10 acceptance criteria, 5 edge cases, and 1 user journey. How would you distribute tests across unit, integration, contract, and e2e? (Approximate counts.)

  8. Name three tools for spec-driven testing (from different layers: unit, integration, e2e) and when you would use each.