Chapter 29: AI Governance and CI/CD Integration


Learning Objectives

By the end of this chapter, you will be able to:

  • Define AI governance and explain why it is essential for AI-generated code
  • Implement governance rules: mandatory code review, security scanning, and coverage thresholds
  • Add LLM-specific security controls such as prompt-injection and tool-allowlist checks
  • Design a spec-validated CI/CD pipeline with explicit gates
  • Configure spec validation so incomplete or drifting artifacts fail the pipeline
  • Define review policies for low-, medium-, and high-risk AI-generated changes
  • Preserve traceability from specification to deployed code

What Is AI Governance?

AI governance is the set of policies, processes, and controls that ensure AI-generated code remains safe, compliant, and high-quality. When AI writes code, you lose the implicit trust that comes from human authorship. Governance restores that trust through verification.

Without governance, AI-generated code can introduce:

  • Security vulnerabilities: hardcoded secrets, SQL injection, insecure dependencies
  • Compliance violations: missing audit trails, incorrect data handling, weak approvals
  • Quality regressions: untested code paths, broken contracts, performance degradation
  • Spec drift: code that diverges from specifications without detection

AI governance answers a simple question: How do we ensure that what AI produces is fit for production?

Core Principles

  1. Never trust, always verify: AI output is treated as untrusted until it passes the required gates.
  2. Specification as contract: the specification is the source of truth; code must conform.
  3. Automated gates first: automate what can be verified mechanically; reserve humans for judgment and exceptions.
  4. Traceability: every deployed artifact should link back to the spec, commit, review, and test results.
  5. Auditability: approvals, overrides, and failures must be reviewable later.

Governance Rules

Governance rules are the concrete policies that enforce quality. They define what must pass before code reaches production.

Mandatory Code Review

Rule: All medium- and high-risk AI-generated code requires human review before merge.

Implementation:

  • Require at least one approval on pull requests
  • Require the reviewer to compare the change to the feature spec
  • Require two approvals for high-risk paths such as auth, payments, or PII handling

Security Scanning

Rule: All code must pass security scans before deployment.

Common checks:

  • SAST on source code
  • Dependency audit for known CVEs
  • DAST on release candidates or staging where appropriate

LLM-Specific Security Controls

Traditional AppSec scanning is necessary but insufficient for agentic systems. Add explicit controls for:

  • prompt injection
  • tool misuse / excessive agency
  • data exfiltration
  • untrusted retrieval poisoning

Example policy:

llm-security:
  prompt-injection-tests: required
  tool-allowlist: required
  network-egress-policy: restricted
  pii-redaction-before-model: required
  retrieval-source-trust-levels: required
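The tool-allowlist requirement can be enforced at the agent runtime boundary, where every tool call the model requests passes through a single gate. A minimal sketch, assuming a simple agent loop; the tool names are hypothetical placeholders, not part of any framework:

```javascript
// Hypothetical tool names; replace with the tools your agent actually exposes.
const TOOL_ALLOWLIST = new Set(["search_docs", "read_file", "run_tests"]);

// Gate every requested tool call; reject anything off the list so the agent
// cannot acquire capabilities the policy never granted.
function assertToolAllowed(toolName) {
  if (!TOOL_ALLOWLIST.has(toolName)) {
    throw new Error(`Tool "${toolName}" is not on the allowlist`);
  }
  return toolName;
}
```

In a real runtime, this same checkpoint is a natural place to log the attempted call, which feeds the audit trail discussed later in this chapter.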

Coverage Thresholds

Rule: Coverage thresholds should be risk-tiered. Use 90%+ for high-risk paths; lower thresholds may be acceptable for low-risk changes with strong contract and integration coverage.

coverage:
  low-risk-minimum: 75
  medium-risk-minimum: 85
  high-risk-minimum: 90
  scope: changed-files
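Once a change has been classified into a tier, applying the policy is mechanical. A minimal sketch, assuming the tier is already known; the tier names and minimums mirror the policy keys above:

```javascript
// Minimums mirror the policy above: 75 / 85 / 90 percent line coverage.
const COVERAGE_MINIMUMS = { low: 75, medium: 85, high: 90 };

// Returns whether the measured line coverage clears the tier's minimum.
function coverageGate(tier, linePct) {
  const minimum = COVERAGE_MINIMUMS[tier];
  if (minimum === undefined) throw new Error(`Unknown risk tier: ${tier}`);
  return { minimum, passed: linePct >= minimum };
}
```

In CI this would consume the coverage summary your test runner emits, as the tutorial later in this chapter shows for a single fixed threshold.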

Spec Validation

Rule: Code must conform to the specification. Spec violations block merge or deployment.

Common ways to enforce this:

  • contract tests
  • integration tests mapped to acceptance criteria
  • linting for required spec sections
  • drift detection between contracts and implementation
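One lightweight way to map integration tests to acceptance criteria is to require that every criterion ID appears in at least one test title. A sketch, assuming criteria carry stable IDs such as "AC-1"; the IDs and titles below are illustrative:

```javascript
// Returns the acceptance criterion IDs with no matching test title.
// IDs like "AC-1" are an assumed naming convention, not required by any tool.
function untestedCriteria(criteriaIds, testTitles) {
  return criteriaIds.filter((id) => !testTitles.some((title) => title.includes(id)));
}

const gaps = untestedCriteria(
  ["AC-1", "AC-2", "AC-3"],
  ["AC-1 creates a bookmark", "AC-2 rejects duplicate URLs"]
);
// gaps contains "AC-3": a criterion with no test, so the gate should fail
```

A CI job can run this over spec files and test titles and exit nonzero when the returned list is nonempty.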

The Spec-Validated CI/CD Pipeline

The spec-validated pipeline extends traditional CI/CD with specification-centric gates.

Pipeline Overview

  1. Specification validation: lint specs and validate contracts
  2. Build and test: run contract, integration, and coverage checks
  3. Security scanning: SAST and dependency audit
  4. Performance validation: budget checks where they are deterministic
  5. Deployment: deploy only when all required gates pass

Each stage can block the next.


Spec Validation Gates

Spec validation gates are the mechanisms that block merge or deployment when code diverges from the specification.

| Gate | What It Checks | Failure Action |
| --- | --- | --- |
| Spec lint | Required sections and basic structure | Block PR |
| Contract test | API matches OpenAPI or other contract | Block merge |
| Integration test | Acceptance criteria pass | Block merge |
| Coverage | Touched code meets threshold | Block merge |
| LLM security gate | Policy checks for agent behavior | Block merge |
| Drift detection | Code and spec remain aligned | Alert or block |

Drift Detection

Drift occurs when code changes but the spec or contract does not. Detection strategies:

  1. Contract tests
  2. Schema comparison
  3. Requirement-to-test traceability
  4. Human review against the spec
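Strategy 2, schema comparison, can be sketched as a diff between the operations a contract declares and the routes the application actually registers. The inputs below are illustrative; in CI you would parse the OpenAPI file and introspect your router:

```javascript
// Diff contract-declared operations against implemented routes.
// "missing" = specified but not built; "undocumented" = built but not specified.
function diffOperations(contractOps, implementedOps) {
  const contract = new Set(contractOps);
  const implemented = new Set(implementedOps);
  return {
    missing: [...contract].filter((op) => !implemented.has(op)),
    undocumented: [...implemented].filter((op) => !contract.has(op)),
  };
}

const drift = diffOperations(
  ["GET /bookmarks", "POST /bookmarks"],
  ["GET /bookmarks", "POST /bookmarks", "DELETE /bookmarks/:id"]
);
// drift.undocumented flags "DELETE /bookmarks/:id" as spec drift
```

Either direction of mismatch is a drift signal: "missing" suggests an unimplemented requirement, "undocumented" suggests code that outran the spec.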

AI Output Review Policies

When is human review required? When can automated approval suffice?

| Change Type | Risk | Human Review | Automated Approval |
| --- | --- | --- | --- |
| Typo fix, docs | Low | Optional | Yes, if checks pass |
| Small bug fix | Medium | Required | Usually no |
| New feature from spec | Medium | Required | No |
| Security-related | High | Required (2 approvals) | Never |
| Dependency update | Medium | Required for majors | Patches may be automated if policy allows |

High-Risk Paths

Always require human review for:

  • **/auth/**
  • **/payments/**
  • **/user-data/**
  • **/config/production*
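A review bot or CI job can classify changed files against these globs to decide how many approvals a pull request needs. A minimal sketch; the glob-to-regex conversion is deliberately simple (it handles only `**` and `*`, and a leading `**/` does not match root-level paths), so treat it as a starting point rather than a complete glob engine:

```javascript
const HIGH_RISK_GLOBS = ["**/auth/**", "**/payments/**", "**/user-data/**", "**/config/production*"];

// Minimal glob-to-regex: "**" matches across path segments, "*" within one.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000")   // protect "**" before handling "*"
    .replace(/\*/g, "[^/]*")      // "*" stays within one path segment
    .replace(/\u0000/g, ".*");    // "**" crosses segment boundaries
  return new RegExp(`^${pattern}$`);
}

// Two approvals if any changed file touches a high-risk path, else one.
function requiredApprovals(changedFiles) {
  const regexes = HIGH_RISK_GLOBS.map(globToRegExp);
  const highRisk = changedFiles.some((file) => regexes.some((re) => re.test(file)));
  return highRisk ? 2 : 1;
}
```

In practice a CODEOWNERS file or branch-protection rule achieves the same effect declaratively; the sketch just makes the classification logic explicit.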

Compliance Considerations

Enterprises often need audit trails and traceability. SDD supports this well if you record:

  • specification versions and changes
  • code generation events
  • review decisions
  • deployment events
  • test and security results

Example Traceability Record

{
  "deployment_id": "dep-2026-03-11-001",
  "timestamp": "2026-03-11T14:32:00Z",
  "spec_version": "005-bookmarks-v1.2",
  "spec_path": "specs/005-bookmarks/spec.md",
  "commit": "a1b2c3d",
  "pr": "123",
  "reviewer": "jane@example.com",
  "tests_passed": true,
  "security_scan": "passed",
  "coverage": 94
}
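An audit trail is only useful if records are complete, so it is worth validating each record before it is persisted. A minimal sketch using the field names from the example above; a JSON Schema validator would be the heavier-weight equivalent:

```javascript
// Field names follow the traceability record example above.
const REQUIRED_FIELDS = [
  "deployment_id", "timestamp", "spec_version", "spec_path",
  "commit", "pr", "reviewer", "tests_passed", "security_scan", "coverage",
];

// Throws if any required audit field is absent, so incomplete records
// never reach storage silently.
function validateRecord(record) {
  const missing = REQUIRED_FIELDS.filter((field) => !(field in record));
  if (missing.length > 0) {
    throw new Error(`Record missing fields: ${missing.join(", ")}`);
  }
  return record;
}
```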

Tutorial: Build a Spec-Validated CI/CD Pipeline with GitHub Actions

This tutorial walks you through building a spec-validated CI/CD pipeline using GitHub Actions. The workflow below is example scaffolding: it is realistic enough to adapt, but you should still tailor the scripts, thresholds, and deployment steps to your stack.

Prerequisites

  • A GitHub repository with:
    • specifications in specs/
    • an OpenAPI contract in specs/005-bookmarks/contracts/openapi.yaml
    • source code and tests
  • GitHub Actions enabled
  • A Node-based project layout for the example scripts below

Step 1: Add Validation Scripts

Create scripts/spec-lint.mjs:

import fs from "node:fs";
import path from "node:path";

const root = path.resolve("specs");
const requiredSections = ["## Requirements", "## Acceptance Criteria"];
let failures = 0;

function walk(dir) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) walk(fullPath);
    if (entry.isFile() && entry.name === "spec.md") {
      const content = fs.readFileSync(fullPath, "utf8");
      for (const section of requiredSections) {
        if (!content.includes(section)) {
          console.error(`${fullPath} is missing section: ${section}`);
          failures += 1;
        }
      }
    }
  }
}

if (fs.existsSync(root)) walk(root);
if (failures > 0) process.exit(1);
console.log("Spec lint passed.");

Create scripts/check-coverage.mjs:

import fs from "node:fs";

const threshold = 90;
const summary = JSON.parse(
  fs.readFileSync("coverage/coverage-summary.json", "utf8")
);
const lines = summary.total.lines.pct;

if (lines < threshold) {
  console.error(`Coverage ${lines}% is below ${threshold}%`);
  process.exit(1);
}

console.log(`Coverage ${lines}% meets threshold.`);

These scripts avoid brittle shell globs and extra dependencies such as jq or bc.

Step 2: Create the Workflow File

Create .github/workflows/spec-validated-pipeline.yml:

name: Spec-Validated Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]

env:
  NODE_VERSION: '20'

jobs:
  spec-validation:
    name: Spec Validation
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Lint specifications
        run: node scripts/spec-lint.mjs

      - name: Validate OpenAPI contracts
        run: |
          # Parentheses group the -o expression so find's implicit -print
          # applies to both extensions, not just the last one.
          find specs \( -path "*/contracts/*.yaml" -o -path "*/contracts/*.yml" \) | while read -r contract; do
            echo "Validating $contract"
            npx @redocly/cli lint "$contract"
          done

  contract-tests:
    name: Contract Tests
    needs: spec-validation
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run contract tests
        run: npm run test:contract

      - name: Run integration tests
        run: npm run test:integration

      - name: Run coverage
        run: npm run test:coverage

      - name: Check coverage threshold
        run: node scripts/check-coverage.mjs

  security-scan:
    name: Security Scan
    needs: spec-validation
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run npm audit
        run: npm audit --audit-level=high

      - name: Run Semgrep
        uses: semgrep/semgrep-action@v1
        with:
          config: p/security-audit

  performance-budget:
    name: Performance Budget
    needs: [contract-tests, security-scan]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Build
        run: npm run build

      - name: Check bundle size
        run: |
          if [ -d "dist" ]; then
            SIZE=$(du -sb dist | cut -f1)
            MAX_SIZE=524288
            if [ "$SIZE" -gt "$MAX_SIZE" ]; then
              echo "Bundle size $SIZE exceeds $MAX_SIZE bytes"
              exit 1
            fi
          fi

  deploy:
    name: Deploy
    needs: [contract-tests, security-scan, performance-budget]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get spec version
        id: spec
        run: |
          SPEC_VERSION=$(git describe --tags --always)
          echo "version=$SPEC_VERSION" >> "$GITHUB_OUTPUT"

      - name: Create deployment record
        run: |
          echo '{"deployment":"'${{ github.run_id }}'","spec":"'${{ steps.spec.outputs.version }}'","commit":"'${{ github.sha }}'"}' > deployment-record.json

      - name: Deploy to staging
        run: echo "Deploying with spec tag ${{ steps.spec.outputs.version }}"
Expected result: Pull requests fail if specs are malformed, contracts are invalid, tests fail, or coverage drops below threshold.

Step 3: Configure package.json Scripts

Ensure package.json contains:

{
  "scripts": {
    "test:contract": "vitest run tests/contract",
    "test:integration": "vitest run tests/integration",
    "test:coverage": "vitest run --coverage",
    "spec:lint": "node scripts/spec-lint.mjs",
    "coverage:check": "node scripts/check-coverage.mjs"
  }
}

Step 4: Add Branch Protection

In GitHub:

  1. Open Settings -> Branches
  2. Add a rule for main
  3. Require status checks:
    • spec-validation
    • contract-tests
    • security-scan
    • performance-budget
  4. Require pull request review before merging
  5. Require additional approvals for high-risk paths if needed

Step 5: Add the Missing Checks Carefully

Only add the following when they are deterministic in your environment:

  • API latency checks
  • DAST scans
  • Snyk or other secret-dependent tooling
  • production deployment approvals

If a check needs seed data, service orchestration, or credentials, document that explicitly instead of pretending it is universally copy-paste runnable.


Governance Frameworks by Team Size

Governance should scale with team size and risk.

Startup (1-10 engineers)

Focus: speed with safety.

  • Spec lint + contract tests in CI
  • 1 mandatory review
  • Basic dependency audit
  • Coverage target often lower for low-risk paths

Mid-Size (10-50 engineers)

Focus: consistency and quality.

  • Full pipeline
  • 1-2 approvals depending on path
  • SAST + dependency audit
  • Stronger coverage requirements
  • Basic traceability in release notes or deployment records

Enterprise (50+ engineers)

Focus: compliance, auditability, and risk management.

  • Full pipeline plus DAST and manual production approvals where required
  • 2 approvals for high-risk paths
  • Full audit trail retention
  • Dedicated governance ownership

Try With AI

Prompt 1: Governance Policy Design

"Our team is adopting SDD. We have 15 engineers and handle user PII. Design an AI output review policy: when is human review required vs. automated approval? Include a table of change types and risk levels. Suggest high-risk paths that always need review."

Prompt 2: Pipeline Gate Configuration

"I have a GitHub Actions workflow. Add a spec validation stage that: (1) lints markdown specs in specs/ for required sections, (2) validates OpenAPI files in contracts/. Show the YAML and any helper scripts needed."

Prompt 3: Traceability Implementation

"We need an audit trail for compliance. Design a JSON schema for deployment records that links deployment ID, timestamp, spec version, spec path, commit SHA, PR number, reviewer, test results, and security scan result."

Prompt 4: Drift Detection Strategy

"How can we detect when code diverges from the specification without the spec being updated? List 3-4 strategies and explain what each one catches and what it might miss."


Practice Exercises

Exercise 1: Add a Spec Validation Gate

Take an existing project with specs. Add a CI job that:

  1. Lints specs for required sections
  2. Validates OpenAPI contracts
  3. Fails the pipeline if either fails

Expected outcome: A working spec validation gate in CI.

Exercise 2: Define Your Review Policy

Create an AI output review policy for your team or a hypothetical team. Include:

  • change types and risk levels
  • when human review is required
  • high-risk paths
  • automated approval criteria

Expected outcome: A 1-2 page policy document.

Exercise 3: Implement a Coverage Gate

Add a coverage threshold check to your CI pipeline. If coverage drops below your target for the risk tier you are enforcing, fail the build.

Expected outcome: CI fails when coverage drops below threshold.


Key Takeaways

  1. AI governance ensures AI-generated code is safe, compliant, and high-quality through policies, gates, and verification.
  2. Governance rules should cover review, security, coverage, and spec validation.
  3. The spec-validated CI/CD pipeline extends traditional CI/CD with specification-aware checks.
  4. Spec validation gates help catch drift before deployment.
  5. Compliance depends on audit trails and traceability, not just test results.
  6. Governance should scale with context: startups and enterprises should not use the exact same process.

Chapter Quiz

  1. What is AI governance, and why is it essential when using AI-generated code?
  2. What are the main stages in a spec-validated CI/CD pipeline?
  3. When should human review be required for AI-generated code?
  4. What is drift detection? Name two ways to implement it.
  5. What should an audit trail include?
  6. Why is it dangerous to present a CI example as universally runnable when it depends on hidden credentials or setup?

Back to: Part X Overview | Next: Chapter 30 - Metrics and Engineering Roles