📘 CASE STUDY — Part VII: Production RAG Assistant (Eval, Drift, Safety)

SECTION 0 — SCENARIO

You’re building a support assistant that answers questions from internal docs.

It must:

  • cite sources

  • avoid hallucinations

  • handle doc drift

  • be measurable


SECTION 1 — ARCHITECTURE

RAG pipeline:

  • ingest docs → chunk → embed → index

  • query → retrieve top-k → compose prompt

  • generate answer + citations
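The pipeline above can be sketched end to end. This is a toy illustration, not a production design: the bag-of-words "embedding", the naive word-count chunker, the `reset-guide` doc, and the `doc#chunk` citation ids are all assumptions standing in for a real embedding model, chunker, and vector index.

```python
# Minimal ingest → chunk → embed → index → retrieve sketch.
# ASSUMPTIONS: Counter word counts stand in for real embeddings;
# doc names and chunk sizes are illustrative.
from collections import Counter
import math

def chunk(text: str, size: int = 40) -> list:
    """Split a document into fixed-size word chunks (naive chunker)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts (replace with a real model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(index: dict, query: str, k: int = 3) -> list:
    """Return the top-k (score, chunk_id, chunk_text) triples for a query."""
    q = embed(query)
    scored = sorted(
        ((cosine(q, vec), cid, text) for cid, (vec, text) in index.items()),
        reverse=True,
    )
    return scored[:k]

# Ingest: chunk → embed → store with an id usable as a citation.
docs = {"reset-guide": "To reset your password open Settings and choose Reset."}
index = {}
for doc_id, body in docs.items():
    for i, c in enumerate(chunk(body, size=8)):
        index[f"{doc_id}#{i}"] = (embed(c), c)

hits = retrieve(index, "how do I reset my password", k=1)
```

The chunk ids (`reset-guide#0`) double as citation handles: whatever the generator quotes can be traced back to a specific indexed span.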


SECTION 2 — EVALUATION (BEFORE SHIPPING)

Create a golden set:

  • 50–200 representative questions

  • expected key points

Measure:

  • groundedness (is the answer supported by the retrieved chunks?)

  • retrieval quality (did we fetch the right chunks?)

  • refusal correctness (does it say “I don’t know” when the docs don’t cover the question?)
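One cheap way to run the golden set is to check which expected key points appear in each answer, a crude lexical proxy for groundedness. This is a sketch under that assumption (production systems often use an LLM judge instead); the golden-set entries below are invented examples.

```python
# Score answers against a golden set by key-point recall.
# ASSUMPTION: substring matching approximates "key point covered";
# the golden entries are illustrative, not real support data.
def key_point_recall(answer: str, key_points: list) -> float:
    """Fraction of expected key points mentioned in the answer."""
    a = answer.lower()
    hits = sum(1 for kp in key_points if kp.lower() in a)
    return hits / len(key_points) if key_points else 1.0

golden = [
    {"q": "How do I reset my password?",
     "key_points": ["Settings", "Reset"],
     "answer": "Open Settings, then choose Reset. [reset-guide#0]"},
]

scores = [key_point_recall(g["answer"], g["key_points"]) for g in golden]
mean_recall = sum(scores) / len(scores)
```

Tracking `mean_recall` per release gives a regression signal before shipping: a retrieval or prompt change that drops it gets blocked.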


SECTION 3 — SAFETY + GUARDRAILS

  • restrict tools/actions

  • block prompt injection patterns

  • strip untrusted HTML

  • enforce “cite or refuse” policy
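Two of these guardrails can be sketched directly: filtering retrieved chunks for injection patterns and untrusted HTML, and enforcing "cite or refuse" on the output. The patterns, the `[doc#chunk]` citation format, and the refusal wording are assumptions; a real deny-list would be broader.

```python
import re

# ASSUMPTIONS: citations look like "[doc#chunk]"; the injection
# patterns below are examples, not an exhaustive deny-list.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|previous) instructions", re.I),
    re.compile(r"system prompt", re.I),
]
CITATION = re.compile(r"\[[\w-]+#\d+\]")

def sanitize_chunk(chunk: str) -> str:
    """Strip HTML tags; quarantine chunks matching known injection patterns."""
    text = re.sub(r"<[^>]+>", "", chunk)  # strip untrusted HTML
    if any(p.search(text) for p in INJECTION_PATTERNS):
        return ""  # drop suspicious text instead of passing it to the prompt
    return text

def enforce_cite_or_refuse(answer: str) -> str:
    """'Cite or refuse': an answer without a citation becomes a refusal."""
    if CITATION.search(answer):
        return answer
    return "I don't know: I couldn't find a supporting source."

safe = enforce_cite_or_refuse("The reset lives in Settings. [reset-guide#0]")
refused = enforce_cite_or_refuse("The reset lives in Settings.")
```

Note the asymmetry: sanitization runs on the way *in* (retrieved chunks), while cite-or-refuse runs on the way *out* (generated answers).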


SECTION 4 — DRIFT + MONITORING

Monitor:

  • retrieval miss rate

  • citation coverage

  • user feedback

  • doc freshness lag

Re-embed on:

  • doc changes

  • model changes
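The re-embed triggers can be made mechanical by storing a content hash and the embedding-model version next to each indexed doc. A minimal sketch, assuming that index record shape (the field names and version strings are illustrative):

```python
import hashlib

# ASSUMPTION: each index record stores {"hash": ..., "model": ...};
# the version strings are made up for illustration.
def fingerprint(body: str) -> str:
    """Stable content hash of the raw document text."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def needs_reembed(record: dict, body: str, model_version: str) -> bool:
    """Re-embed when the doc text OR the embedding model changed."""
    return (record["hash"] != fingerprint(body)
            or record["model"] != model_version)

record = {"hash": fingerprint("v1 text"), "model": "embed-v1"}
unchanged = needs_reembed(record, "v1 text", "embed-v1")   # False
doc_drift = needs_reembed(record, "v2 text", "embed-v1")   # True
model_bump = needs_reembed(record, "v1 text", "embed-v2")  # True
```

Running this check on a schedule turns "doc freshness lag" from a vague worry into a measurable queue of stale records.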


SECTION 5 — ROLLOUT

  • start internal-only

  • add “report issue” on every answer

  • feature flag + canary
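Canary routing for the rollout can be as simple as deterministic hashing of user ids, so the same user always lands in the same arm. A toy sketch; the flag name and 5% bucket are assumptions, and real systems would use a feature-flag service instead.

```python
import hashlib

# ASSUMPTIONS: flag name "rag-assistant" and 5% canary are illustrative.
def in_canary(user_id: str, flag: str = "rag-assistant",
              percent: int = 5) -> bool:
    """Deterministically bucket users: same user, same arm, every request."""
    h = int(hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest(), 16)
    return h % 100 < percent

users = [f"user-{i}" for i in range(1000)]
canary_users = [u for u in users if in_canary(u)]
```

Because bucketing is stable, "report issue" feedback from canary users can be compared against the control arm without cross-contamination.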


SECTION 6 — EXERCISE

Design a “hallucination incident” runbook:

  • detection

  • mitigation

  • root cause (retrieval failure vs. prompt vs. model)


🏁 END — PART VII CASE STUDY