📘 CASE STUDY — Part VII: Production RAG Assistant (Eval, Drift, Safety)
SECTION 0 — SCENARIO
You’re building a support assistant that answers questions from internal docs.
It must:
- cite sources
- avoid hallucinations
- handle doc drift
- be measurable
SECTION 1 — ARCHITECTURE
RAG pipeline:
- ingest docs → chunk → embed → index
- query → retrieve top-k → compose prompt
- generate answer + citations
SECTION 2 — EVALUATION (BEFORE SHIPPING)
Create a golden set:
- 50–200 representative questions
- expected key points
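One possible shape for a golden-set record, as plain data. Every field name here (`question`, `key_points`, `expected_sources`, `should_refuse`) is an assumption for illustration; the point is that each record carries both the expected content and the expected behavior.

```python
# Illustrative golden-set record; field names are an assumption, not a standard.
golden_set = [
    {
        "question": "How do I reset my SSO password?",
        "key_points": ["settings page", "reset link"],   # phrases the answer should cover
        "expected_sources": ["sso-guide.md"],            # chunks retrieval should surface
        "should_refuse": False,                          # True for out-of-scope questions
    },
]
```

Including `should_refuse: True` records for questions the docs genuinely don't cover is what lets you measure refusal correctness later.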
Measure:
- groundedness (is the answer supported by the retrieved chunks?)
- retrieval quality (did we fetch the right chunks?)
- refusal correctness (does it say “I don’t know” when the docs don’t cover the question?)
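The three measurements can be scored per golden-set record. This is a sketch with deliberately crude proxies: key-point substring matching approximates groundedness (production systems often use an LLM judge), and refusal detection here is a naive phrase check. All function names are illustrative.

```python
# Per-record scoring sketch; crude proxies, not production-grade metrics.

def retrieval_hit(retrieved_sources: list[str], expected_sources: list[str]) -> bool:
    # Retrieval quality: did at least one expected source come back in top-k?
    return bool(set(retrieved_sources) & set(expected_sources))

def key_point_coverage(answer: str, key_points: list[str]) -> float:
    # Groundedness proxy: fraction of expected key points present in the answer.
    answer_l = answer.lower()
    hits = sum(1 for kp in key_points if kp.lower() in answer_l)
    return hits / len(key_points) if key_points else 1.0

def refusal_correct(answer: str, should_refuse: bool) -> bool:
    # Refusal correctness: the model refuses exactly when it should.
    refused = "i don't know" in answer.lower()
    return refused == should_refuse
```

Averaging these across the golden set gives the pre-ship numbers, and re-running the same harness after any prompt, model, or index change turns it into a regression test.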
SECTION 3 — SAFETY + GUARDRAILS
- restrict tools/actions
- block prompt injection patterns
- strip untrusted HTML
- enforce a “cite or refuse” policy
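Two of these guardrails can be sketched directly: pattern-based injection screening on retrieved chunks, and a cite-or-refuse check at the output boundary. The patterns below are a tiny illustrative denylist, nowhere near exhaustive, and the refusal string is an assumption; real deployments layer these with model-side defenses.

```python
# Guardrail sketch: injection screening + cite-or-refuse enforcement.
import re

# Illustrative denylist; a real screen needs far broader coverage.
INJECTION_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in [
        r"ignore (all )?(previous|prior) instructions",
        r"you are now",
        r"reveal the system prompt",
    ]
]

def looks_like_injection(chunk_text: str) -> bool:
    # Screen retrieved (untrusted) chunks before they reach the prompt.
    return any(p.search(chunk_text) for p in INJECTION_PATTERNS)

def enforce_cite_or_refuse(answer: str, citations: list[str]) -> str:
    # Cite-or-refuse: an uncited answer is replaced with an explicit refusal
    # rather than shipped to the user.
    if citations:
        return answer
    return "I don't know: I couldn't find a supporting source."
```

Note the asymmetry: injection screening runs on the way *in* (retrieved content is untrusted input), while cite-or-refuse runs on the way *out* (the answer is untrusted output).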
SECTION 4 — DRIFT + MONITORING
Monitor:
- retrieval miss rate
- citation coverage
- user feedback
- doc freshness lag
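A minimal in-process counter for the first three signals might look like the sketch below (doc freshness lag is measured against the ingest pipeline, so it's omitted here). The "miss" definition, top-score-below-threshold, is one common heuristic, not the only one; all names are illustrative.

```python
# Rolling counters for RAG monitoring signals; a real system would export
# these to a metrics backend rather than hold them in process.
from dataclasses import dataclass

@dataclass
class RagMetrics:
    queries: int = 0
    retrieval_misses: int = 0       # best retrieval score below threshold
    answers_with_citations: int = 0
    negative_feedback: int = 0

    def record(self, top_score: float, threshold: float,
               cited: bool, thumbs_down: bool) -> None:
        self.queries += 1
        if top_score < threshold:
            self.retrieval_misses += 1
        if cited:
            self.answers_with_citations += 1
        if thumbs_down:
            self.negative_feedback += 1

    @property
    def miss_rate(self) -> float:
        return self.retrieval_misses / self.queries if self.queries else 0.0

    @property
    def citation_coverage(self) -> float:
        return self.answers_with_citations / self.queries if self.queries else 0.0
```

A rising miss rate usually points at drift (docs changed, queries changed), while falling citation coverage points at the cite-or-refuse policy firing more often.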
Re-embed on:
- doc changes
- model changes
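Both re-embed triggers can be folded into a single stored fingerprint per document: hash the doc content together with the embedder version, and re-embed whenever the stored fingerprint no longer matches. A minimal sketch, with illustrative names:

```python
# Re-embed trigger: a fingerprint over (embedder version, doc content).
# A change to either invalidates the stored embedding.
import hashlib

def fingerprint(doc_text: str, embedder_version: str) -> str:
    h = hashlib.sha256()
    h.update(embedder_version.encode())
    h.update(doc_text.encode())
    return h.hexdigest()

def needs_reembed(stored_fp: str, doc_text: str, embedder_version: str) -> bool:
    return stored_fp != fingerprint(doc_text, embedder_version)
```

Storing the fingerprint alongside each index entry makes re-embedding incremental: only docs whose fingerprint changed get reprocessed, instead of rebuilding the whole index.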
SECTION 5 — ROLLOUT
- start internal-only
- add “report issue” on every answer
- feature flag + canary
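The canary step can be as simple as deterministic percentage bucketing on a stable user id, so the same user always lands in the same bucket and the rollout percentage can be dialed up gradually. A sketch, assuming a string user id:

```python
# Deterministic canary bucketing: hash the user id into one of 100 buckets;
# users in buckets below `percent` get the canary build.
import hashlib

def in_canary(user_id: str, percent: int) -> bool:
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < percent
```

Because bucketing is a pure function of the id, raising `percent` only ever adds users to the canary; no one flips back and forth between builds mid-rollout.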
SECTION 6 — EXERCISE
Design a “hallucination incident” runbook:
- detection
- mitigation
- root cause (retrieval vs prompt vs model)