TL;DR

AI agents fabricate when four structural conditions line up: no tool access, retrieval and synthesis combined in one agent, no required source field, and creative temperature settings bleeding into factual output. Fix the structure, not the model, and fabrication drops sharply. The rules, the reviewer, and the weekly audit described below are the ones I am putting in place after catching too many invented numbers in draft output.

A few months into running an AI-native operation, I noticed something uncomfortable: the agents were making things up.

Not lying — inventing. A content agent would produce a post claiming a customer “saw a 340% lift.” I'd ask where the number came from. There was no source. It just sounded right, so the model wrote it.

If I hadn't caught it, that number would have gone out on social, in a press release, in an email to investors. The credibility cost of even one fabricated stat escaping into the wild is brutal. So I paused and started rebuilding how the team handles facts.

Here's what I found, and the system I'm putting in place.

The four ways AI teams hallucinate

01. Generating from memory instead of checking

When an agent is asked a factual question — what's our MRR? — and has no tool available to look it up, it will often still produce a plausible-sounding answer. The model is trained to be helpful. “I don't know” feels like failure. So it guesses, and guesses sound like facts once they're in a paragraph.

02. Combining retrieval and synthesis in one agent

If the same agent fetches the data and writes the story, it will paper over gaps. A missing number becomes an estimate. An estimate becomes a round figure. A round figure becomes a quote. By the time it's on the page, nobody — including the agent — can tell which parts came from real data.
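The split can be sketched in a few lines of Node, assuming a hypothetical two-stage pipeline where only the retriever is allowed to produce facts. The function names and the placeholder value are illustrative, not the real pipeline:

```javascript
// Sketch: retrieval and synthesis as separate stages, so the writer can only
// use facts the retriever actually returned.

// Stage 1, retrieval: every fact leaves this stage with a source attached.
// The value 12345 is a placeholder, not a real metric.
function retrieve() {
  return [
    { claim: "MRR", value: 12345, source: "stripe-api:charges.list:2026-04-12" },
  ];
}

// Stage 2, synthesis: the writer only sees sourced facts. A number the
// retriever could not find simply is not available to invent around.
function write(facts) {
  return facts.map(f => `${f.claim}: ${f.value} (source: ${f.source})`).join("\n");
}

console.log(write(retrieve()));
```

Because the writer's only input is the retriever's output, a gap in the data shows up as a gap in the draft instead of a round figure.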

03. No source field, no accountability

If your output format doesn't require a source for every claim, you've silently made sourcing optional. The model will skip it. Every time.

04. Creativity temperature bleeding into factual output

Teams tune models for “good writing” with higher temperature settings, then reuse the same config for stats, pricing, and product copy. Creativity and accuracy pull in opposite directions. You have to split them.
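The split can be as simple as two named configs keyed by task type. This is a sketch assuming a task object with a containsFacts flag; the names and values are illustrative, not a tuned provider config:

```javascript
// Sketch: one model, two configs. Factual work gets temperature 0; creative
// drafting keeps the higher setting.
const CONFIGS = {
  factual: { temperature: 0, topP: 1 },       // stats, pricing, product copy
  creative: { temperature: 0.9, topP: 0.95 }, // headlines, narrative drafts
};

// Pick the config per task instead of reusing one global setting.
function configFor(task) {
  return task.containsFacts ? CONFIGS.factual : CONFIGS.creative;
}

console.log(configFor({ containsFacts: true }).temperature); // 0
```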

The system

A short protocol every agent follows, enforced with a reviewer that runs before anything gets published.

Every numeric claim, stat, date, quote, or factual assertion in any structured output must carry a source field. Accepted values: a file path on our VPS, a URL, a named query (e.g. stripe-api:charges.list:2026-04-12), or a human source with a date.

If the agent can't fill it truthfully, the claim gets removed, or replaced with "unverified — needs research" and escalated.
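One way to enforce the field is a small validator run over every structured output before it leaves the agent. This is a sketch under the assumption that claims arrive as { claim, source } objects; the regexes mirror the accepted values above, and the helper name is hypothetical:

```javascript
// Sketch: reject any claim whose source field does not match one of the
// accepted formats. Patterns are illustrative approximations.
const SOURCE_PATTERNS = [
  /^\/[\w./-]+$/,                       // file path on the VPS
  /^https?:\/\/\S+$/,                   // URL
  /^[\w-]+:[\w.]+(:[\w-]+)*$/,          // named query, e.g. stripe-api:charges.list:2026-04-12
  /^\w[\w\s.]*\(\d{4}-\d{2}-\d{2}\)$/,  // human source with a date
];

function validateClaim(entry) {
  const ok = typeof entry.source === "string" &&
    SOURCE_PATTERNS.some(p => p.test(entry.source));
  // A claim that fails validation is never published as-is: it is replaced
  // with the escalation marker so a human has to resolve it.
  return ok ? entry : { ...entry, claim: "unverified — needs research", source: null };
}
```

Run at output time, this makes the source field load-bearing: an agent can't quietly skip it, because skipping it deletes the claim.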

Three supporting rules come with it: agents get a lookup tool before they get a factual question, retrieval and synthesis run as separate stages, and factual output gets its own low-temperature config.

The red-team reviewer

Rules alone don't enforce themselves. The reviewer sits between every public output and the publish step.

It's a single Node script that calls a fast model with one job: list every claim in the content, flag any without an obvious source, and return a JSON array with severity levels. It's wired into the publishing pipeline, so anything with a blocker-severity finding can't ship.

redteam-review.js
$ echo "QFHQ ships 400 products with 49 agents. MRR grew 340% last quarter." \
  | node redteam-review.js --stdin

{
  "blockers": 3,
  "findings": [
    { "claim": "ships 400 products", "severity": "blocker",
      "action": "cite source" },
    { "claim": "49 agents", "severity": "blocker",
      "action": "cite source" },
    { "claim": "MRR grew 340% last quarter", "severity": "blocker",
      "action": "cite source" }
  ]
}
# exit code 1 — blocks publish
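The shape of that script can be sketched as below. The model call is stubbed with a crude deterministic heuristic (flag any digit-bearing sentence without a source marker); the real reviewer would send the claim list to a fast model instead. All names here are illustrative:

```javascript
// Sketch of the reviewer's structure. reviewText stands in for the model
// call; here it flags any sentence containing a digit but no "(source:"
// marker.
function reviewText(text) {
  const findings = [];
  for (const m of text.matchAll(/[^.]*\d[^.]*\./g)) {
    const sentence = m[0].trim();
    if (!sentence.includes("(source:")) {
      findings.push({ claim: sentence, severity: "blocker", action: "cite source" });
    }
  }
  return findings;
}

function review(text) {
  const findings = reviewText(text);
  return { blockers: findings.filter(f => f.severity === "blocker").length, findings };
}

console.log(JSON.stringify(review("MRR grew 340% last quarter."), null, 2));
// The real script also sets a non-zero exit code when blockers > 0, which
// is what makes the publishing pipeline refuse to ship.
```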

The reviewer logs every finding to a daily JSONL file. A weekly cron rolls those up into an audit report — outputs reviewed, blockers caught before they shipped, rolling hallucination rate.
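The rollup itself is a few lines, assuming each JSONL line carries { date, reviewed, blockers } fields (the field names are illustrative):

```javascript
// Sketch of the weekly audit rollup: sum the daily JSONL rows and derive
// a rolling hallucination rate (blockers caught per output reviewed).
function rollup(jsonlText) {
  const rows = jsonlText.trim().split("\n").filter(Boolean).map(JSON.parse);
  const reviewed = rows.reduce((n, r) => n + r.reviewed, 0);
  const blockers = rows.reduce((n, r) => n + r.blockers, 0);
  return {
    reviewed,
    blockers,
    rate: reviewed ? +(blockers / reviewed).toFixed(3) : 0,
  };
}
```

A cron job would feed it the week's log files and write the resulting summary into the audit report.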

What changes

Two things. First, the obvious one: fabricated stats get caught in draft before they reach a public surface.

Second, less obvious: the agents start writing differently. Given a rule that every claim needs a source, they reach for tools earlier and write more carefully. The rule shapes the behavior, not just the output.

The fastest way to reduce hallucination isn't a smarter model. It's a structural requirement that makes fabrication harder than retrieval.

What I'd tell anyone running an AI team

This system isn't final. Next: tighter red-team prompts, provenance chains across agent handoffs, and eventually pushing the source-field requirement down into the tool-call layer itself. The bones are in place. Published output is getting noticeably cleaner as each layer lands.

If you're building with AI agents and haven't put guardrails on factual output yet, start there. It's the highest-leverage work you can do.

AI Agents Operations Hallucination Engineering QuantForge