AI agents fabricate when four structural conditions exist: no tool access, retrieval and synthesis combined in one agent, no required source field, and creative temperature settings bleeding into factual output. Fix the structure, not the model, and fabrication drops sharply. The rules, the reviewer, and the weekly audit described below are the ones I put in place after catching too many invented numbers in draft output.
A few months into running an AI-native operation, I noticed something uncomfortable: the agents were making things up.
Not lying — inventing. A content agent would produce a post claiming a customer “saw a 340% lift.” I'd ask where the number came from. There was no source. It just sounded right, so the model wrote it.
If I hadn't caught it, that number would have gone out on social, in a press release, in an email to investors. The credibility cost of even one fabricated stat escaping into the wild is brutal. So I paused and started rebuilding how the team handles facts.
Here's what I found, and the system I'm putting in place.
The four ways AI teams hallucinate
1. Generating from memory instead of checking
When an agent is asked a factual question — what's our MRR? — and has no tool available to look it up, it will often still produce a plausible-sounding answer. The model is trained to be helpful. “I don't know” feels like failure. So it guesses, and guesses sound like facts once they're in a paragraph.
2. Combining retrieval and synthesis in one agent
If the same agent fetches the data and writes the story, it will paper over gaps. A missing number becomes an estimate. An estimate becomes a round figure. A round figure becomes a quote. By the time it's on the page, nobody — including the agent — can tell which parts came from real data.
3. No source field, no accountability
If your output format doesn't require a source for every claim, you've silently made sourcing optional. The model will skip it. Every time.
4. Creativity temperature bleeding into factual output
Teams tune models for “good writing” with higher temperature settings, then reuse the same config for stats, pricing, and product copy. Creativity and accuracy pull in opposite directions. You have to split them.
The system
A short protocol every agent follows, enforced with a reviewer that runs before anything gets published.
Every numeric claim, stat, date, quote, or factual assertion in any structured output must carry a source field. Accepted values: a file path on our VPS, a URL, a named query (e.g. `stripe-api:charges.list:2026-04-12`), or a human source with a date.
If the agent can't fill it truthfully, the claim gets removed, or replaced with the literal placeholder “unverified — needs research” and escalated.
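A minimal sketch of what enforcing that requirement looks like on a structured output. The field names (`claims`, `text`, `source`) are illustrative, not our real schema:

```javascript
// Sketch: reject any structured output whose claims lack a source field.
function checkSources(output) {
  const missing = output.claims.filter(c => !c.source || c.source.trim() === "");
  return { ok: missing.length === 0, missing: missing.map(c => c.text) };
}

const draft = {
  claims: [
    { text: "MRR grew 12% in March", source: "stripe-api:charges.list:2026-04-12" },
    { text: "customers saw a 340% lift", source: "" }, // fabricated: no source
  ],
};

console.log(checkSources(draft)); // flags the unsourced claim
```

The check is deliberately dumb: it doesn't judge whether a source is *good*, only whether one exists. That alone forces the model to either reach for a tool or drop the claim.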
Three supporting rules come with it:
- Verify-or-abstain. If an agent doesn't have a tool result, it responds “I need to verify” and invokes the appropriate tool. No estimating from training data for production-bound claims.
- Retrieval / synthesis separation. Research agents fetch facts and hand over structured data. Writing agents only write from what was handed over. If a fact isn't in the handoff, it doesn't appear in the output.
- Temperature discipline. Factual outputs run at temperature 0. Creative surfaces can go up to 0.7. Never both in the same call.
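The retrieval/synthesis split can be enforced mechanically: the research agent hands over facts with IDs, and the writer's draft may only reference those IDs. A sketch with hypothetical names:

```javascript
// Sketch: the writing agent may only cite facts present in the handoff.
// Returns the fact IDs a draft references that have no research backing.
function enforceHandoff(draftFactIds, handoff) {
  const known = new Set(handoff.facts.map(f => f.id));
  return draftFactIds.filter(id => !known.has(id));
}

const handoff = {
  facts: [
    { id: "f1", value: "MRR: $42k", source: "stripe-api:charges.list:2026-04-12" },
  ],
};

console.log(enforceHandoff(["f1", "f9"], handoff)); // "f9" has no backing
```

Anything the check returns gets the same treatment as a missing source: removed or escalated.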
The red-team reviewer
Rules alone don't enforce themselves. The reviewer sits between every public output and the publish step.
It's a single Node script that calls a fast model with one job: list every claim in the content, flag any without an obvious source, and return a JSON array with severity levels. Wired into the publishing pipeline so anything with a blocker-severity finding can't ship.
```shell
$ echo "QFHQ ships 400 products with 49 agents. MRR grew 340% last quarter." \
    | node redteam-review.js --stdin
{
  "blockers": 3,
  "findings": [
    { "claim": "ships 400 products",            "severity": "blocker", "action": "cite source" },
    { "claim": "49 agents",                     "severity": "blocker", "action": "cite source" },
    { "claim": "MRR grew 340% last quarter",    "severity": "blocker", "action": "cite source" }
  ]
}
# exit code 1 — blocks publish
```
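The gating logic around the model call is the simple part. A sketch, with the model call stubbed out (in the real script a fast model produces `findings`):

```javascript
// Sketch of the publish gate: count blocker-severity findings and decide.
// The pipeline reads `publish` and sets the process exit code from it.
function gate(review) {
  const blockers = review.findings.filter(f => f.severity === "blocker");
  return { blockers: blockers.length, publish: blockers.length === 0 };
}

const review = {
  findings: [
    { claim: "49 agents", severity: "blocker", action: "cite source" },
    { claim: "ships fast", severity: "info", action: "none" },
  ],
};

console.log(gate(review)); // one blocker, so publish is false
```

Keeping the model's job narrow (list claims, flag missing sources) and the decision logic in plain code makes the gate auditable: the model never decides what ships, the exit code does.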
The reviewer logs every finding to a daily JSONL file. A weekly cron rolls those up into an audit report — outputs reviewed, blockers caught before they shipped, rolling hallucination rate.
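The rollup itself is a few lines over the JSONL log. A sketch, assuming each line records the blocker count for one reviewed output and defining "hallucination rate" as the share of outputs with at least one blocker:

```javascript
// Sketch of the weekly audit rollup over the daily JSONL findings log.
function rollup(jsonlLines) {
  const rows = jsonlLines.map(l => JSON.parse(l));
  const reviewed = rows.length;
  const blocked = rows.filter(r => r.blockers > 0).length;
  return { reviewed, blocked, rate: reviewed ? blocked / reviewed : 0 };
}

const log = ['{"blockers":0}', '{"blockers":3}', '{"blockers":0}', '{"blockers":1}'];
console.log(rollup(log)); // 4 reviewed, 2 blocked, rate 0.5
```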
What changes
Two things. First, the obvious one: fabricated stats get caught in draft before they reach a public surface.
Second, less obvious: the agents start writing differently. Given a rule that every claim needs a source, they reach for tools earlier and write more carefully. The rule shapes the behavior, not just the output.
The fastest way to reduce hallucination isn't a smarter model. It's a structural requirement that makes fabrication harder than retrieval.
What I'd tell anyone running an AI team
- Make the source field mandatory in your output schema. This alone eliminates most low-effort fabrication.
- Put retrieval and writing in different agents. Don't let one model both find and frame.
- Run a cheap reviewer on anything public. A small model with a narrow job beats a big model with a vague one.
- Log findings. Audit weekly. Publish the rate internally. Agents behave differently when they know they're measured.
- Accept that “I need to verify” is a better answer than a confident wrong one. Train your team — human and AI — that the shame is in the fabrication, not in the pause.
This system isn't final. Next: tighter red-team prompts, provenance chains across agent handoffs, and eventually pushing the source-field requirement down into the tool-call layer itself. The bones are in place. Published output is getting noticeably cleaner as each layer lands.
If you're building with AI agents and haven't put guardrails on factual output yet, start there. It's the highest-leverage work you can do.