In one line: Hallucination is when an LLM invents content that sounds plausible but is factually wrong. It is not a bug — it's the natural side-effect of "maximise the next-token probability." The model does not know what it does not know.
What it is#
An LLM has no "fact database." It just predicts what the next word most likely looks like, given training data. So you get:
- Fabricated citations: "Smith 2021 proved that..." — the paper doesn't exist.
- Fabricated APIs: "
requests.fetch(url)" — that method doesn't exist in therequestslibrary. - Garbled details: people, years, and numbers get mixed up while the grammar stays flawless.
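A quick way to see the fabricated-API case in practice: the hallucinated call fails immediately against the real library, while the genuine method works (the URL below is just a placeholder).

```python
import requests

url = "https://example.com"  # placeholder URL

# The call an LLM might hallucinate: `requests` has no `fetch` attribute,
# so this raises AttributeError at runtime.
try:
    requests.fetch(url)
except AttributeError as e:
    print("hallucinated API:", e)

# The real API: requests.get() returns a Response object.
resp = requests.get(url)
print(resp.status_code)
```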
Analogy#
The LLM is a gifted improv actor: ask it to play "encyclopedia professor" and it will not say "I don't know" — it will assemble a plausible-sounding answer from feel — convincing on the surface, but part fact, part fiction.
Key concepts#
Why hallucination happens#
The core tension: the loss function rewards "looks like training data," not "is factually correct."
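A minimal sketch of that objective, assuming a PyTorch-style training setup: the next-token cross-entropy loss only measures how probable the observed continuation was, and contains no term for factual correctness.

```python
import torch
import torch.nn.functional as F

# Toy logits a language model might emit for the next token, over a 5-word vocabulary.
logits = torch.tensor([[2.0, 0.5, 0.1, -1.0, -2.0]])

# Index of the token that actually followed this context in the training data.
target = torch.tensor([0])

# Standard next-token training loss: -log p(target | context).
# Nothing here asks whether the completed sentence is true,
# only whether it resembles the training text.
loss = F.cross_entropy(logits, target)
print(loss.item())
```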
Practical notes (how to suppress it)#
- Plug in RAG. Let the model answer from real documents; RAG is the industry-standard fix (see the first sketch after this list).
- Hard prompt constraints. An instruction such as "use only the material I provide; if it isn't there, say 'I don't know'" already cuts hallucinations sharply.
- Demand provenance. Make the model cite a snippet ID or page number after every claim, and refuse to answer when no citation is given.
- Lower the temperature. For factual tasks, set temperature to 0–0.3 to reduce improvisation.
- Verify with tools. For numbers, dates, SQL results, and the like, let a code interpreter do the math live (second sketch below).
- Two-stage self-check. Generate the answer, then have the model "check the previous reply for unsupported claims"; see Reflection and the last sketch below.
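A combined sketch of the first four points, assuming the OpenAI Python SDK (the model name, snippet texts, and question are placeholders): retrieved snippets are injected into the prompt, the instructions demand citations or an explicit "I don't know", temperature is pinned to 0, and uncited answers are rejected.

```python
from openai import OpenAI  # assumes the OpenAI Python SDK; any chat API works similarly

client = OpenAI()

# Hypothetical snippets returned by your retriever (the RAG step).
snippets = {
    "S1": "The requests library exposes get(), post(), put(), delete() ...",
    "S2": "requests was first released in 2011.",
}
context = "\n".join(f"[{sid}] {text}" for sid, text in snippets.items())

system = (
    "Answer using ONLY the snippets below. "
    "Cite the snippet ID after every claim, e.g. [S1]. "
    "If the answer is not in the snippets, reply exactly: I don't know."
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",   # placeholder model name
    temperature=0,         # low temperature for factual tasks
    messages=[
        {"role": "system", "content": f"{system}\n\n{context}"},
        {"role": "user", "content": "Does requests have a fetch() method?"},
    ],
)
answer = resp.choices[0].message.content

# Cheap provenance check: refuse answers that cite nothing.
if "[S" not in answer and "I don't know" not in answer:
    answer = "I don't know."
print(answer)
```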
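For the tool-verification point, the idea in miniature (all numbers are hypothetical): never accept the model's arithmetic, recompute it from the source data.

```python
# Figure quoted by the model (hypothetical) vs. numbers pulled from the source documents.
claimed_total = "1,234,567"
line_items = [512_000, 398_250, 310_317]

# Do the math locally instead of trusting the generated text.
actual_total = sum(line_items)
if f"{actual_total:,}" != claimed_total:
    print(f"Mismatch: model said {claimed_total}, source data sums to {actual_total:,}")
```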
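And a sketch of the two-stage self-check, again assuming the OpenAI SDK with placeholder model name and prompts: one call drafts the answer, a second call audits it for unsupported claims.

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder model name
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Stage 1: draft answer.
draft = ask("When was the Python requests library first released?")

# Stage 2: have the model audit its own draft for unsupported claims.
review = ask(
    "Check the following reply for claims that are not supported by verifiable "
    "knowledge. List each questionable claim, or say 'No issues found'.\n\n" + draft
)
print(review)
```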
Easy confusions#
- Hallucinated answers usually sound highly confident; the model gives no uncertainty signal.
- Hallucination can be mitigated with RAG or web search, but not fully eliminated.
- For high-stakes decisions, never trust LLM output blindly: require human review, or RAG plus strict citations plus an explicit refuse-to-answer fallback.
Further reading#
- RAG: the standard approach to suppressing hallucinations
- Temperature & Top-P: sampling-side controls on how much the model improvises
- Reflection: let the model check its own output