
Temperature & Top-P

The two core knobs for trading off randomness and creativity against rigour in an LLM's output.

Prompt · Sampling
Key Idea

In one line: Temperature controls how bold the model is (higher = more random); Top-P controls the size of the candidate pool it samples from (smaller = only the safest few). Both let you slide between rigorous & deterministic ↔ flexible & creative.

What it is

For every Token it generates, the model computes a probability distribution over the vocabulary, then samples:

Token candidate    Probability
"clear"            0.42
"sunny"            0.28
"cool"             0.10
"…"                0.20

Temperature changes the sharpness of this distribution; Top-P changes how big the pool of candidates is.
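
A minimal NumPy sketch of how the two knobs reshape that distribution (the token names and probabilities are just the illustrative values from the table above, not real model output):

```python
import numpy as np

# Toy next-token distribution from the table above.
tokens = ["clear", "sunny", "cool", "other"]
probs = np.array([0.42, 0.28, 0.10, 0.20])

def apply_temperature(p, temperature):
    """Sharpen (T < 1) or flatten (T > 1) the distribution.
    Rescaling log-probabilities like this is equivalent to dividing the logits by T."""
    if temperature == 0:                  # greedy: all mass on the argmax
        out = np.zeros_like(p)
        out[np.argmax(p)] = 1.0
        return out
    scaled = np.log(p) / temperature
    scaled -= scaled.max()                # numerical stability
    e = np.exp(scaled)
    return e / e.sum()

def top_p_filter(p, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches
    top_p; zero out the rest and renormalise the survivors."""
    order = np.argsort(p)[::-1]                          # most likely first
    cutoff = np.searchsorted(np.cumsum(p[order]), top_p) + 1
    out = np.zeros_like(p)
    out[order[:cutoff]] = p[order[:cutoff]]
    return out / out.sum()

print(apply_temperature(probs, 0.5))   # sharper: "clear" dominates
print(apply_temperature(probs, 1.5))   # flatter: tail tokens gain ground
print(top_p_filter(probs, 0.5))        # only "clear" and "sunny" survive
```

Running it shows the key contrast: temperature reweights every candidate but keeps them all possible, while Top-P drops tokens from the pool entirely.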

Analogy

Analogy
  • Temperature = the conductor's "variation knob." 0 = recite the score verbatim; 1 = improvised jazz; >1 = drifting off-key.
  • Top-P = "only the menu's top-N best-sellers." Top-P 0.1 = consider only the two or three most popular dishes; 0.95 = almost everything is fair game.

Key concepts

Temperature 0 · Deterministic
Always picks the highest-probability Token. Same input → same output, every time.
Temperature 0.7 · Default creative
A common default in chat UIs and API wrappers. Natural output with reasonable variation.
Temperature 1.5+ · Nonsense zone
High randomness; risks tangents, off-topic jumps, even misspellings.
Top-P 0.1 · Conservative sampling
Samples only from the smallest set of candidates whose cumulative probability reaches 10%. Nearly deterministic.
Top-P 1.0 · Wide open
The whole vocabulary is in play. Combined with high temperature, freewheeling output.
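
To see what the temperature rows above mean for actual samples, the self-contained sketch below draws a few tokens from the same toy distribution at each setting; the Top-P rows behave like the top_p_filter sketch earlier (0.1 collapses the pool to just "clear", 1.0 keeps everything eligible).

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed so the demo is repeatable
tokens = ["clear", "sunny", "cool", "other"]
probs = np.array([0.42, 0.28, 0.10, 0.20])

def draw(p, temperature, n=8):
    """Sample n tokens after rescaling by temperature (T=0 means greedy argmax)."""
    if temperature == 0:
        return [tokens[int(np.argmax(p))]] * n
    q = np.exp(np.log(p) / temperature)
    q /= q.sum()
    return [tokens[i] for i in rng.choice(len(q), size=n, p=q)]

print(draw(probs, 0))     # ['clear', 'clear', ...]  identical every run
print(draw(probs, 0.7))   # mostly 'clear', the odd 'sunny'
print(draw(probs, 1.5))   # 'cool' and 'other' show up far more often
```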

How to combine them

Empirical defaults:

Task                       Temperature   Top-P
Extraction / SQL / JSON    0 – 0.2       0.5
RAG Q&A / translation      0.2 – 0.4     0.7
Copywriting / prose        0.6 – 0.8     0.9
Brainstorming / story      0.9 – 1.2     0.95
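
In an API call you simply pass both values per request. A minimal sketch using the OpenAI Python SDK; the model name and prompts are placeholders, and most other chat APIs expose the same temperature / top_p parameters:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Rigorous task (extraction -> JSON): cold and nearly deterministic.
extraction = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user",
               "content": "Extract every date from this text as a JSON list: "
                          "'Kickoff on 2024-03-05, review on 2024-04-01.'"}],
    temperature=0,
    top_p=0.5,
)

# Creative task (brainstorming): hotter, wider candidate pool.
brainstorm = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": "Suggest ten unusual names for a weather app."}],
    temperature=1.0,
    top_p=0.95,
)

print(extraction.choices[0].message.content)
print(brainstorm.choices[0].message.content)
```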

Practical notes

  • One knob is enough. In practice, just adjust temperature. Leave Top-P at 1.0 unless you need extreme determinism.
  • Cooler for rigorous tasks. Entity extraction, schema generation, SQL — T=0 is the most reproducible.
  • High temperature needs guardrails. At T=1, pin the length and structure in the system prompt; otherwise the output will drift.
  • Doesn't affect RAG quality. Temperature only changes sampling — not retrieval. Bad RAG is almost always bad prompts / chunking, not temperature.

Easy confusions

Temperature
Changes the sharpness of the whole distribution. Every candidate stays in the running; only the weaker candidates' odds shift.
Top-P
Changes the size of the candidate pool. Tokens outside the pool are forcibly set to probability zero.
Tweaking both?
Both compress randomness. Usually change only one, so you don't end up wondering which knob is set to what.

Further reading

  • System Prompt — constraints + low temperature = the most stable output
  • Hallucination — low temperature won't save you from fundamental fabrication
  • CoT — make reasoning steadier even at high temperature