
Temperature & Top-P

The two core knobs for trading off randomness and creativity against rigour in an LLM's output.

Prompt · Sampling
Key Idea

In one line: Temperature controls how bold the model is (higher = more random); Top-P controls the size of the candidate pool it samples from (smaller = only the safest few). Both let you slide between rigorous & deterministic ↔ flexible & creative.

What it is

For every Token it generates, the model computes a probability distribution over the vocabulary, then samples:

Token candidate    Probability
"clear"            0.42
"sunny"            0.28
"cool"             0.10
"…"                0.20

Temperature changes the sharpness of this distribution; Top-P changes how big the pool of candidates is.
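
A minimal NumPy sketch of how the two knobs reshape that distribution (the token names and probabilities are just the illustrative values from the table above, not real model output):

```python
import numpy as np

# Toy next-token distribution from the table above.
tokens = ["clear", "sunny", "cool", "other"]
probs = np.array([0.42, 0.28, 0.10, 0.20])

def apply_temperature(p, temperature):
    """Sharpen (T < 1) or flatten (T > 1) the distribution.
    Rescaling log-probabilities like this is equivalent to dividing the logits by T."""
    if temperature == 0:                  # greedy: all mass on the argmax
        out = np.zeros_like(p)
        out[np.argmax(p)] = 1.0
        return out
    scaled = np.log(p) / temperature
    scaled -= scaled.max()                # numerical stability
    e = np.exp(scaled)
    return e / e.sum()

def top_p_filter(p, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches
    top_p; zero out the rest and renormalise the survivors."""
    order = np.argsort(p)[::-1]                          # most likely first
    cutoff = np.searchsorted(np.cumsum(p[order]), top_p) + 1
    out = np.zeros_like(p)
    out[order[:cutoff]] = p[order[:cutoff]]
    return out / out.sum()

print(apply_temperature(probs, 0.5))   # sharper: "clear" dominates
print(apply_temperature(probs, 1.5))   # flatter: tail tokens gain ground
print(top_p_filter(probs, 0.5))        # only "clear" and "sunny" survive
```

Running it shows the key contrast: temperature reweights every candidate but keeps them all possible, while Top-P drops tokens from the pool entirely.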

Analogy

Analogy
  • Temperature = the conductor's "variation knob." 0 = recite the score verbatim; 1 = improvised jazz; >1 = drifting off-key.
  • Top-P = "only the menu's top-N best-sellers." Top-P 0.1 = consider only the two or three most popular dishes; 0.95 = almost everything is fair game.

Key concepts

Temperature 0 · Deterministic
Always picks the highest-probability Token. Same input → same output, every time.
Temperature 0.7 · Default creative
A common default in chat UIs and API wrappers. Natural output with reasonable variation.
Temperature 1.5+ · Nonsense zone
High randomness; risks tangents, off-topic jumps, even misspellings.
Top-P 0.1 · Conservative sampling
Samples only from the smallest set of candidates whose cumulative probability reaches 10%. Nearly deterministic.
Top-P 1.0 · Wide open
The whole vocabulary is in play. Combined with high temperature, freewheeling output.
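
To see what the temperature rows above mean for actual samples, the self-contained sketch below draws a few tokens from the same toy distribution at each setting; the Top-P rows behave like the top_p_filter sketch earlier (0.1 collapses the pool to just "clear", 1.0 keeps everything eligible).

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed so the demo is repeatable
tokens = ["clear", "sunny", "cool", "other"]
probs = np.array([0.42, 0.28, 0.10, 0.20])

def draw(p, temperature, n=8):
    """Sample n tokens after rescaling by temperature (T=0 means greedy argmax)."""
    if temperature == 0:
        return [tokens[int(np.argmax(p))]] * n
    q = np.exp(np.log(p) / temperature)
    q /= q.sum()
    return [tokens[i] for i in rng.choice(len(q), size=n, p=q)]

print(draw(probs, 0))     # ['clear', 'clear', ...]  identical every run
print(draw(probs, 0.7))   # mostly 'clear', the odd 'sunny'
print(draw(probs, 1.5))   # 'cool' and 'other' show up far more often
```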

How to combine them

Empirical defaults:

Task                       Temperature   Top-P
Extraction / SQL / JSON    0 – 0.2       0.5
RAG Q&A / translation      0.2 – 0.4     0.7
Copywriting / prose        0.6 – 0.8     0.9
Brainstorming / story      0.9 – 1.2     0.95
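
In an API call you simply pass both values per request. A minimal sketch using the OpenAI Python SDK; the model name and prompts are placeholders, and most other chat APIs expose the same temperature / top_p parameters:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Rigorous task (extraction -> JSON): cold and nearly deterministic.
extraction = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user",
               "content": "Extract every date from this text as a JSON list: "
                          "'Kickoff on 2024-03-05, review on 2024-04-01.'"}],
    temperature=0,
    top_p=0.5,
)

# Creative task (brainstorming): hotter, wider candidate pool.
brainstorm = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": "Suggest ten unusual names for a weather app."}],
    temperature=1.0,
    top_p=0.95,
)

print(extraction.choices[0].message.content)
print(brainstorm.choices[0].message.content)
```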

Practical notes

  • One knob is enough. In practice, just adjust temperature. Leave Top-P at 1.0 unless you need extreme determinism.
  • Cooler for rigorous tasks. Entity extraction, schema generation, SQL — T=0 is the most reproducible.
  • High temperature needs guardrails. At T=1, pin the length and structure in the system prompt; otherwise the output will drift.
  • Doesn't affect RAG quality. Temperature only changes sampling — not retrieval. Bad RAG is almost always bad prompts / chunking, not temperature.

Easy confusions

Temperature
Changes the sharpness of the whole distribution. Every candidate stays in the running; only the weaker candidates' odds shift.
Top-P
Changes the size of the candidate pool. Tokens outside the pool are forcibly set to probability zero.
Tweaking both?
Both compress randomness. Usually change only one, so you don't end up wondering which knob is set to what.

Further reading

  • System Prompt — constraints + low temperature = the most stable output
  • Hallucination — low temperature won't save you from fundamental fabrication
  • CoT — make reasoning steadier even at high temperature