核心 · Key Idea
In one line: Reflection = make the model audit its own answer: "Is this right? Where might it be wrong?" If not, rewrite. The cheapest way to turn "one-shot inference" into a "write → critique → rewrite" iterative loop.
What it is#
Round 1:
Question: write SQL for top-5 cities by Q3 sales
Answer: SELECT city, SUM(amount)... ORDER BY amount LIMIT 5
Round 2 (Reflection):
Critique: this SQL is missing GROUP BY — it will error.
Round 3:
Revise: SELECT city, SUM(amount) ... GROUP BY city ORDER BY ...
The same model is its own marker. One extra call typically halves the error rate.
Analogy#
打个比方 · Analogy
When humans write, we read it back once: first pass on instinct, second pass to find issues, third pass to fix.
Reflection = bake this into the prompt — have the model code-review its own output.
Key concepts#
Self-CritiqueSelf-critique
Prompt the model to find flaws in its own output: 'where might this be wrong?'
VerifierVerifier
Same model, a stronger model, or an external tool (compiler / tests).
IterationIteration
'Write → critique → revise' can loop N rounds — usually 1–3 is enough.
ReflexionReflexion algorithm
Write past failure causes into 'memory' for future decisions. Reflection-on-steroids.
How it works#
Each arrow is one LLM call — overhead is bounded.
Practical notes#
- Make the self-eval prompt specific. "List the 3 most likely errors, then decide whether to revise" beats "check this."
- Prefer objective verifiers. Anything that can run tests / a compiler / a calculator should — model self-eval has a "self-flattery" bias.
- Cap iterations. 1–3 rounds usually plateau; more rounds risks making it worse.
- Reflection often beats sampling-N-and-vote at the same token budget.
- Always-on for high-stakes work. Writing code, math, SQL, long-doc summaries — errors are costly, the extra round pays for itself.
Easy confusions#
Reflection
**Critique + rewrite.**
Sequential rounds, each builds on the last.
Sequential rounds, each builds on the last.
Self-Consistency
**Independent samples + vote.**
Parallel, never interact.
Parallel, never interact.
Reflection
Model **judges itself**.
Cheap but may miss issues.
Cheap but may miss issues.
Tool-based verify
Use a **compiler / test / calculator**.
Objective and reliable — use it whenever possible.
Objective and reliable — use it whenever possible.