
Reflection (self-critique & correction)

Have the agent review its own output, find mistakes, and try again — turning "gets it wrong" into "corrects itself".

Key Idea

In one line: Reflection = make the model audit its own answer ("Is this right? Where might it be wrong?") and rewrite it if flaws turn up. It is the cheapest way to turn one-shot inference into a "write → critique → rewrite" iterative loop.

What it is#

Round 1:
  Question: write SQL for top-5 cities by Q3 sales
  Answer: SELECT city, SUM(amount)... ORDER BY amount LIMIT 5

Round 2 (Reflection):
  Critique: this SQL is missing GROUP BY — it will error.
  
Round 3:
  Revise: SELECT city, SUM(amount) ... GROUP BY city ORDER BY ...

The same model acts as its own reviewer. One extra call can cut the error rate substantially.
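The rounds above can be sketched as a small loop. This is a minimal illustration, not a real API: `call_llm` is a stub standing in for any chat-completion call, with canned responses so the sketch runs end to end.

```python
def call_llm(prompt: str) -> str:
    # Stub standing in for a real LLM API call; canned responses only.
    if prompt.startswith("Revise"):
        return ("SELECT city, SUM(amount) FROM sales "
                "GROUP BY city ORDER BY SUM(amount) DESC LIMIT 5")
    if prompt.startswith("Critique"):
        # Flag the missing GROUP BY, approve once it is present.
        return "LGTM" if "GROUP BY" in prompt else "FLAW: missing GROUP BY"
    return "SELECT city, SUM(amount) FROM sales ORDER BY amount LIMIT 5"

def reflect(task: str, max_rounds: int = 3) -> str:
    answer = call_llm(f"Task: {task}")  # round 1: draft on instinct
    for _ in range(max_rounds):
        critique = call_llm(f"Critique this answer to '{task}':\n{answer}")
        if "FLAW" not in critique:      # no flaw found: stop early
            break
        answer = call_llm(f"Revise the answer.\nTask: {task}\n"
                          f"Answer: {answer}\nCritique: {critique}")
    return answer

print(reflect("top-5 cities by Q3 sales"))
```

With a real model you would replace the stub with an API call and let the critique prompt do the work; the loop structure stays the same.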

Analogy#


When humans write, we reread our own draft: first pass on instinct, second pass to spot issues, third pass to fix them.
Reflection = bake this into the prompt — have the model code-review its own output.

Key concepts#

Self-critique
Prompt the model to find flaws in its own output: "where might this be wrong?"
Verifier
The same model, a stronger model, or an external tool (compiler / tests).
Iteration
"Write → critique → revise" can loop N rounds — usually 1–3 is enough.
Reflexion
Write past failure causes into "memory" for future decisions. Reflection on steroids.
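A Reflexion-style agent can be sketched as follows, under the assumption that the caller supplies the LLM calls. `attempt`, `check`, and `diagnose` are hypothetical stand-ins: one drafts an answer, one verifies it, and one distills why a failed attempt went wrong into a lesson that is prepended to the next prompt.

```python
class ReflexionAgent:
    """Sketch of Reflexion: persist failure lessons across attempts."""

    def __init__(self):
        self.memory: list[str] = []   # lessons distilled from past failures

    def solve(self, task, attempt, check, diagnose, max_trials=3):
        answer = ""
        for _ in range(max_trials):
            hints = "\n".join(f"Lesson: {m}" for m in self.memory)
            prompt = f"{hints}\n{task}" if hints else task
            answer = attempt(prompt)          # draft, informed by memory
            if check(answer):                 # objective or LLM-based verifier
                return answer
            self.memory.append(diagnose(task, answer))  # remember why it failed
        return answer
```

The difference from plain Reflection is the `memory` list: lessons survive across attempts (and, in the full algorithm, across tasks), instead of being discarded after each critique.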

How it works#

Draft → critique → revise, looping back if needed. Each arrow is one LLM call — overhead is bounded.

Practical notes#

  • Make the self-eval prompt specific. "List the 3 most likely errors, then decide whether to revise" beats "check this."
  • Prefer objective verifiers. Anything that can run tests / a compiler / a calculator should — model self-eval has a "self-flattery" bias.
  • Cap iterations. Gains usually plateau after 1–3 rounds; going further risks making the answer worse.
  • Reflection often beats sampling-N-and-vote at the same token budget.
  • Always-on for high-stakes work. Writing code, math, SQL, long-doc summaries — errors are costly, the extra round pays for itself.
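For the SQL example, an objective verifier is easy to build: instead of asking the model "is this right?", execute the candidate query against a real engine and feed any error message back as the critique. A minimal sketch using the stdlib `sqlite3` module, with a hypothetical `sales` schema:

```python
import sqlite3

def verify_sql(query: str) -> tuple[bool, str]:
    """Run the query against an in-memory schema; return (ok, error message)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema matching the example task.
    conn.execute("CREATE TABLE sales (city TEXT, amount REAL, quarter TEXT)")
    try:
        conn.execute(query)
        return True, ""
    except sqlite3.Error as e:
        return False, str(e)   # feed this back to the model as the critique
    finally:
        conn.close()
```

Unlike self-critique, the error message here ("no such column: …") is ground truth, so the revise step starts from a precise, trustworthy signal. Note that an engine only catches what it considers an error: SQLite, for instance, tolerates some queries that other databases reject, so running the query against the target engine matters.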

Easy confusions#

Reflection
**Critique + rewrite.**
Sequential rounds, each building on the last.
Self-Consistency
**Independent samples + vote.**
Parallel samples that never interact.
Reflection
The model **judges itself**.
Cheap, but it may miss issues.
Tool-based verify
A **compiler / test / calculator** does the judging.
Objective and reliable; use it whenever possible.

Further reading#

  • ReAct — Reflection is commonly embedded inside the ReAct loop
  • CoT — Reflection = CoT-answer + CoT-critique
  • ToT — the evaluation step resembles ToT's pruning