ReAct (Reason + Act)

Key Idea

In one line: ReAct = Reason + Act. The model alternates three blocks of text (Thought → Action → Observation), looping until the task is done. Most of today's agent frameworks (LangChain / OpenAI tools / Claude tools) are variants of this pattern under the hood.

What it is

Each round produces three blocks (the final round ends at `finish` and has no Observation):

Thought: I need to know the weather in Beijing today; I should use the search tool.
Action:  search("Beijing weather today")
Observation: 26°C, cloudy.

Thought: The user asked whether they need an umbrella; I can answer now.
Action:  finish("No umbrella needed — cloudy today.")

Treat Thought / Action / Observation as a structured log: the model writes the Thought and the Action, the tool supplies the Observation, and the whole log is looped back into the prompt, so every round bases its decision on the previous Observation.
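The loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any framework's implementation: `scripted_model` stands in for a real LLM call and simply replays the two rounds of the trace above, and the `TOOLS` registry, `ACTION_RE` pattern, and `react` function are hypothetical names.

```python
import re
from typing import Callable

# Hypothetical tool registry: the name and the canned result are illustrative only.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: "26°C, cloudy.",
}

# Matches lines such as: Action: search("Beijing weather today")
ACTION_RE = re.compile(r'^Action:\s*(\w+)\("(.*)"\)\s*$', re.MULTILINE)


def scripted_model(prompt: str) -> str:
    """Stand-in for an LLM call: replays the two rounds of the example trace."""
    if "Observation:" not in prompt:
        return ("Thought: I need to know the weather in Beijing today.\n"
                'Action: search("Beijing weather today")')
    return ("Thought: I can answer now.\n"
            'Action: finish("No umbrella needed, cloudy today.")')


def react(question: str, model: Callable[[str], str], max_steps: int = 8) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = model(prompt)            # model writes Thought + Action
        prompt += reply + "\n"           # the log grows every round
        match = ACTION_RE.search(reply)
        if match is None:
            raise ValueError("model reply contained no Action line")
        name, arg = match.groups()
        if name == "finish":             # explicit stop condition
            return arg
        observation = TOOLS[name](arg)   # tool supplies the Observation
        prompt += f"Observation: {observation}\n"
    raise RuntimeError("step cap reached without finish()")


answer = react("Do I need an umbrella in Beijing today?", scripted_model)
print(answer)  # No umbrella needed, cloudy today.
```

Swapping `scripted_model` for a real model client leaves the control flow unchanged; only the parsing step would differ if the model returns structured tool calls instead of text.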

Analogy

ReAct is like detective work:

Reason = hypothesise ("the suspect might be A because of motive");
Act = collect evidence (verify A's alibi);
Observation = receive the evidence;
Then reason again — looping until the case is solved.

Key concepts

Thought

The model reasons about the next step in natural language: chain-of-thought (CoT) applied within a loop.

Action

A structured tool call (Function Calling): which tool, what args.

Observation

The tool's return value, fed back to the model so it can continue.

Stop condition

An explicit `finish()` call from the model, or hitting the max-step cap.
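As a rough sketch, the Action and Stop condition concepts map onto a small data type and two helpers. The JSON layout (`tool`, `args`) and the names `Action`, `parse_action`, and `should_stop` are assumptions made for illustration; real function-calling APIs each define their own payload shape.

```python
import json
from dataclasses import dataclass


@dataclass
class Action:
    tool: str   # which tool to call
    args: dict  # arguments for that tool


def parse_action(raw: str) -> Action:
    """Parse a JSON tool call such as {"tool": "search", "args": {...}}."""
    data = json.loads(raw)
    return Action(tool=data["tool"], args=data.get("args", {}))


def should_stop(action: Action, step: int, max_steps: int) -> bool:
    """Stop on an explicit finish action or once the step cap is reached."""
    return action.tool == "finish" or step >= max_steps


action = parse_action('{"tool": "search", "args": {"query": "Beijing weather today"}}')
print(action.tool, action.args["query"])         # search Beijing weather today
print(should_stop(action, step=1, max_steps=8))  # False
```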

How it works

Each loop splices the entire history (all Thought / Action / Observation) back into the prompt — which is why agents devour context windows.
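A back-of-the-envelope calculation shows how quickly this adds up. The token counts below (a 500-token base prompt, 300 tokens added per round) are made-up figures chosen only to illustrate the growth, and `prompt_tokens_per_round` is a hypothetical helper.

```python
def prompt_tokens_per_round(base: int, per_round: int, rounds: int) -> list[int]:
    """Prompt size at each round when the full history is resent every time."""
    return [base + per_round * i for i in range(rounds)]


sizes = prompt_tokens_per_round(base=500, per_round=300, rounds=10)
print(sizes[-1])   # 3200  (prompt size at the tenth round)
print(sum(sizes))  # 18500 (total input tokens billed across all ten rounds)
```

Because each round resends everything before it, total input tokens grow roughly quadratically with the number of rounds, even though each individual prompt grows only linearly.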

Practical notes

Prefer native Tools APIs. OpenAI / Anthropic / Gemini all expose a tools field that handles the Action format and returns structured tool calls, so there is no need to hand-roll a parser.
One-line descriptions per tool. Spell out "when to use, required args, return shape" — that line is what makes tool selection accurate.
Cap steps. Set a hard limit of 8–15. When stuck, error out or hand off to a human.
Keep Observations short. A long JSON return can eat half the context in one go. Summarise / filter before feeding back.
Don't expose Thought in production UI. Reasoning chains belong in dev traces; the frontend should show "final answer + key action progress" only.
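Two of these notes, the tool description and the short Observation, can be sketched as follows. `SEARCH_TOOL` is a hypothetical definition: it follows the common name / description / JSON Schema layout, but exact field names vary by provider, so check the API you target. `shorten_observation` is likewise an illustrative helper, not a library function.

```python
import json

# Hypothetical tool definition. The description states when to use the tool,
# its required args, and the return shape, all in one short line.
SEARCH_TOOL = {
    "name": "search",
    "description": (
        "Use for current facts such as weather or news. "
        "Requires `query` (string). Returns a one-line text summary."
    ),
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}


def shorten_observation(result: dict, keep: list[str], max_chars: int = 200) -> str:
    """Keep only the listed fields, then cap the serialised text length."""
    filtered = {key: result[key] for key in keep if key in result}
    text = json.dumps(filtered, ensure_ascii=False)
    if len(text) > max_chars:
        # Plain text cap: the truncated string is no longer valid JSON.
        text = text[: max_chars - 3] + "..."
    return text


# A bulky tool result: only two fields matter for the next Thought.
raw = {
    "temp_c": 26,
    "sky": "cloudy",
    "hourly": [{"h": h, "temp_c": 20 + h % 6} for h in range(24)],
}
short = shorten_observation(raw, keep=["temp_c", "sky"])
print(short)  # {"temp_c": 26, "sky": "cloudy"}
```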

Easy confusions

ReAct

**Calls tools**, can change the outside world.
Multi-round loop, each step grounded in the previous.

Pure CoT

**Reasons internally only**, doesn't change the world.
One-shot reasoning chain + answer.

ReAct

**Interleaves thinking and acting**, deciding at every step.
Good for branching, uncertain tasks.

Plan & Execute

**Plan all steps up front**, then execute in batch.
Good for stable flows, saves tokens.
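The difference in control flow between ReAct and Plan & Execute can be sketched with a stub model that counts how often it is called. Everything here is hypothetical: `StubModel` scripts its decisions instead of calling an LLM, and the `fetch` / `clean` tools are placeholders. Real Plan & Execute systems often add re-planning or a final synthesis call, which this sketch omits.

```python
from typing import Callable

# Hypothetical tools for illustration.
TOOLS: dict[str, Callable[[str], str]] = {
    "fetch": lambda arg: f"data({arg})",
    "clean": lambda arg: f"clean({arg})",
}


class StubModel:
    """Stands in for an LLM and counts calls; its decisions are scripted."""

    def __init__(self) -> None:
        self.calls = 0

    def next_action(self, history: list[str]) -> tuple[str, str]:
        # ReAct style: one call per step, decided from the latest Observation.
        self.calls += 1
        if not history:
            return ("fetch", "orders")
        if len(history) == 1:
            return ("clean", history[-1])
        return ("finish", history[-1])

    def plan(self, task: str) -> list[str]:
        # Plan & Execute style: one call returns the whole tool sequence.
        self.calls += 1
        return ["fetch", "clean"]


def react(model: StubModel, max_steps: int = 8) -> str:
    history: list[str] = []
    for _ in range(max_steps):
        tool, arg = model.next_action(history)
        if tool == "finish":
            return arg
        history.append(TOOLS[tool](arg))
    raise RuntimeError("step cap reached")


def plan_and_execute(model: StubModel, task: str) -> str:
    value = task
    for tool in model.plan(task):  # no model call inside the loop
        value = TOOLS[tool](value)
    return value


react_model, plan_model = StubModel(), StubModel()
print(react(react_model), react_model.calls)                     # clean(data(orders)) 3
print(plan_and_execute(plan_model, "orders"), plan_model.calls)  # clean(data(orders)) 1
```

Both paths reach the same result here, but ReAct pays one model call per step in exchange for being able to change course after each Observation.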