ReAct (Reason + Act)

The core agent pattern — think one step, act, observe, then think again.

Key Idea

In one line: ReAct = Reason + Act. The model alternates three blocks of text (Thought → Action → Observation), looping until the task is done. Most agent frameworks and tool-calling APIs today (LangChain, OpenAI tools, Claude tools) are variants of this pattern under the hood.

What it is

Each round emits three blocks:

Thought: I need to know the weather in Beijing today; I should use the search tool.
Action:  search("Beijing weather today")
Observation: 26°C, cloudy.

Thought: The user asked whether they need an umbrella; I can answer now.
Action:  finish("No umbrella needed — cloudy today.")

Treat Thought / Action / Observation as a structured log: the model writes each Thought and Action, the tool's result is recorded as the Observation, and the whole log is looped back into the prompt, so every round bases its decision on the previous Observation.
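The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a framework implementation: `fake_model`, `SCRIPT`, the canned `search` tool, and the regex-based Action format are all assumptions made for the example, with a scripted stand-in where a real LLM call would go.

```python
import re
from typing import Callable

# Hypothetical tool: a canned lookup standing in for a real search API.
def search(query: str) -> str:
    return "26°C, cloudy."

TOOLS: dict[str, Callable[[str], str]] = {"search": search}

# Scripted stand-in for an LLM (assumption): one Thought/Action block per round.
SCRIPT = [
    "Thought: I need the weather in Beijing today; I should use the search tool.\n"
    'Action: search("Beijing weather today")',
    "Thought: The user asked whether they need an umbrella; I can answer now.\n"
    'Action: finish("No umbrella needed, cloudy today.")',
]

def fake_model(transcript: str) -> str:
    # Pick the next scripted block from the number of Observations seen so far.
    return SCRIPT[transcript.count("Observation:")]

# Matches lines such as: Action: search("some text")
ACTION_RE = re.compile(r'^Action:\s*(\w+)\("(.*)"\)\s*$', re.MULTILINE)

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        block = fake_model(transcript)       # Thought + Action
        transcript += block + "\n"
        match = ACTION_RE.search(block)
        if match is None:
            raise ValueError("model output had no parseable Action line")
        name, arg = match.groups()
        if name == "finish":                 # explicit stop condition
            return arg
        observation = TOOLS[name](arg)       # run the tool
        transcript += f"Observation: {observation}\n"  # feed the result back
    raise RuntimeError("step cap reached without finish()")

answer = react("Do I need an umbrella in Beijing today?")
print(answer)  # No umbrella needed, cloudy today.
```

The transcript string is the structured log: each round appends to it, and the next model call sees everything written so far.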

Analogy

ReAct is like detective work:

  • Reason = hypothesise ("the suspect might be A because of motive");
  • Act = collect evidence (verify A's alibi);
  • Observation = receive the evidence;
  • Then reason again — looping until the case is solved.

Key concepts

Thought
The model thinks about the next step in natural language: chain-of-thought (CoT) applied within a loop.
Action
A structured tool call (function calling): which tool, with what arguments.
Observation
The tool's return value, fed back to the model so it can continue.
Stop condition
An explicit `finish()` from the model, or hitting the max-step cap.
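One way to make these concepts concrete is to model the Action as a small data type and the Observation as whatever text comes back from running it. The sketch below is illustrative only: `ToolCall`, `REGISTRY`, `get_weather`, and the choice to return failures as Observations instead of raising are assumptions, not part of any particular framework.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class ToolCall:
    """An Action: which tool to run and with what arguments."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)

# Hypothetical tool, for illustration only.
def get_weather(city: str) -> str:
    return f"{city}: 26°C, cloudy."

REGISTRY: dict[str, Callable[..., str]] = {"get_weather": get_weather}

def execute(call: ToolCall) -> str:
    """Run an Action and return the Observation as text.

    Failures come back as Observations rather than exceptions, so the
    model can read the error and choose a different Action next round.
    """
    tool = REGISTRY.get(call.name)
    if tool is None:
        return f"error: unknown tool {call.name!r}"
    try:
        return tool(**call.args)
    except TypeError as exc:  # wrong or missing arguments
        return f"error: bad arguments for {call.name}: {exc}"

def should_stop(call: ToolCall, step: int, max_steps: int) -> bool:
    """Stop on an explicit finish Action or once the step cap is reached."""
    return call.name == "finish" or step >= max_steps

print(execute(ToolCall("get_weather", {"city": "Beijing"})))  # Beijing: 26°C, cloudy.
print(execute(ToolCall("send_email")))  # error: unknown tool 'send_email'
```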

How it works

Each loop splices the entire history (all Thought / Action / Observation) back into the prompt, so the prompt grows with every round. That is why agents devour context windows.
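A rough back-of-the-envelope calculation shows how quickly this adds up. The numbers below (a 500-token base prompt, 300 tokens added per round) are made up for illustration, and the model ignores provider-side optimisations such as prompt caching.

```python
def prompt_tokens_per_round(base: int, per_step: int, rounds: int) -> list[int]:
    """Prompt size at each round when the full history is resent.

    base:     tokens in the system prompt plus the user question
    per_step: tokens one Thought/Action/Observation triple adds
    """
    return [base + k * per_step for k in range(rounds)]

def total_prompt_tokens(base: int, per_step: int, rounds: int) -> int:
    """Total input tokens sent to the model across all rounds."""
    return sum(prompt_tokens_per_round(base, per_step, rounds))

sizes = prompt_tokens_per_round(base=500, per_step=300, rounds=10)
print(sizes[0], sizes[-1])                # 500 3200
print(total_prompt_tokens(500, 300, 10))  # 18500
```

Under these assumptions the last round alone sends more than six times the first prompt, and the running total grows quadratically with the number of rounds.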

Practical notes

  • Prefer native tool-calling APIs. OpenAI, Anthropic, and Gemini all accept a tools field and return the Action as a structured tool call, so there is no need to hand-roll a parser for Action text.
  • One-line descriptions per tool. Spell out "when to use, required args, return shape" — that line is what makes tool selection accurate.
  • Cap steps. Hard limit 8–15. When stuck, error out or hand off to a human.
  • Keep Observations short. A long JSON return immediately devours half the context. Summarise / filter before feeding back.
  • Don't expose Thought in production UI. Reasoning chains belong in dev traces; the frontend should show "final answer + key action progress" only.
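The "keep Observations short" note can be sketched as a small filter applied to a tool's JSON output before it goes back into the prompt. The helper name `compact_observation`, the `keep` list, and the 200-character default are assumptions chosen for the example; a real agent might summarise with a model instead.

```python
import json

def compact_observation(raw_json: str, keep: list[str], max_chars: int = 200) -> str:
    """Keep only the named top-level fields, then truncate to max_chars."""
    data = json.loads(raw_json)
    if isinstance(data, dict):
        filtered = {key: data[key] for key in keep if key in data}
        text = json.dumps(filtered, ensure_ascii=False, separators=(",", ":"))
    else:
        # Not an object (e.g. a bare list): fall back to the raw text.
        text = raw_json
    if len(text) > max_chars:
        text = text[:max_chars] + "...[truncated]"
    return text

# A bulky, made-up tool response with a large field the model does not need.
raw = json.dumps({
    "city": "Beijing",
    "temp_c": 26,
    "sky": "cloudy",
    "hourly": list(range(500)),
})
print(compact_observation(raw, keep=["temp_c", "sky"]))  # {"temp_c":26,"sky":"cloudy"}
```

Filtering by field name is the cheapest option; the truncation step is only a safety net for fields that turn out to be long anyway.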

Easy confusions

ReAct vs. pure CoT

  • ReAct: **Calls tools**, can change the outside world. Multi-round loop, each step grounded in the previous Observation.
  • Pure CoT: **Thinks in its head only**, doesn't change the world. One-shot reasoning chain + answer.

ReAct vs. Plan & Execute

  • ReAct: **Interleaves thinking and acting**, decides at every step. Good for branching, uncertain tasks.
  • Plan & Execute: **Plans all steps up front**, then executes in batch. Good for stable flows, saves tokens.

Further reading