ReAct Pattern | Agentic Engineering

An interleaved Reasoning + Acting loop where the agent explicitly reasons about observations before selecting actions, creating a trace of thought that grounds decisions in retrieved evidence.

Core Structure

Thought: Analyze the current situation based on observations
Action: Select and invoke a tool
Observation: Record the tool's output
Thought: Analyze the new observation, decide next step
Action: Select next tool (or finish)
...

The cycle continues until the task completes or a termination condition triggers.

How It Works

ReAct (Reasoning + Acting) alternates between two modes:

Reasoning - The model generates explicit thoughts about what it observes and what to do next
Acting - The model invokes a tool, producing new observations

This interleaving creates three key properties:

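Grounded decisions: Each action follows explicit reasoning about current observations, which reduces hallucination. The model is steered away from claiming information it has not retrieved, although thoughts can still introduce unsupported claims (see Reasoning Without Evidence under Anti-Patterns).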

Observable traces: The thought-action-observation chain creates an audit trail. When debugging failures, the trace shows exactly where reasoning diverged from evidence.

Adaptive behavior: Each observation can redirect the plan. Unlike fixed multi-step plans, ReAct adjusts based on what tools actually return.

Implementation

The Basic Loop

System Prompt:
You solve tasks by interleaving Thought, Action, and Observation steps.

Thought: Reason about the current state and what to do next.
Action: Call exactly one tool. Format: tool_name(param=value)
Observation: [Tool output will appear here]

Continue until you can answer the task.

Task: {user_task}
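
The system side has to turn the model's Action line back into a tool name and parameters. A minimal sketch of that step, assuming parameter values are written as Python-style literals (quoted strings, numbers) and that only keyword arguments are allowed; the name parse_action is hypothetical and not part of any library:

```python
import ast


def parse_action(line: str) -> tuple[str, dict]:
    """Parse 'Action: tool_name(param=value)' into (tool_name, params)."""
    text = line.strip()
    if text.startswith("Action:"):
        text = text[len("Action:"):].strip()
    # The format is a call with keyword arguments, so Python's parser can read it.
    node = ast.parse(text, mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a tool call: {text!r}")
    if node.args:
        raise ValueError("positional arguments are not allowed")
    params = {}
    for keyword in node.keywords:
        if keyword.arg is None:
            raise ValueError("**kwargs expansion is not allowed")
        # literal_eval accepts only literals, so no model-written code is executed.
        params[keyword.arg] = ast.literal_eval(keyword.value)
    return node.func.id, params
```

Malformed lines raise SyntaxError or ValueError, which the loop can feed back to the model as an Observation instead of crashing.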

Execution Flow

Model generates a Thought (explicit reasoning)
Model generates an Action (tool call)
System executes tool, returns Observation
Model generates next Thought based on Observation
Repeat until task completes
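
The flow above can be sketched as a short driver loop. This is an illustrative sketch, not a reference implementation: the model is assumed to be any callable that takes the transcript and returns a dict with "thought", "tool", and "params" keys, and a tool named "finish" is assumed to carry the final answer. The scripted_model helper is a hypothetical stand-in for a real model call.

```python
def react_loop(model, tools, task, max_iterations=10):
    """Run Thought -> Action -> Observation cycles until 'finish' or the cap."""
    transcript = [f"Task: {task}"]
    for _ in range(max_iterations):
        # The model reads the transcript and produces a thought plus an action.
        step = model("\n".join(transcript))
        transcript.append(f"Thought: {step['thought']}")
        if step["tool"] == "finish":
            return step["params"]["answer"], transcript
        transcript.append(f"Action: {step['tool']}({step['params']})")
        # The system executes the tool and records its output.
        tool = tools.get(step["tool"])
        observation = tool(**step["params"]) if tool else f"unknown tool {step['tool']!r}"
        # The observation becomes input to the next thought.
        transcript.append(f"Observation: {observation}")
    return None, transcript  # termination condition: iteration cap reached


def scripted_model(steps):
    """Stand-in for a real model call: replays canned steps in order."""
    remaining = iter(steps)
    return lambda _transcript: next(remaining)


answer, trace = react_loop(
    scripted_model([
        {"thought": "I need the file list first.", "tool": "ls", "params": {"path": "src/"}},
        {"thought": "Only main.py exists, so that is the entry point.",
         "tool": "finish", "params": {"answer": "main.py"}},
    ]),
    tools={"ls": lambda path: "main.py"},
    task="Find the entry point",
)
```

The returned trace is the audit trail described earlier: one Task line followed by the Thought, Action, and Observation entries in order.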

Structured Output Variant

For production systems, enforce structure:

{
  "thought": "The user asked about error handling. I should search the codebase for exception patterns.",
  "action": {
    "tool": "grep",
    "params": {"pattern": "except|catch", "path": "src/"}
  }
}

The observation is appended to the context as a system message, which triggers the next thought-action cycle.
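
Enforcing the structure means rejecting steps that do not match it before any tool runs. A minimal validation sketch using only the standard library, assuming the JSON shape shown above; parse_step and allowed_tools are hypothetical names:

```python
import json


def parse_step(raw: str, allowed_tools: set[str]) -> dict:
    """Validate one model step; raise ValueError if the structure is wrong."""
    step = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(step, dict):
        raise ValueError("step must be a JSON object")
    thought = step.get("thought")
    if not isinstance(thought, str) or not thought.strip():
        raise ValueError("missing or empty 'thought'")
    action = step.get("action")
    if not isinstance(action, dict):
        raise ValueError("'action' must be an object")
    tool = action.get("tool")
    if tool not in allowed_tools:
        raise ValueError(f"unknown tool: {tool!r}")
    params = action.get("params", {})
    if not isinstance(params, dict):
        raise ValueError("'params' must be an object")
    return {"thought": thought, "tool": tool, "params": params}
```

Requiring a non-empty thought here also guards against actions emitted without reasoning. A production system might use a JSON Schema validator or the provider's structured-output feature instead of hand-written checks.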

When to Use

Good Fit

Information gathering tasks requiring evidence:

Research questions needing multiple sources
Debugging where the root cause is unknown
Exploratory analysis across unfamiliar codebases
Tasks where premature conclusions cause failures

Task characteristics:

Unknown number of steps required
Each step depends on what previous steps revealed
Hallucination is high-risk (medical, legal, financial)
Explainability matters (audit trails, user trust)

Poor Fit

Well-defined procedures with known steps:

Standard CRUD operations
Template-based generation
Tasks where the action sequence is predetermined
High-throughput scenarios where reasoning overhead is unacceptable

When simpler approaches work:

Single-tool tasks (no interleaving needed)
Tasks where the model already knows the answer (no retrieval needed)
Time-critical operations (ReAct adds latency per step)

Trade-offs

Aspect	ReAct	Direct Generation
Grounding	Strong (every claim from observation)	Weak (hallucination risk)
Latency	Higher (multiple turns)	Lower (single generation)
Token cost	Higher (thoughts + observations accumulate)	Lower (single response)
Explainability	Excellent (full trace)	Poor (black box)
Adaptability	High (adjusts to observations)	Low (fixed plan)

The trade-off: ReAct sacrifices speed and token cost for grounding and observability.

Comparison with Other Patterns

ReAct vs. Chain-of-Thought

Chain-of-Thought (CoT): The model reasons step by step and produces the full answer in a single generation, without calling tools along the way. Thoughts stay internal to that generation.

ReAct: Thoughts interleave with tool calls. Each observation grounds the next thought.

CoT: Thought → Thought → Thought → Final Answer
ReAct: Thought → Action → Observation → Thought → Action → Observation → ...

CoT excels at reasoning tasks with sufficient in-context knowledge. ReAct excels when external information retrieval is required.

ReAct vs. Plan-Build-Review

Plan-Build-Review: Separates planning from execution. The plan specifies all steps upfront, then the build phase executes them.

ReAct: No upfront planning. Each step emerges from the previous observation.

Aspect	Plan-Build-Review	ReAct
Planning	Explicit, upfront	Implicit, emergent
Adaptability	Follows spec	Adjusts per observation
Coordination	Multiple agents, checkpoints	Single agent loop
Best for	Complex, multi-phase projects	Exploratory, information-gathering

Synthesis: Use Plan-Build-Review for tasks with known structure, ReAct for tasks where structure emerges from investigation.

ReAct vs. Autonomous Loops (Ralph Wiggum)

Ralph Wiggum: Iteration-based, with git history as external memory. Fresh context per loop.

ReAct: Single context window with accumulated observations.

Ralph suits mechanical tasks with many iterations. ReAct suits reasoning tasks requiring observation synthesis within a session.

Anti-Patterns

Thought-Free Actions

Problem: Skipping the Thought step, invoking tools without explicit reasoning.

Why it fails: Without explicit reasoning, the model may invoke tools randomly or redundantly. The trace becomes useless for debugging.

Solution: Enforce thought generation before every action. Validate that thoughts reference prior observations.

Observation Overload

Problem: Tools return massive outputs that fill context.

Why it fails: Large observations crowd out earlier thoughts and observations. Reasoning quality degrades as context fills.

Solution: Limit observation size. Summarize or truncate tool outputs. Consider spawning sub-agents for analysis (see Orchestrator Pattern).
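
One simple way to limit observation size is to keep the head and tail of a long output and mark what was dropped. A minimal sketch, assuming a character budget is an acceptable proxy for tokens; clip_observation and the 2000-character default are illustrative choices:

```python
def clip_observation(text: str, max_chars: int = 2000) -> str:
    """Keep the start and end of an oversized tool output."""
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    omitted = len(text) - 2 * half
    head = text[:half]
    tail = text[len(text) - half:]
    # The marker tells the model that content is missing, so it can ask for more.
    return f"{head}\n[... {omitted} characters omitted ...]\n{tail}"
```

Keeping both ends suits logs and stack traces, where the cause is often near the top and the failure near the bottom. Other tools may be better served by summarization.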

Infinite Loops

Problem: Agent cycles through the same thought-action pairs without progress.

Why it fails: Without termination conditions, the loop continues indefinitely, wasting tokens.

Solution:

Limit maximum iterations (10-20 for most tasks)
Detect repeated actions with same parameters
Require explicit "Final Answer" action to terminate
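
The first two safeguards can be combined in a small guard object that the loop consults before executing each action. A minimal sketch; LoopGuard, its defaults, and the stop-reason strings are hypothetical, and parameters are assumed to be JSON-serializable:

```python
import json
from collections import Counter
from typing import Optional


class LoopGuard:
    """Stops a ReAct loop on an iteration cap or on repeated identical actions."""

    def __init__(self, max_iterations: int = 15, max_repeats: int = 2):
        self.max_iterations = max_iterations
        self.max_repeats = max_repeats
        self.iterations = 0
        self.seen = Counter()

    def check(self, tool: str, params: dict) -> Optional[str]:
        """Return a stop reason, or None if the action may proceed."""
        self.iterations += 1
        if self.iterations > self.max_iterations:
            return "max_iterations"
        # Sorting keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} count as the same call.
        key = (tool, json.dumps(params, sort_keys=True))
        self.seen[key] += 1
        if self.seen[key] > self.max_repeats:
            return "repeated_action"
        return None
```

When check returns a reason, the caller can either end the run or inject an Observation telling the model that it is repeating itself.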

Reasoning Without Evidence

Problem: Thoughts that introduce claims not present in observations.

Why it fails: The grounding benefit of ReAct evaporates. Hallucination enters through the thought step.

Solution: Prompt for evidence-grounded reasoning. Validate that thoughts cite specific observations. Flag unsupported claims.
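
Citation checks are easier to automate when observations carry identifiers. The sketch below assumes a hypothetical convention in which the system labels observations [obs-1], [obs-2], and so on, and the prompt asks the model to cite those labels in its thoughts. It only checks that citations exist and point at real observations, not that the cited text supports the claim.

```python
import re

CITATION = re.compile(r"\[obs-(\d+)\]")


def check_citations(thought: str, observation_count: int) -> list[str]:
    """Return a list of problems found in one thought (empty if none)."""
    problems = []
    cited = [int(number) for number in CITATION.findall(thought)]
    # Before the first observation there is nothing to cite.
    if observation_count > 0 and not cited:
        problems.append("thought cites no observation")
    for number in cited:
        if not 1 <= number <= observation_count:
            problems.append(f"[obs-{number}] does not exist")
    return problems
```

Flagged thoughts can be sent back to the model for revision or marked in the trace for human review.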

Production Considerations

Context Management

ReAct accumulates context rapidly. Each cycle adds thought + action + observation tokens.

Mitigation strategies:

Summarize older observations as context fills
Spawn fresh agent for new investigation threads
Limit observation verbosity at the tool level
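
The first strategy can be sketched as a compaction pass that leaves recent cycles untouched and shrinks older observations. This sketch assumes each step is stored as a dict with an "observation" key; compact_history is a hypothetical name, and the default summarize function is plain truncation standing in for a model-written summary:

```python
from typing import Callable


def compact_history(
    steps: list[dict],
    keep_recent: int = 3,
    summarize: Callable[[str], str] = lambda observation: observation[:200],
) -> list[dict]:
    """Return a copy of the history with older observations summarized."""
    cutoff = max(len(steps) - keep_recent, 0)
    compacted = []
    for index, step in enumerate(steps):
        if index < cutoff:
            # Copy the step so the full history stays available for auditing.
            step = {**step, "observation": summarize(step["observation"])}
        compacted.append(step)
    return compacted
```

Thoughts are kept verbatim here because they are usually short and record why each action was taken.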

Latency Budgeting

Each thought-action-observation cycle requires a model call plus tool execution time.

For time-sensitive applications:

Set maximum iterations based on latency budget
Parallelize independent tool calls within a single action step (if supported)
Cache tool results for repeated queries
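
Caching and parallel execution can both live in a thin wrapper around the tool registry. A minimal sketch using the standard library, assuming tools are free of side effects, return stable results for identical parameters, and take JSON-serializable parameters; ToolRunner is a hypothetical name:

```python
import json
from concurrent.futures import ThreadPoolExecutor


class ToolRunner:
    """Runs tools with a result cache and optional concurrent execution."""

    def __init__(self, tools):
        self.tools = tools
        self.cache = {}

    def run(self, tool, params):
        """Run one tool call, reusing the cached result for identical calls."""
        key = (tool, json.dumps(params, sort_keys=True))
        if key not in self.cache:
            self.cache[key] = self.tools[tool](**params)
        return self.cache[key]

    def run_many(self, calls):
        """Run independent (tool, params) calls concurrently, preserving order."""
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(lambda call: self.run(*call), calls))
```

Threads fit tools that wait on I/O such as HTTP requests or file reads. Two identical calls issued in the same batch may both execute before either result is cached, which is harmless under the side-effect-free assumption.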

Cost Tracking

ReAct's token cost scales with both depth (iterations) and breadth (observation size).

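Cost formula (approximate), summed over iterations:

Cost ≈ Σ_i (thought_tokens_i + action_tokens_i + observation_tokens_i)

This sum counts only the tokens newly added in each cycle. Each model call also re-reads the accumulated context, so billed input tokens grow faster than linearly with the number of iterations. For complex tasks requiring 10+ iterations with substantial observations, ReAct can exceed 10× the cost of direct generation. Budget accordingly.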
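
A short calculation makes the scaling concrete. The sketch below assumes every cycle has the same token counts, that each model call re-sends the full accumulated context as input, and that no prompt caching applies; all numbers are illustrative, not measurements:

```python
def estimate_tokens(iterations, prompt_tokens, thought_tokens, action_tokens, observation_tokens):
    """Estimate total input and output tokens for a ReAct run."""
    per_cycle_output = thought_tokens + action_tokens
    per_cycle_growth = per_cycle_output + observation_tokens
    input_total = 0
    context = prompt_tokens
    for _ in range(iterations):
        input_total += context       # each call re-reads everything so far
        context += per_cycle_growth  # then the cycle's tokens join the context
    return {"input": input_total, "output": iterations * per_cycle_output}


usage = estimate_tokens(iterations=10, prompt_tokens=500, thought_tokens=100,
                        action_tokens=30, observation_tokens=800)
# usage == {"input": 46850, "output": 1300}
```

With these numbers the ten cycles add 9,300 new tokens in total, but the model reads 46,850 input tokens because earlier cycles are re-read on every call. Shrinking observations reduces both the per-cycle growth and the re-read cost.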

Model Considerations

Temperature Settings

[2026-01-30]: ReAct benefits from low temperature (0.0-0.3) for reliable reasoning chains. Higher temperature increases variance between thoughts, leading to inconsistent investigation paths.

Multi-step reliability degrades with temperature (see Model Behavior: Temperature Effects). For 10-step ReAct chains at temperature 1.0, reliability drops to approximately 60%.
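
To see why small per-step error rates matter, consider a simplified model in which each of the n steps succeeds independently with probability p. This independence assumption is an illustration and is not stated by the source. Under it, the 60% figure for a 10-step chain corresponds to a per-step reliability of roughly 95%:

```latex
P_{\text{chain}} = p^{\,n}
\quad\Longrightarrow\quad
p = P_{\text{chain}}^{1/n} = 0.60^{1/10} \approx 0.95,
\qquad
0.95^{10} \approx 0.599,
\qquad
0.99^{10} \approx 0.904
```

Under the same model, raising per-step reliability from 95% to 99% lifts a 10-step chain from about 60% to about 90%, which is why settings that stabilize individual steps pay off over long chains.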

Model Selection

ReAct requires strong instruction-following to maintain the thought-action-observation structure. Frontier models (Opus 4.5, GPT-4o, Gemini 2.0 Pro) maintain structure reliably. Mid-tier models may drift from the format over many iterations.

Observed pattern: Smaller models (Haiku, GPT-3.5) often collapse the thought step or skip directly to final answers. Reserve ReAct for tasks where model capability justifies the pattern overhead.

Implementation in Claude Code

Claude Code's tool-use system naturally supports ReAct-style interaction:

Extended thinking provides the "Thought" component
Tool calls map to "Action"
Tool results return as "Observation"

The pattern emerges without explicit prompting when extended thinking is enabled and tools are available. Claude reasons about tool selection, invokes the tool, then reasons about results.

For explicit ReAct traces, structure the system prompt to request <thought> blocks before tool calls:

Before each tool call, output your reasoning in <thought> tags:
 
<thought>
The user needs information about X. I should search for Y because Z.
</thought>
 
Then invoke the appropriate tool.
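
Once the model emits <thought> blocks, they can be pulled out of the response text for logging or validation. A minimal sketch with a regular expression, assuming the tags are not nested and appear in plain text output; extract_thoughts is a hypothetical helper, not part of Claude Code:

```python
import re

THOUGHT_BLOCK = re.compile(r"<thought>(.*?)</thought>", re.DOTALL)


def extract_thoughts(text: str) -> list[str]:
    """Return the contents of every <thought> block, in order."""
    return [body.strip() for body in THOUGHT_BLOCK.findall(text)]
```

An empty result for a response that includes a tool call indicates that the model skipped its reasoning step.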

Connections

To Tool Use: ReAct is a meta-pattern for tool orchestration. Tool design affects observation quality—verbose tools create context pressure, terse tools may lack information. See Tool Design for principles.
To Model Behavior: Temperature and instruction-following reliability directly affect ReAct chain quality. Extended thinking modes complement ReAct by providing internal reasoning before action selection.
To Plan-Build-Review: Complementary patterns. Plan-Build-Review for known structure, ReAct for emergent investigation. Synthesis: use ReAct within the Research phase to discover information for planning.
To Orchestrator Pattern: ReAct can operate within an orchestrator's sub-agents. Scout agents using ReAct gather grounded observations that the orchestrator synthesizes.
To Context Management: ReAct's accumulated observations require careful context management. Progressive disclosure and observation summarization prevent context exhaustion.

Origins and References

The ReAct pattern was introduced in:

Yao et al. (2023): "ReAct: Synergizing Reasoning and Acting in Language Models" arxiv.org/abs/2210.03629

The paper demonstrated that interleaving reasoning traces with action execution outperforms both reasoning-only (Chain-of-Thought) and action-only approaches on tasks requiring information retrieval and multi-step reasoning.

Key findings from the paper:

ReAct reduces hallucination by grounding reasoning in observations
The trace improves human interpretability and error diagnosis
Performance gains appear primarily on tasks requiring external knowledge

Subsequent work built on these ideas with reflection (Reflexion), upfront planning (Plan-and-Solve), and tool-augmented variants across different domains.