Build AI Agents That Call Tools Autonomously

Most runaway AI agent costs aren't about bad prompts–they happen because you forgot to set hard limits. Here are 5 architecture moves that end infinite loops before they burn your budget (or your reputation).

Georg Singer·May 6, 2026·18 min read

"It works on my laptop."
Friday night: you deploy an AI agent for document processing. Eleven days later, you're staring at an OpenAI bill for $47,000 (~€43,000). What happened? Your agent got stuck in a loop, retrying the same failed API call the whole time, because no code told it when to stop. No alert. No hard limit. No budget cap.

This story actually happened.

If you think this is just a rare fluke, think again. According to the AICosts.ai Budget Disaster Prevention Report (2025, based on 200+ real-world incidents), a whopping 87% of agent cost overruns happen because there's no hard limit set. Not because of a weak model, not because of a bad prompt.

And here's the kicker: 73% of teams don't track agent costs in real time. That means you're not just watching your budget slip away–most teams don't even realize it until it's far too late. On average, these overruns are 340% above what was originally estimated.

And get this–the loop usually doesn't start where you think it does.

Quick Summary: Why AI Agents Get Stuck (and What Actually Costs You)

Let's break down the most important numbers and what they mean for you:

87% of AI agent cost blowouts come from missing hard limits–not from bad models or prompts (AICosts.ai Budget Disaster Prevention Report).
That means almost all runaway bills are an architecture problem–not an AI problem.
73% of teams have no real-time cost tracking for their AI agents. The average overrun? 340%. So when your bill explodes, you're not alone–and you're not even the exception.
40% of agentic AI projects will be abandoned by 2027, says Gartner, mostly because teams don't trust their reliability (cited in Composio 2025). If you're building for production, the odds are against you–unless you fix this.
95% of enterprise GenAI pilots never make it to production (MIT GenAI Divide Report / Composio, 2025). Loops aren't the only reason–but they're often the reason you can't recover from failure.
ReAct (Reason + Act) is, by design, a loop with no natural exit.
If you use it, you're building in a structural loop risk from the start.
Defense-in-depth is essential: You need all 5 mechanisms together–each one catches a different type of loop.

Now let's dive deeper: what actually causes these loops, and how do you stop them before they cost you a fortune?

Why Do AI Agents Fall into Infinite Loops?

Imagine this: your agent keeps repeating the same tool call over and over, never realizing it's not making progress. That's an infinite loop–and it happens more often than you'd like.

An infinite loop (or agent loop) happens when your AI agent keeps making the same (or very similar) tool call over and over, with no clear stopping rule. This can be triggered by ambiguous tool results, missing "I'm done" signals, or tools that overlap in what they do. Without a hard stop, the agent will happily keep going–until it hits a token limit, you manually kill it, or your budget evaporates.

Why ReAct Architecture Is a Built-in Loop Risk

You might be using the ReAct pattern (Reason + Act) for your agents. Here's the problem:
ReAct is designed as a cycle. The agent observes, thinks, acts, observes the result... and starts again. There's no structural exit built in. That's not a bug. It's the design.

In traditional software (think REST APIs, queues, cron jobs), you always know when you're done–there are clear termination states. In a ReAct loop, you get none of that out of the box. You have to build those exits yourself.

"Watched another agentic AI project crash last week. The exact same mistake everyone makes. Over 40% of these projects fail not because of the models, but because of poor architecture. Everyone is building demos."
– @rohit4verse on X

That's not just theory–it's how projects fail in the real world.

The Top 3 Loop Triggers in Multi-Tool Agents

Let's make this real. Here's where you're most likely to get bitten:

1. The tool never says if it succeeded.
Your agent calls search_documents, gets back an empty array. Is that an error? No documents found? The agent doesn't know–so it tries again.

2. You have two tools that overlap but aren't identical.
Maybe you have get_customer_info and lookup_account. Both return customer data, but in slightly different formats. The agent keeps bouncing between them, trying to get what it wants, but never quite hitting the target.

3. Context grows, but earlier tool outputs are forgotten.
In long chains of reasoning, new context pushes old tool results out of the effective window. The agent can't "see" what it did earlier–so it repeats work, calling the same tools again.

Each of these can silently kick off a loop that eats your tokens–and your cash.

Mechanism 1: Hard Limits–Your First Line of Defense

Let's cut to the chase:
A hard limit is a fixed ceiling you set for your agent's run. This can be a recursion limit (the max number of iterations or steps the agent is allowed) or a token budget (the max number of input and output tokens). Think of it as a circuit breaker: no matter what the model or tools do, the agent cannot go past this line.

Here's the punchline: Almost all runaway costs happen because you never set this up.
According to AICosts.ai, 87% of cost blowouts come from missing hard limits. The fix? It takes about 30 minutes–if you know the likely tool-call path. That "if" is where most teams trip up on their first try.

This isn't just an edge case.
40% of all agentic AI projects will be abandoned by 2027 (Gartner, see Composio 2025) mainly over reliability fears. Hard limits are the fastest way to address those fears at an architectural level.

Recursion Limit vs. Token Budget–What's the Difference?

These two mechanisms are different, and you need both:

Recursion limit: Stops the agent after N cycles, no matter what. It's a structural safety net.
Token budget: Stops the agent when input + output tokens cross your threshold. It's an economic limit, and you can make it as granular as you like.

Use both. Set a recursion limit as your ceiling, and a token budget as your tripwire for runaway cost.
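To make the "use both" advice concrete, here is a minimal, framework-agnostic sketch of a guard that enforces a step ceiling and a token budget at the same time. The names RunGuard, LimitExceededError, and recordStep are made up for illustration and are not part of any library; treat this as one possible shape, not a reference implementation.

```typescript
// Hypothetical helper: all names and limits here are illustrative.
type StopReason = "step_limit" | "token_budget";

class LimitExceededError extends Error {
  readonly reason: StopReason;

  constructor(reason: StopReason, detail: string) {
    super(`${reason}: ${detail}`);
    this.name = "LimitExceededError";
    this.reason = reason;
  }
}

class RunGuard {
  private steps = 0;
  private tokens = 0;
  private readonly maxSteps: number;
  private readonly maxTokens: number;

  constructor(maxSteps: number, maxTokens: number) {
    this.maxSteps = maxSteps;
    this.maxTokens = maxTokens;
  }

  // Call once per agent iteration, after the model call returns.
  recordStep(tokensUsed: number): void {
    this.steps += 1;
    this.tokens += tokensUsed;
    if (this.steps > this.maxSteps) {
      throw new LimitExceededError("step_limit", `${this.steps} > ${this.maxSteps}`);
    }
    if (this.tokens > this.maxTokens) {
      throw new LimitExceededError("token_budget", `${this.tokens} > ${this.maxTokens}`);
    }
  }

  // Useful for telling the model how much budget is left.
  get remainingTokens(): number {
    return Math.max(0, this.maxTokens - this.tokens);
  }
}
```

The step ceiling catches cheap loops that never get near the budget; the token budget catches expensive loops that would blow through it long before the step ceiling.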

Now, let's see how this looks in a real agent framework.

Setting a Recursion Limit in LangGraph

In LangGraph, you set the recursion limit in the RunnableConfig object you pass to invoke: { recursionLimit: 10 }. By default, it's 25–but for production, 10–15 is more reasonable. This limit counts graph steps (iterations), not tokens.

If the agent hits the limit, LangGraph throws a GraphRecursionError–and you absolutely must catch that with try/catch, or the error propagates unhandled and the run dies without a partial result or an alert.

How to Implement in LangGraph and API Code

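import { GraphRecursionError } from "@langchain/langgraph";
import type { RunnableConfig } from "@langchain/core/runnables";

// `agent` is a compiled LangGraph graph and `input` its initial state,
// both defined elsewhere.

// Set recursion limit in config
const config: RunnableConfig = {
  recursionLimit: 10, // Default is 25; use 10–15 for production
};

// Token budget monitoring via callback
let totalTokensUsed = 0;
const TOKEN_BUDGET = 50_000;

try {
  const result = await agent.invoke(input, {
    ...config,
    callbacks: [{
      // Caveat: depending on handler settings, an error thrown inside a
      // callback may be logged instead of propagated. Verify this for your
      // LangChain version before relying on it as the only budget check.
      handleLLMEnd(output) {
        // tokenUsage is provider-specific; count 0 when it is missing
        const used = output.llmOutput?.tokenUsage?.totalTokens ?? 0;
        totalTokensUsed += used;

        if (totalTokensUsed > TOKEN_BUDGET) {
          throw new Error("TOKEN_BUDGET_EXCEEDED");
        }
      },
    }],
  });
} catch (err) {
  if (err instanceof GraphRecursionError) {
    // Limit reached: return a partial result and trigger an alert here
  }
  throw err;
}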

⚠️ Heads up:
Hard limits without a fallback handler are useless. What happens when the limit triggers? The agent stops–but your system needs to catch that, return a partial result to the user, and trigger an alert. Wrapping agent.invoke in try/catch isn't just nice to have–it's essential.
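Here is one way such a fallback handler could look, as a rough sketch. runWithFallback, getPartial, and notify are hypothetical names, and matching on the error's name is an assumption made to keep the example free of dependencies; in real LangGraph code you would more likely import GraphRecursionError and use instanceof.

```typescript
// Hypothetical wrapper: names and the error-matching rule are illustrative.
interface RunOutcome<T> {
  status: "ok" | "partial" | "error";
  value?: T;
  stoppedBy?: "recursion_limit" | "token_budget" | "unexpected_error";
}

async function runWithFallback<T>(
  run: () => Promise<T>,           // e.g. () => agent.invoke(input, config)
  getPartial: () => T | undefined, // whatever work has been saved so far
  notify: (message: string) => void, // alerting hook (pager, Slack, log)
): Promise<RunOutcome<T>> {
  try {
    return { status: "ok", value: await run() };
  } catch (err) {
    const name = err instanceof Error ? err.name : "UnknownError";
    const message = err instanceof Error ? err.message : String(err);
    notify(`agent stopped: ${name}: ${message}`);

    if (name === "GraphRecursionError") {
      return { status: "partial", value: getPartial(), stoppedBy: "recursion_limit" };
    }
    if (message === "TOKEN_BUDGET_EXCEEDED") {
      return { status: "partial", value: getPartial(), stoppedBy: "token_budget" };
    }
    // Anything else is a genuine failure, not a limit doing its job.
    return { status: "error", stoppedBy: "unexpected_error" };
  }
}
```

The point of the split is that a triggered limit is an expected outcome with a partial result, while any other exception is still reported as a failure.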

One thing most guides skip:
If you set your recursion limit too low, you'll kill legitimate long-running tasks. For example, if your agent needs to process 15 documents sequentially, it might need 20+ cycles. Configuring this isn't trivial–you need a clear idea of your expected call path.
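If you do know the expected call path, a back-of-the-envelope calculation is a reasonable starting point. The helper below is a sketch under assumptions: estimateRecursionLimit is a made-up name, the 1.5x headroom is an arbitrary default, and how many graph steps one tool call consumes depends on how your graph is wired, so measure it rather than trusting the default.

```typescript
// Hypothetical sizing helper: the headroom factor is an assumption, not a rule.
function estimateRecursionLimit(
  expectedToolCalls: number, // e.g. 15 documents, one call each
  stepsPerToolCall: number,  // graph steps consumed per tool call in your graph
  headroom = 1.5,            // slack for legitimate retries
): number {
  if (expectedToolCalls < 0 || stepsPerToolCall <= 0 || headroom < 1) {
    throw new RangeError("invalid sizing inputs");
  }
  // +1 leaves room for the final answer step.
  return Math.ceil(expectedToolCalls * stepsPerToolCall * headroom) + 1;
}

// 15 sequential documents at one step each
const limitForFifteenDocs = estimateRecursionLimit(15, 1);
console.log(limitForFifteenDocs);
```

With these inputs the estimate lands in the "20+ cycles" range mentioned above, which is well past a limit of 10.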

So, what's next after hard limits?

Even with a solid ceiling, you need to tell your agent how to stop–not just when.

Mechanism 2: Structured Output as a Clear Exit Signal

Ever had your agent say, "I'm done," but you couldn't tell if it really was? Or worse, it keeps talking in natural language, and you have to guess if it finished? That's where structured output comes in.

"YOU'RE USING OPENCLAW WRONG (I WAS TOO) [...] most agents waste 2-3x tokens: every request injects bootstrap files into context."
– @polydao on X

Structured output eliminates all that guesswork–and more importantly, it slashes unnecessary token use.

What does this mean for you?
Instead of having your LLM say "I've finished processing all relevant documents and concluded that...," you require it to return a machine-readable JSON schema with a status field (complete | needs_tool | failed). Your orchestrator checks this field–no parsing text, no ambiguity, no accidental loops.

Why "done: true" Beats Natural Language Every Time

A 2026 study comparing LangGraph to CrewAI by markaicode.com found that structured branching in LangGraph saves ~28% tokens per request over plain ReAct loops. CrewAI, without structured output, burns up to 56% more tokens per request.

It's not that the model "thinks" more efficiently–it's that you avoid multiple re-prompts, misinterpretations, and reruns, which add up fast.

Example: Schema Design for Clear Termination States

import { z } from "zod";

const AgentOutputSchema = z.object({
  status: z.enum(["complete", "needs_tool", "failed"]),
  result: z.string().optional(),
  nextTool: z.string().optional(),
  nextToolParams: z.record(z.string(), z.unknown()).optional(),
  reasoning: z.string(),
  retryRecommended: z.boolean().optional(),
});

type AgentOutput = z.infer<typeof AgentOutputSchema>;

// Assumed to exist elsewhere: llm.structuredOutput (a wrapper that returns
// schema-validated output), buildSystemPrompt, callTool, conversationHistory.

// Orchestrator loop
async function orchestrate(maxSteps = 10): Promise<string> {
  let step = 0;

  while (step < maxSteps) {
    const output: AgentOutput = await llm.structuredOutput(
      AgentOutputSchema,
      buildSystemPrompt(),
      conversationHistory
    );

    if (output.status === "complete") return output.result ?? "";
    if (output.status === "failed") throw new Error(`Agent failed: ${output.reasoning}`);

    // needs_tool: deterministic branch
    if (!output.nextTool) throw new Error("needs_tool returned without nextTool");
    const toolResult = await callTool(output.nextTool, output.nextToolParams);
    conversationHistory.push({ role: "tool", content: JSON.stringify(toolResult) });
    step++;
  }

  // Hard stop: the loop can never run past maxSteps
  throw new Error("MAX_STEPS_EXCEEDED");
}

Every tool call should follow the same principle: return success: boolean, retryRecommended: boolean, and a clear data field–not a free-text status. Your orchestrator reads fields, not sentences.

This is one of those tricks you rarely see written down–but once you try it, you'll never go back.
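As a sketch of what that tool contract could look like in TypeScript: the ToolResult type, the errorCode field, and the in-memory searchDocuments stand-in below are all illustrative assumptions. The point is only that "no documents found" and "the search failed" become two distinct, machine-readable outcomes.

```typescript
// Hypothetical result envelope: field names beyond success, retryRecommended
// and data are illustrative.
type ToolResult<T> =
  | { success: true; data: T; retryRecommended: false }
  | { success: false; data: null; retryRecommended: boolean; errorCode: string };

// Stand-in for a real search tool: `null` simulates an unavailable index.
function searchDocuments(index: string[] | null, query: string): ToolResult<string[]> {
  if (index === null) {
    return { success: false, data: null, retryRecommended: true, errorCode: "INDEX_UNAVAILABLE" };
  }
  const needle = query.toLowerCase();
  const hits = index.filter((doc) => doc.toLowerCase().indexOf(needle) !== -1);
  // An empty array is a valid, successful answer: nothing matched.
  return { success: true, data: hits, retryRecommended: false };
}

// The orchestrator branches on fields, never on free text.
function nextAction(result: ToolResult<unknown[]>): "use_result" | "retry" | "give_up" {
  if (result.success) return "use_result";
  return result.retryRecommended ? "retry" : "give_up";
}
```

With this in place, an empty search result stops being a reason to call search_documents again.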

Mechanism 3: State Machines–When to Ditch the ReAct Loop

Let's say you want real reliability–tracing, reproducibility, and no unplanned state changes. That's when a state machine (or graph) beats a free-form ReAct loop.

Instead of letting the LLM decide at every step, a state machine defines allowed state transitions in a graph. The agent cannot jump to an impossible state–because that edge literally doesn't exist. The result? Audit trails, observability, and built-in tracing–not as afterthoughts, but as part of the architecture.

"Shipping agents to production is hard. Traditional software is deterministic–agents rely on non-deterministic models [...] The goal is to take an agent from first run to production-ready system through iterative cycles of improvement. You'll learn how to do this with LangSmith."
– @LangChain on X

And the reliability problem is very real. The LangChain State of AI Agents survey (2025) shows that 73% of enterprise AI agent deployments have reliability failures in their first year. As @hasantoxr puts it:
"Most teams shipping AI agents have zero regression testing."

State machines with audit trails are the antidote: every state change is traceable, testable, and reproducible.

But let's be honest:
The LangChain ecosystem is divided. 45% of developers who try LangChain never deploy it to production. Of those who did, 23% later removed it. Even so, LangGraph (LangChain's state-machine product) is the strongest option for structured agent orchestration right now. Both things can be true at once.

LangGraph State Machine: Minimal Example with Tool Routing

import { Annotation, END, START, StateGraph } from "@langchain/langgraph";

const AgentState = Annotation.Root({
  task: Annotation<string>(),
  toolResults: Annotation<string[]>({
    reducer: (existing, update) => [...existing, ...update],
    default: () => [],
  }),
  // Scalar fields use a last-write-wins reducer
  retryCount: Annotation<number>({
    reducer: (_previous, next) => next,
    default: () => 0,
  }),
  status: Annotation<"planning" | "executing" | "evaluating" | "done" | "failed">({
    reducer: (_previous, next) => next,
    default: () => "planning",
  }),
});

// planNode, executeToolNode, evaluateResultNode and terminateNode are defined
// elsewhere; evaluateResultNode is expected to increment retryCount on failure.
const graph = new StateGraph(AgentState)
  .addNode("plan", planNode)
  .addNode("execute_tool", executeToolNode)
  .addNode("evaluate_result", evaluateResultNode)
  .addNode("terminate", terminateNode)
  .addEdge(START, "plan")
  .addEdge("plan", "execute_tool")
  .addEdge("execute_tool", "evaluate_result")
  .addConditionalEdges("evaluate_result", (state) => {
    if (state.status === "done") return "terminate";
    if (state.retryCount >= 3) return "terminate"; // Hard stop: no endless retries
    return "execute_tool";
  })
  .addEdge("terminate", END);

// The recursion limit is a run-time config option, so pass it with each call:
//   await agent.invoke({ task }, { recursionLimit: 15 });
export const agent = graph.compile();

Notice the edge for retryCount >= 3: From evaluate_result, the agent only goes back to execute_tool if retries remain. Otherwise, it terminates–returning a partial result, not crashing out.

Trade-off: State machines require more up-front design. If you're building research agents where you don't know the call path, a ReAct loop may make sense. But for any production agent using 3+ tools, you need to justify not using a state graph.

When Is a State Machine Better Than a ReAct Loop?

Rule of thumb:
Once you have 3+ tools and you're going to production, use a state machine.
A ReAct loop gives max flexibility–with zero predictability–great for prototypes, not for prod.
A state machine makes allowed transitions explicit. No agent can jump to an invalid state. The payoff? Audit trail, tracing, and regression-testability–none of which are possible in a plain ReAct loop.
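The "no invalid jumps" property does not depend on any framework. The sketch below shows the bare idea with a plain transition table; the state names mirror the ones used in this article, while ALLOWED_TRANSITIONS and transition are hypothetical names chosen for illustration.

```typescript
// Framework-free illustration: names are hypothetical.
type AgentPhase = "planning" | "executing" | "evaluating" | "done" | "failed";

// Every legal move is listed here; anything missing cannot happen.
const ALLOWED_TRANSITIONS: Record<AgentPhase, readonly AgentPhase[]> = {
  planning: ["executing"],
  executing: ["evaluating"],
  evaluating: ["executing", "done", "failed"],
  done: [],
  failed: [],
};

// Validates the move and appends it to an audit trail.
function transition(from: AgentPhase, to: AgentPhase, auditTrail: string[]): AgentPhase {
  if (ALLOWED_TRANSITIONS[from].indexOf(to) === -1) {
    throw new Error(`Illegal transition: ${from} -> ${to}`);
  }
  auditTrail.push(`${from} -> ${to}`);
  return to;
}
```

Because every state change goes through one function, the audit trail comes for free, and a regression test can replay a recorded trail against the table.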

Now that you know how to structure agent control flow, let's see how to make sure it doesn't redo work it's already done.

Mechanism 4: Tool-Call Deduplication and Idempotency

Let's talk about redundancy. Every time your agent calls the same tool with the same parameters, you're not just wasting time–you're seeing a classic loop signal.

"Layered context architecture for agents to avoid redundancy in production."
– @koylanai on X

This isn't just an architecture tip–it's the core problem that deduplication solves at the orchestrator level.

Same Tool Call Twice = Loop Warning

If your agent makes the same tool call twice in a row with identical parameters, that's not always a bug–but it is a warning sign. Without a deduplication layer, every redundant call goes out: network roundtrip, latency, token cost. With 67 tools across 13 MCP servers, average tool-selection latency is already 385ms per call (see @liquidai)–and that's before you account for memory wrappers, which can add over a second per API call in LangChain.
Add deduplication, and at least the unnecessary calls disappear.

How to Do Hash-Based Deduplication in Your Orchestrator

import { createHash } from "node:crypto";

class ToolCallDeduplicator {
  // Maps a hash of (tool name + params) to the result of the first call
  private cache = new Map<string, unknown>();

  // Note: JSON.stringify is key-order sensitive, so { a: 1, b: 2 } and
  // { b: 2, a: 1 } produce different hashes.
  private hash(toolName: string, params: unknown): string {
    return createHash("sha256")
      .update(JSON.stringify({ toolName, params }))
      .digest("hex");
  }

  async call<T>(
    toolName: string,
    params: unknown,
    executor: () => Promise<T>
  ): Promise<T> {
    const key = this.hash(toolName, params);

    if (this.cache.has(key)) {
      // Don't abort–return cached result
      console.warn(`[Loop-Detection] Duplicate call detected: ${toolName}`);
      return this.cache.get(key) as T;
    }

    // First time we see this call: run it and remember the result
    const result = await executor();
    this.cache.set(key, result);
    return result;
  }
}

The trick? Don't throw an error–just return the cached result. The agent gets the same answer, and can decide if it's finished. If it tries a third time, your hard limit should catch it.
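One practical wrinkle with hashing serialized parameters: JSON.stringify preserves property order, so two calls with the same parameters in a different order would not count as duplicates. A possible workaround, sketched here with hypothetical helper names (canonicalize, toolCallKey), is to sort object keys before building the cache key.

```typescript
// Hypothetical helpers: produce the same key regardless of property order.
function canonicalize(value: unknown): string {
  if (value === null || typeof value !== "object") {
    // JSON.stringify returns undefined for undefined and functions
    return JSON.stringify(value) ?? "undefined";
  }
  if (Array.isArray(value)) {
    // Array order is meaningful, so it is preserved
    return `[${value.map(canonicalize).join(",")}]`;
  }
  const obj = value as Record<string, unknown>;
  const entries = Object.keys(obj)
    .sort()
    .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
  return `{${entries.join(",")}}`;
}

function toolCallKey(toolName: string, params: unknown): string {
  return `${toolName}:${canonicalize(params)}`;
}
```

The resulting string can be fed into whatever hash function your deduplication layer already uses.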

For side-effect tools (think: send email, write to DB, trigger webhook), you need a different pattern: idempotency keys in the request header–not deduplication. Two identical email requests should not send two emails–but you also don't want to return a cached status in this case.
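A minimal sketch of the idempotency-key idea, under stated assumptions: the key format, the Idempotency-Key header name (a common convention, not something this article's stack mandates), and the in-memory FakeMailService standing in for the receiving service are all illustrative. The key identifies the logical operation, so a retry reuses it while a genuinely new email gets a new one.

```typescript
// Hypothetical key scheme: one key per logical operation within a run.
function idempotencyKey(runId: string, toolName: string, operationId: string): string {
  return `${runId}:${toolName}:${operationId}`;
}

// Stand-in for the receiving service; a real one would persist seen keys.
class FakeMailService {
  private readonly processedKeys = new Set<string>();
  readonly outbox: string[] = [];

  send(headers: Record<string, string>, recipient: string): "sent" | "duplicate_ignored" {
    const key = headers["Idempotency-Key"];
    if (key === undefined) throw new Error("missing Idempotency-Key header");
    if (this.processedKeys.has(key)) return "duplicate_ignored";
    this.processedKeys.add(key);
    this.outbox.push(recipient); // the side effect happens exactly once per key
    return "sent";
  }
}

const mailService = new FakeMailService();
const requestHeaders = { "Idempotency-Key": idempotencyKey("run-42", "send_email", "welcome-mail") };
const firstAttempt = mailService.send(requestHeaders, "user@example.com");
const retriedAttempt = mailService.send(requestHeaders, "user@example.com");
console.log(firstAttempt, retriedAttempt, mailService.outbox.length);
```

Unlike the cache in the deduplicator, the agent still gets a fresh, truthful response on the retry; the service simply refuses to repeat the side effect.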

Mechanism 5: Cost-Aware Termination–Teaching Your Agent About Budget

Ever wonder how much an uncontrolled agent loop costs in production?
Short answer: it gets ugly, fast.

The cost curve isn't linear–as your loop deepens, costs explode. For example, with a reasoning loop where context keeps growing, 10 iterations might cost you $1.20, but 50? That's already ~$25.
At 100 iterations, you"re looking at ~$90 or more. And for cross-tool loops with no hard limit?
The sky really is the limit–like that infamous $47,000 bill in 11 days.
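Why does the curve bend upward? Each iteration resends the whole conversation so far, so input tokens per call grow with every step and the total grows roughly with the square of the iteration count. The toy model below illustrates that effect only; the token counts and the $3 per million input tokens are assumptions picked for the example, output tokens are ignored, and the results are not meant to reproduce the figures above.

```typescript
// Illustrative model only: token counts and price are assumptions.
function loopInputCostUsd(
  iterations: number,
  baseContextTokens: number,
  tokensAddedPerIteration: number,
  usdPerMillionInputTokens: number,
): number {
  let totalInputTokens = 0;
  for (let i = 0; i < iterations; i++) {
    // Iteration i resends the base context plus everything appended so far.
    totalInputTokens += baseContextTokens + i * tokensAddedPerIteration;
  }
  return (totalInputTokens * usdPerMillionInputTokens) / 1_000_000;
}

const costAt10 = loopInputCostUsd(10, 2_000, 1_500, 3);
const costAt50 = loopInputCostUsd(50, 2_000, 1_500, 3);
console.log(costAt10.toFixed(2), costAt50.toFixed(2), (costAt50 / costAt10).toFixed(1));
```

Under these assumptions, five times as many iterations cost roughly 22 times as much, which is the shape of curve that turns a small bug into a large bill.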

Token Costs by Loop Type

Not all loops drain your wallet equally. Here's how the most common loop types stack up:

Loop Type	Description	Cost at 10 Iterations	Cost at 50 Iterations	Cost at 100 Iterations
Tool Error Loop	Agent repeats same failed API call	~$0.04–$0.12	~$0.20–$0.60	~$0.40–$1.20
Reasoning Loop	Growing context, no decision made	~$0.80–$2.00	~$15–$40	~$60–$120
Cross-Tool Loop	Agent toggles between 2+ tools, context grows fast	~$1.50–$4.00	~$40–$100	unlimited*

*Estimates based on public price lists, March 2026: Claude 3.5 Sonnet ($3/1M input + $15/1M output), GPT-4o ($2.50/$10), Gemini 1.5 Pro ($1.25/$5). These are rough estimates, not production data.

Cost-Aware Prompting: Tell Your Agent Its Budget

function buildSystemPrompt(remainingTokenBudget: number): string {
  const isLow = remainingTokenBudget < 10_000;

  return `
You are a document-processing agent.

Token budget status: ${isLow ? "CRITICAL" : "normal"}
Remaining token budget: ${remainingTokenBudget.toLocaleString()} tokens

${isLow ? `
IMPORTANT: Your token budget is nearly exhausted.
Finish the current task or return a partial result.
DO NOT start new tool calls that cost more than 2,000 tokens.
` : ""}

Available tools: [...]
`.trim();
}

It sounds simple–but it works. LLMs really do optimize differently when they know what's left in the tank–just like you'd prioritize differently if you had only two hours left in your sprint.

"I just processed 140,400,000 tokens in 48 hours. Raw API bill: $1,677.82. My actual cost: $50.00."
– @ziwenxu_ on X

That's what happens when you combine prompt caching and architectural cost control–cost-aware termination makes it possible. Anthropic's prompt caching, for instance, can cut input costs by up to 90% if your system prompt stays stable.

For comparison, Jason Calacanis reports $300 per agent per day at only 10–20% utilization–projected at ~$100,000 per year per agent. The difference isn't the model. It's the architecture.

Decision Tree: Which Mechanism for Which Use Case?

Here's where the rubber meets the road.
You need to match your loop-prevention mechanism to your use case.
The table below gives you the tradeoffs for each approach–across loop types, implementation effort, token overhead, and when you should use it.

Mechanism	Loop Types Stopped	Implementation Effort	Token Overhead	Recommended For
Hard Limits (Recursion + Token Budget)	All loop types (as safety net)	~30 min (if call path is known)	Minimal (<0.1%)	Every agent
Structured Output Exit Signal	Tool Error, Reasoning Loops	1–3 hrs (schema + orchestrator)	~2–5% (JSON in prompt)	2+ tool types
State Machine (LangGraph)	Cross-tool, Reasoning Loops	4–8 hrs (graph, edge logic)	~5–10% (state)	3+ tools, production
Tool-Call Deduplication	Tool Error, Cross-tool Loops	2–4 hrs (cache + hash layer)	Negative (saves calls)	High-volume, MCP
Cost-Aware Termination	All, especially Reasoning Loops	1–2 hrs (budget + prompt)	Minimal (prompt tweak)	Production must-have

Token Cost Table by Loop Type

Each cell lists the estimated cost for Claude 3.5 Sonnet / GPT-4o / Gemini 1.5 Pro, in that order.

Loop Type	10 Iterations	50 Iterations	100 Iterations
Tool Error Loop	~$0.08 / ~$0.06 / ~$0.03	~$0.40 / ~$0.30 / ~$0.15	~$0.80 / ~$0.60 / ~$0.30
Reasoning Loop	~$1.20 / ~$0.90 / ~$0.45	~$25 / ~$18 / ~$9	~$90 / ~$65 / ~$30
Cross-Tool Loop	~$2.50 / ~$2.00 / ~$1.00	~$60 / ~$45 / ~$22	unlimited*

*Estimates as of March 2026, public price lists. No hard limit = theoretically unlimited.

Recommendations by Use Case:

Single tool, simple logic: Hard limit is enough (recursion + token budget)
Multi-tool, clear order: State machine with LangGraph
Multi-tool, dynamic planning: Structured output exit signal + deduplication
High-volume production: Cost-aware termination is mandatory; layer in the others below it
Every production agent: Use all 5 mechanisms (defense-in-depth)

Why all 5?
In a 4-stage multi-agent system with 95% accuracy per stage, total system reliability drops to just 81%. Each mechanism stops a different kind of loop. They're not interchangeable.
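The 81% figure is just compounding: if every stage has to succeed and stages fail independently, the per-stage success rates multiply. A quick sketch (chainReliability is a hypothetical helper name, and independence between stages is an assumption):

```typescript
// Success probability of a chain where every stage must succeed,
// assuming stages fail independently of each other.
function chainReliability(perStageSuccess: number, stages: number): number {
  return Math.pow(perStageSuccess, stages);
}

const fourStages = chainReliability(0.95, 4);
console.log(`${(fourStages * 100).toFixed(1)}%`);
```

0.95 multiplied by itself four times is about 0.8145, and every additional stage pushes the number lower.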

And here's an uncomfortable truth from multi-agent research:

"Researchers planted a single bad actor inside a group of LLM agents. The whole network failed to reach consensus. This is the Byzantine Generals Problem [...] the practical implication is uncomfortable for anyone building multi-agent systems."
– @rryssf_ on X

Loop prevention isn't just about saving money. It's about protecting the integrity of your whole system.

Checklist: Is Your Agent Ready for Production?

Hard limits set (recursion & token budget)
Structured output for all agent responses
State machine or explicit control flow for 3+ tools
Deduplication layer for tool calls
Cost-aware prompting and real-time budget tracking
Fallback handlers for all hard stops
Regression tests for agent logic
Audit trail for all state transitions

If you can't tick every box, you're not ready for production–and you're risking much more than your next cloud bill.

The Bottom Line: Build for Loops, Not Against Them

Ready to build AI agents that are robust and cost-effective? SwiftRun.ai offers built-in loop prevention and cost monitoring so you can deploy with confidence. Start your free trial today – no credit card required.

Further reading: [Debugging AI Agents in Production](https://the platform/blog/ai-agent-debugging-production) – for when your loop prevention kicks in, but you have no idea why. Or: [When a Pipeline Beats an Agent](https://the platform/blog/ai-pipeline-vs-agent-unterschied) – because sometimes, the right fix isn't a better agent, but no agent at all.

Also worth a look:

[What Does "Human-in-the-Loop" Actually Mean?](https://the platform/blog/human-in-the-loop-ai-agenten)
How to Debug an AI Agent Returning Wrong Results
AI Pipeline vs. AI Agent: What's the Difference?

Author: Georg Singer

You're not building against loops. You're building for resilience.
Make your next AI agent bulletproof–before you find out what infinite really costs.
