Build AI Agents That Call Tools Autonomously
Most runaway AI agent costs aren't about bad prompts–they happen because you forgot to set hard limits. Here are 5 architecture moves that end infinite loops before they burn your budget (or your reputation).

"It works on my laptop."
Friday night: you deploy an AI agent for document processing. Eleven days later, you're staring at an OpenAI bill for $47,000 (~€43,000). What happened? Your agent got stuck in a loop and kept retrying the same failed API call, because no code told it when to stop. No alert. No hard limit. No budget cap.
If you think this is just a rare fluke, think again. According to the AICosts.ai Budget Disaster Prevention Report (2025, based on 200+ real-world incidents), a whopping 87% of agent cost overruns happen because there's no hard limit set. Not because of a weak model, not because of a bad prompt.
And here's the kicker: 73% of teams don't track agent costs in real time. The budget isn't just slipping away; most teams don't even realize it until it's far too late. On average, these overruns are 340% above what was originally estimated.
And get this: the loop usually doesn't start where you think it does.
Quick Summary: Why AI Agents Get Stuck (and What Actually Costs You)
Let's break down the most important numbers and what they mean for you:
- 87% of AI agent cost blowouts come from missing hard limits, not from bad models or prompts (AICosts.ai Budget Disaster Prevention Report). That means almost all runaway bills are an architecture problem, not an AI problem.
- 73% of teams have no real-time cost tracking for their AI agents. The average overrun? 340%. So when your bill explodes, you're not alone, and you're not even the exception.
- Gartner predicts that 40% of agentic AI projects will be abandoned by 2027, mostly because teams don't trust their reliability (see Composio 2025). If you're building for production, the odds are against you unless you fix this.
- 95% of enterprise GenAI pilots never make it to production (MIT GenAI Divide Report / Composio, 2025). Loops aren't the only reason, but they're often the reason you can't recover from failure.
- ReAct (Reason + Act) is, by design, a loop with no natural exit. If you use it, you're building in a structural loop risk from the start.
- Defense-in-depth is essential: you need all 5 mechanisms together, because each one catches a different type of loop.
Now let's dive deeper: what actually causes these loops, and how do you stop them before they cost you a fortune?
Why Do AI Agents Fall into Infinite Loops?
Imagine this: your agent keeps repeating the same (or nearly the same) tool call over and over, never realizing it's not making progress. That's an infinite loop, and it happens more often than you'd like.
The usual triggers are ambiguous tool results, missing "I'm done" signals, or tools that overlap in what they do. With no clear stopping rule and no hard stop, the agent will happily keep going until it hits a token limit, you manually kill it, or your budget evaporates.
Why ReAct Architecture Is a Built-in Loop Risk
You might be using the ReAct pattern (Reason + Act) for your agents. Here's the problem:
ReAct is designed as a cycle. The agent observes, thinks, acts, observes the result... and starts again. There's no structural exit built in. That's not a bug. It's the design.
In traditional software (think REST APIs, queues, cron jobs), you always know when you're done: there are clear termination states. In a ReAct loop, you get none of that out of the box. You have to build those exits yourself.
"Watched another agentic AI project crash last week. The exact same mistake everyone makes. Over 40% of these projects fail not because of the models, but because of poor architecture. Everyone is building demos."
– @rohit4verse on X
That's not just theory. It's how projects fail in the real world.
The Top 3 Loop Triggers in Multi-Tool Agents
Let's make this real. Here's where you're most likely to get bitten:
1. The tool never says if it succeeded.
Your agent calls search_documents, gets back an empty array. Is that an error? No documents found? The agent doesn't know, so it tries again.
2. You have two tools that overlap but aren't identical.
Maybe you have get_customer_info and lookup_account. Both return customer data, but in slightly different formats. The agent keeps bouncing between them, trying to get what it wants, but never quite hitting the target.
3. Context grows, but earlier tool outputs are forgotten.
In long chains of reasoning, new context pushes old tool results out of the effective window. The agent can't "see" what it did earlier, so it repeats work, calling the same tools again.
Each of these can silently kick off a loop that eats your tokens and your cash.
Mechanism 1: Hard Limits–Your First Line of Defense
Let"s cut to the chase:
A hard limit is a fixed ceiling you set for your agent's run. This can be a recursion limit (the max number of iterations or steps the agent is allowed) or a token budget (the max number of input and output tokens). Think of it as a circuit breaker: no matter what the model or tools do, the agent cannot go past this line.
Here's the punchline: Almost all runaway costs happen because you never set this up.
According to AICosts.ai, 87% of cost blowouts come from missing hard limits. The fix? It takes about 30 minutes–if you know the likely tool-call path. That "if" is where most teams trip up on their first try.
This isn't just an edge case.
40% of all agentic AI projects will be abandoned by 2027 (Gartner, see Composio 2025), mainly over reliability fears. Hard limits are the fastest way to address those fears at an architectural level.
Recursion Limit vs. Token Budget: What's the Difference?
These two mechanisms are different, and you need both:
- Recursion limit: Stops the agent after N cycles, no matter what. It's a structural safety net.
- Token budget: Stops the agent when input + output tokens cross your threshold. It's an economic limit, and you can make it as granular as you like.
Use both. Set a recursion limit as your ceiling, and a token budget as your tripwire for runaway cost.
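To make the difference concrete, here is a minimal, framework-free sketch of a guard that enforces both ceilings at once. The `RunGuard` class, its method names, and the thresholds are hypothetical and only illustrate the idea; they are not part of LangGraph or any other library.

```typescript
// Hypothetical helper: names and thresholds are illustrative, not a library API.
type StopReason = "STEP_LIMIT" | "TOKEN_BUDGET";

class RunGuard {
  private steps = 0;
  private tokens = 0;
  private readonly maxSteps: number;
  private readonly tokenBudget: number;

  constructor(maxSteps: number, tokenBudget: number) {
    this.maxSteps = maxSteps;
    this.tokenBudget = tokenBudget;
  }

  // Call once per agent iteration with the tokens that iteration consumed.
  // Returns a stop reason as soon as either ceiling is reached, else null.
  record(tokensUsed: number): StopReason | null {
    this.steps += 1;
    this.tokens += tokensUsed;
    if (this.tokens > this.tokenBudget) return "TOKEN_BUDGET";
    if (this.steps >= this.maxSteps) return "STEP_LIMIT";
    return null;
  }

  get remainingTokens(): number {
    return Math.max(0, this.tokenBudget - this.tokens);
  }
}

// A cheap loop trips the step limit; an expensive one trips the budget first.
const cheap = new RunGuard(3, 50_000);
const cheapStops = [cheap.record(100), cheap.record(100), cheap.record(100)];

const expensive = new RunGuard(10, 50_000);
const expensiveStops = [expensive.record(30_000), expensive.record(30_000)];
```

The two checks are deliberately independent: a loop of tiny calls never gets near the token budget, and a loop of huge calls can blow the budget long before the step limit.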
Now, let's see how this looks in a real agent framework.
Setting a Recursion Limit in LangGraph
In LangGraph, you set the recursion limit in the RunnableConfig object: { recursionLimit: 10 }. By default, it's 25–but for production, 10–15 is more reasonable. This limit counts iterations, not tokens.
If the agent hits the limit, LangGraph throws a GraphRecursionError–and you absolutely must catch that with try/catch, or your whole agent will crash uncontrolled.
How to Implement in LangGraph and API Code
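import type { RunnableConfig } from "@langchain/core/runnables";

// `agent` is your compiled LangGraph graph, `input` its initial state.

// Set recursion limit in config
const config: RunnableConfig = {
  recursionLimit: 10, // Default is 25; use 10–15 for production
};

// Token budget monitoring via callback
let totalTokensUsed = 0;
const TOKEN_BUDGET = 50_000;

const result = await agent.invoke(input, {
  ...config,
  callbacks: [{
    // Caveat: depending on your LangChain version and handler settings, errors
    // thrown inside a callback may be logged instead of propagated. Verify this,
    // and also check totalTokensUsed between steps if in doubt.
    handleLLMEnd(output) {
      // The tokenUsage shape is provider-specific; this matches OpenAI-style output
      const used = output.llmOutput?.tokenUsage?.totalTokens ?? 0;
      totalTokensUsed += used;
      if (totalTokensUsed > TOKEN_BUDGET) {
        throw new Error("TOKEN_BUDGET_EXCEEDED");
      }
    }
  }]
});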
⚠️ Heads up:
Hard limits without a fallback handler are useless. What happens when the limit triggers? The agent stops, but your system needs to catch that, return a partial result to the user, and trigger an alert. Wrapping agent.invoke in try/catch isn't just nice to have; it's essential.
One thing most guides skip:
If you set your recursion limit too low, you'll kill legitimate long-running tasks. For example, if your agent needs to process 15 documents sequentially, it might need 20+ cycles. Configuring this isn't trivial: you need a clear idea of your expected call path.
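The fallback handler described above could look roughly like this. It is a sketch under assumptions: `invokeWithFallback`, `RunOutcome`, `getPartial`, and `alert` are hypothetical names, and limit errors are recognized by `name`/`message` so the example stays self-contained. With the real library you would more likely import `GraphRecursionError` and use `instanceof`.

```typescript
// Hypothetical outcome type; the field names are illustrative.
type RunOutcome<T> =
  | { kind: "ok"; result: T }
  | { kind: "stopped"; reason: string; partial: T | null };

// Limit errors we expect and want to degrade gracefully on. Matching by name
// and message is an assumption made to keep this sketch free of imports.
function isLimitError(err: unknown): err is Error {
  return (
    err instanceof Error &&
    (err.name === "GraphRecursionError" || err.message === "TOKEN_BUDGET_EXCEEDED")
  );
}

async function invokeWithFallback<T>(
  invoke: () => Promise<T>,        // e.g. () => agent.invoke(input, config)
  getPartial: () => T | null,      // whatever the run has produced so far
  alert: (reason: string) => void, // pager, Slack, log line, ...
): Promise<RunOutcome<T>> {
  try {
    return { kind: "ok", result: await invoke() };
  } catch (err) {
    if (!isLimitError(err)) throw err; // unknown failures still surface
    const reason = err.name === "Error" ? err.message : err.name;
    alert(reason);
    return { kind: "stopped", reason, partial: getPartial() };
  }
}
```

Returning a tagged outcome instead of rethrowing lets the caller show the partial result to the user while the alert goes to whoever owns the agent.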
So, what's next after hard limits?
Even with a solid ceiling, you need to tell your agent how to stop, not just when.
Mechanism 2: Structured Output as a Clear Exit Signal
Ever had your agent say, "I'm done," but you couldn't tell if it really was? Or worse, it keeps talking in natural language, and you have to guess if it finished? That's where structured output comes in.
"YOU'RE USING OPENCLAW WRONG (I WAS TOO) [...] most agents waste 2-3x tokens: every request injects bootstrap files into context."
– @polydao on X
Structured output eliminates all that guesswork–and more importantly, it slashes unnecessary token use.
What does this mean for you?
Instead of having your LLM say "I've finished processing all relevant documents and concluded that...," you require it to return machine-readable JSON that matches a schema with a status field (complete | needs_tool | failed). Your orchestrator checks this field: no parsing text, no ambiguity, no accidental loops.
Why "done: true" Beats Natural Language Every Time
A 2026 study comparing LangGraph to CrewAI by markaicode.com found that structured branching in LangGraph saves ~28% tokens per request over plain ReAct loops. CrewAI, without structured output, burns up to 56% more tokens per request.
It's not that the model "thinks" more efficiently. It's that you avoid multiple re-prompts, misinterpretations, and reruns, which add up fast.
Example: Schema Design for Clear Termination States
import { z } from "zod";

const AgentOutputSchema = z.object({
  status: z.enum(["complete", "needs_tool", "failed"]),
  result: z.string().optional(),
  nextTool: z.string().optional(),
  nextToolParams: z.record(z.string(), z.unknown()).optional(),
  reasoning: z.string(),
  retryRecommended: z.boolean().optional(),
});
type AgentOutput = z.infer<typeof AgentOutputSchema>;

// llm.structuredOutput, buildSystemPrompt, callTool and conversationHistory are
// placeholders for your own LLM client, prompt builder, tool dispatcher and message log.

// Orchestrator loop
async function orchestrate(maxSteps = 10): Promise<string> {
  let step = 0;
  while (step < maxSteps) {
    const output: AgentOutput = await llm.structuredOutput(
      AgentOutputSchema,
      buildSystemPrompt(),
      conversationHistory
    );
    if (output.status === "complete") return output.result ?? "";
    if (output.status === "failed") throw new Error(`Agent failed: ${output.reasoning}`);
    // needs_tool: deterministic branch; a missing tool name is a hard failure, not a retry
    if (!output.nextTool) throw new Error("needs_tool returned without nextTool");
    const toolResult = await callTool(output.nextTool, output.nextToolParams ?? {});
    conversationHistory.push({ role: "tool", content: JSON.stringify(toolResult) });
    step++;
  }
  throw new Error("MAX_STEPS_EXCEEDED");
}
Every tool call should follow the same principle: return success: boolean, retryRecommended: boolean, and a clear data field–not a free-text status. Your orchestrator reads fields, not sentences.
This is one of those tricks you rarely see written down, but once you try it, you'll never go back.
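As a rough illustration of that tool-result principle, the sketch below shows an envelope type and an orchestrator decision that reads only fields. Only `success`, `retryRecommended`, and `data` come from the text above; the `error` field, `NextAction`, and `decide` are hypothetical names added for the example.

```typescript
// Hypothetical envelope: every tool returns this shape instead of free text.
type ToolResult<T> =
  | { success: true; retryRecommended: false; data: T }
  | { success: false; retryRecommended: boolean; data: null; error: string };

type NextAction = "use_result" | "retry" | "give_up";

// The orchestrator branches on fields only, never on sentences.
function decide<T>(result: ToolResult<T>, retriesLeft: number): NextAction {
  if (result.success) return "use_result"; // includes "found nothing"
  if (result.retryRecommended && retriesLeft > 0) return "retry";
  return "give_up";
}

// An empty search is a successful call with zero hits, not an error.
const emptySearch: ToolResult<string[]> = {
  success: true, retryRecommended: false, data: [],
};
// A transient failure that is worth retrying.
const rateLimited: ToolResult<string[]> = {
  success: false, retryRecommended: true, data: null, error: "rate_limited",
};
// A permanent failure: retrying the same call cannot help.
const badQuery: ToolResult<string[]> = {
  success: false, retryRecommended: false, data: null, error: "invalid_query",
};
```

With this shape, the empty array from search_documents stops being ambiguous: the agent sees a successful call with no hits and has no reason to try again.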
Mechanism 3: State Machines–When to Ditch the ReAct Loop
Let's say you want real reliability: tracing, reproducibility, and no unplanned state changes. That's when a state machine (or graph) beats a free-form ReAct loop.
Instead of letting the LLM decide at every step, a state machine defines allowed state transitions in a graph. The agent cannot jump to an impossible state, because that edge literally doesn't exist. The result? Audit trails, observability, and built-in tracing, not as afterthoughts but as part of the architecture.
"Shipping agents to production is hard. Traditional software is deterministic–agents rely on non-deterministic models [...] The goal is to take an agent from first run to production-ready system through iterative cycles of improvement. You'll learn how to do this with LangSmith."
– @LangChain on X
And the reliability problem is very real. The LangChain State of AI Agents survey (2025) shows that 73% of enterprise AI agent deployments have reliability failures in their first year. As @hasantoxr puts it:
"Most teams shipping AI agents have zero regression testing."
State machines with audit trails are the antidote: every state change is traceable, testable, and reproducible.
But let's be honest:
The LangChain ecosystem is divided. 45% of developers who try LangChain never deploy it to production, and 23% of those who did later removed it. Even so, LangGraph (LangChain's state-machine product) is the strongest option for structured agent orchestration right now. Both things can be true at once.
LangGraph State Machine: Minimal Example with Tool Routing
import { Annotation, END, START, StateGraph } from "@langchain/langgraph";

const AgentState = Annotation.Root({
  task: Annotation<string>(),
  toolResults: Annotation<string[]>({
    reducer: (existing, update) => [...existing, ...update],
    default: () => [],
  }),
  // Last-write-wins reducers: a channel with a default also needs a reducer
  retryCount: Annotation<number>({
    reducer: (_prev, next) => next,
    default: () => 0,
  }),
  status: Annotation<"planning" | "executing" | "evaluating" | "done" | "failed">({
    reducer: (_prev, next) => next,
    default: () => "planning",
  }),
});

// planNode, executeToolNode, evaluateResultNode and terminateNode are your own node functions
const graph = new StateGraph(AgentState)
  .addNode("plan", planNode)
  .addNode("execute_tool", executeToolNode)
  .addNode("evaluate_result", evaluateResultNode)
  .addNode("terminate", terminateNode)
  .addEdge(START, "plan")
  .addEdge("plan", "execute_tool")
  .addEdge("execute_tool", "evaluate_result")
  .addConditionalEdges("evaluate_result", (state) => {
    if (state.status === "done") return "terminate";
    if (state.retryCount >= 3) return "terminate"; // Hard stop: no endless retries
    return "execute_tool";
  })
  .addEdge("terminate", END);

// The recursion limit is a run-time setting: pass { recursionLimit: 15 }
// in the config of agent.invoke(...), as shown in Mechanism 1.
export const agent = graph.compile();
Notice the edge for retryCount >= 3: From evaluate_result, the agent only goes back to execute_tool if retries remain. Otherwise, it terminates–returning a partial result, not crashing out.
Trade-off: State machines require more up-front design. If you're building research agents where you don't know the call path, a ReAct loop may make sense. But for any production agent using 3+ tools, you need to justify not using a state graph.
When Is a State Machine Better Than a ReAct Loop?
Rule of thumb:
Once you have 3+ tools and you're going to production, use a state machine.
A ReAct loop gives max flexibility–with zero predictability–great for prototypes, not for prod.
A state machine makes allowed transitions explicit. No agent can jump to an invalid state. The payoff? Audit trail, tracing, and regression-testability–none of which are possible in a plain ReAct loop.
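Stripped of any framework, "explicit allowed transitions" boils down to a lookup table. The sketch below reuses the status values from the LangGraph example; the `TRANSITIONS` table, `transition` function, and `auditTrail` array are hypothetical names, and the edge set is an assumption chosen for illustration.

```typescript
type State = "planning" | "executing" | "evaluating" | "done" | "failed";

// Allowed edges. Anything not listed here simply cannot happen.
const TRANSITIONS: Record<State, readonly State[]> = {
  planning: ["executing"],
  executing: ["evaluating"],
  evaluating: ["executing", "done", "failed"],
  done: [],
  failed: [],
};

// Every accepted transition is appended here, which doubles as an audit trail.
const auditTrail: Array<{ from: State; to: State }> = [];

function transition(from: State, to: State): State {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`Illegal transition: ${from} -> ${to}`);
  }
  auditTrail.push({ from, to });
  return to;
}
```

Because the table is plain data, a regression test can assert on it directly, for example that nothing ever leaves `done` or `failed`.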
Now that you know how to structure agent control flow, let's see how to make sure it doesn't redo work it's already done.
Mechanism 4: Tool-Call Deduplication and Idempotency
Let's talk about redundancy. Every time your agent calls the same tool with the same parameters, you're not just wasting time; you're seeing a classic loop signal.
"Layered context architecture for agents to avoid redundancy in production."
– @koylanai on X
This isn't just an architecture tip. It's the core problem that deduplication solves at the orchestrator level.
Same Tool Call Twice = Loop Warning
If your agent makes the same tool call twice in a row with identical parameters, that's not always a bug, but it is a warning sign. Without a deduplication layer, every redundant call goes out: network roundtrip, latency, token cost. With 67 tools across 13 MCP servers, average tool-selection latency is already 385ms per call (see @liquidai), and that's before you account for memory wrappers, which can add over a second per API call in LangChain.
Add deduplication, and at least the unnecessary calls disappear.
How to Do Hash-Based Deduplication in Your Orchestrator
import { createHash } from "node:crypto";

class ToolCallDeduplicator {
  private cache = new Map<string, unknown>();

  // Note: JSON.stringify is sensitive to key order, so { a, b } and { b, a }
  // produce different hashes. Normalize params first if that matters to you.
  private hash(toolName: string, params: unknown): string {
    return createHash("sha256")
      .update(JSON.stringify({ toolName, params }))
      .digest("hex");
  }

  async call<T>(
    toolName: string,
    params: unknown,
    executor: () => Promise<T>
  ): Promise<T> {
    const key = this.hash(toolName, params);
    if (this.cache.has(key)) {
      // Don't abort: return the cached result
      console.warn(`[Loop-Detection] Duplicate call detected: ${toolName}`);
      return this.cache.get(key) as T;
    }
    const result = await executor();
    this.cache.set(key, result);
    return result;
  }
}
The trick? Don't throw an error; just return the cached result. The agent gets the same answer, and can decide if it's finished. If it tries a third time, your hard limit should catch it.
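If you would rather not wait for the global hard limit, a small counter can flag the third identical call directly. This is a sketch, not part of the deduplicator above: `RepeatCallDetector`, `stableStringify`, and the threshold of 3 are hypothetical, and the counter tracks identical calls per run rather than strictly consecutive ones.

```typescript
// Serialize with sorted object keys so { a: 1, b: 2 } and { b: 2, a: 1 } match.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((v) => stableStringify(v)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value) ?? "null"; // undefined and functions become "null"
}

class RepeatCallDetector {
  private counts = new Map<string, number>();
  private readonly maxIdentical: number;

  constructor(maxIdentical = 3) {
    this.maxIdentical = maxIdentical;
  }

  // Returns true once the same tool + params pair has been seen maxIdentical times.
  shouldAbort(toolName: string, params: unknown): boolean {
    const key = `${toolName}:${stableStringify(params)}`;
    const seen = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, seen);
    return seen >= this.maxIdentical;
  }
}
```

Sorting keys before building the cache key avoids a quiet failure mode where the model emits the same parameters in a different order and the duplicate goes unnoticed.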
For side-effect tools (think: send email, write to DB, trigger webhook), you need a different pattern: idempotency keys in the request header, not deduplication. Two identical email requests should not send two emails, but you also don't want to return a cached status in this case.
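A minimal sketch of the idempotency-key pattern follows. Everything in it is an assumption made for illustration: the key scheme (run, step, tool), the `Idempotency-Key` header name, and the in-memory `EmailService`, which stands in for the receiving service where the real check has to live.

```typescript
// Hypothetical key scheme: one key per (run, step, tool). A retry of the same
// step reuses the key; a later, intentional call gets a new one.
function idempotencyKey(runId: string, stepIndex: number, toolName: string): string {
  return `${runId}:${stepIndex}:${toolName}`;
}

// In-memory stand-in for the receiving service. It performs the side effect
// at most once per key and reports duplicates instead of replaying a result.
class EmailService {
  private handled = new Set<string>();
  readonly outbox: string[] = [];

  send(
    to: string,
    headers: { "Idempotency-Key": string },
  ): "sent" | "duplicate_ignored" {
    const key = headers["Idempotency-Key"];
    if (this.handled.has(key)) return "duplicate_ignored";
    this.handled.add(key);
    this.outbox.push(to); // the real side effect would happen here
    return "sent";
  }
}
```

The difference from the deduplicator is where the decision lives: the cache sits in your orchestrator, while the idempotency check has to be enforced by whoever actually performs the side effect.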
Mechanism 5: Cost-Aware Termination–Teaching Your Agent About Budget
Ever wonder how much an uncontrolled agent loop costs in production?
Short answer: it gets ugly, fast.
The cost curve isn't linear: as your loop deepens, costs explode. For example, with a reasoning loop where context keeps growing, 10 iterations might cost you $1.20, but 50? That's already ~$25.
At 100 iterations, you're looking at ~$90 or more. And for cross-tool loops with no hard limit?
The sky really is the limit, like that infamous $47,000 bill in 11 days.
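Why the curve bends upward can be shown with a simplified model, assuming every iteration re-sends the whole context and the context grows by a fixed amount per step. The function names and the token figures below are illustrative assumptions, not measurements, and the sketch does not attempt to reproduce the dollar amounts above.

```typescript
// Assumed model: iteration i sends (base + i * growth) input tokens.
// Real agents are messier (caching, truncation, variable tool output).
function totalInputTokens(
  iterations: number,
  baseTokens: number,
  growthPerStep: number,
): number {
  // sum over i = 1..n of (base + i * growth) = n * base + growth * n(n + 1) / 2
  return iterations * baseTokens + (growthPerStep * iterations * (iterations + 1)) / 2;
}

function inputCostUsd(tokens: number, usdPerMillionTokens: number): number {
  return (tokens / 1_000_000) * usdPerMillionTokens;
}

// Illustrative numbers only: 2,000-token base prompt, 3,000 new tokens per step.
const at10 = totalInputTokens(10, 2_000, 3_000);   // 185,000 tokens
const at50 = totalInputTokens(50, 2_000, 3_000);   // 3,925,000 tokens
const at100 = totalInputTokens(100, 2_000, 3_000); // 15,350,000 tokens
```

In this model, five times the iterations means roughly 21 times the input tokens, because the growth term scales with the square of the iteration count.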
Token Costs by Loop Type
Not all loops drain your wallet equally. Here's how the most common loop types stack up:
| Loop Type | Description | Cost at 10 Iterations | Cost at 50 Iterations | Cost at 100 Iterations |
|---|---|---|---|---|
| Tool Error Loop | Agent repeats same failed API call | ~$0.04–$0.12 | ~$0.20–$0.60 | ~$0.40–$1.20 |
| Reasoning Loop | Growing context, no decision made | ~$0.80–$2.00 | ~$15–$40 | ~$60–$120 |
| Cross-Tool Loop | Agent toggles between 2+ tools, context grows fast | ~$1.50–$4.00 | ~$40–$100 | unlimited* |
*Estimates based on public price lists, March 2026: Claude 3.5 Sonnet ($3/1M input + $15/1M output), GPT-4o ($2.50/$10), Gemini 1.5 Pro ($1.25/$5). These are rough estimates, not production data.
Cost-Aware Prompting: Tell Your Agent Its Budget
function buildSystemPrompt(remainingTokenBudget: number): string {
  const isLow = remainingTokenBudget < 10_000;
  // Template lines stay flush left so no stray indentation ends up in the prompt
  return `
You are a document-processing agent.
Token budget status: ${isLow ? "CRITICAL" : "normal"}
Remaining token budget: ${remainingTokenBudget.toLocaleString()} tokens
${isLow ? `
IMPORTANT: Your token budget is nearly exhausted.
Finish the current task or return a partial result.
DO NOT start new tool calls that cost more than 2,000 tokens.
` : ""}
Available tools: [...]
`.trim();
}
It sounds simple, but it works. LLMs really do optimize differently when they know what's left in the tank, just like you'd prioritize differently if you had only two hours left in your sprint.
"I just processed 140,400,000 tokens in 48 hours. Raw API bill: $1,677.82. My actual cost: $50.00."
– @ziwenxu_ on X
That's what happens when you combine prompt caching and architectural cost control; cost-aware termination makes it possible. Anthropic's prompt caching, for instance, can cut input costs by up to 90% if your system prompt stays stable.
For comparison, Jason Calacanis reports $300 per agent per day at only 10–20% utilization, projected at ~$100,000 per year per agent. The difference isn't the model. It's the architecture.
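Cost-aware termination needs a live number to work with. The sketch below tracks spend in dollars per run; `CostTracker` and `Pricing` are hypothetical names, and the example prices are the Claude 3.5 Sonnet list prices quoted in the table footnote earlier, which you should check against current pricing before relying on them.

```typescript
// Prices per 1M tokens, supplied by the caller.
interface Pricing {
  inputUsdPerM: number;
  outputUsdPerM: number;
}

class CostTracker {
  private spentUsd = 0;
  private readonly pricing: Pricing;
  private readonly budgetUsd: number;

  constructor(pricing: Pricing, budgetUsd: number) {
    this.pricing = pricing;
    this.budgetUsd = budgetUsd;
  }

  // Call after every model response with that response's token usage.
  record(inputTokens: number, outputTokens: number): void {
    this.spentUsd +=
      (inputTokens / 1_000_000) * this.pricing.inputUsdPerM +
      (outputTokens / 1_000_000) * this.pricing.outputUsdPerM;
  }

  get remainingUsd(): number {
    return Math.max(0, this.budgetUsd - this.spentUsd);
  }

  get exhausted(): boolean {
    return this.spentUsd >= this.budgetUsd;
  }
}

// Example: a $5 budget at $3 / $15 per 1M input / output tokens.
const tracker = new CostTracker({ inputUsdPerM: 3, outputUsdPerM: 15 }, 5);
tracker.record(1_000_000, 100_000); // $3.00 input + $1.50 output
```

Checking `exhausted` before each model call gives you the hard stop, and `remainingUsd` is the value you would surface in the prompt or on a dashboard.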
Decision Tree: Which Mechanism for Which Use Case?
Here's where the rubber meets the road.
You need to match your loop-prevention mechanism to your use case.
The table below gives you the tradeoffs for each approach–across loop types, implementation effort, token overhead, and when you should use it.
| Mechanism | Loop Types Stopped | Implementation Effort | Token Overhead | Recommended For |
|---|---|---|---|---|
| Hard Limits (Recursion + Token Budget) | All loop types (as safety net) | ~30 min (if call path is known) | Minimal (<0.1%) | Every agent |
| Structured Output Exit Signal | Tool Error, Reasoning Loops | 1–3 hrs (schema + orchestrator) | ~2–5% (JSON in prompt) | 2+ tool types |
| State Machine (LangGraph) | Cross-tool, Reasoning Loops | 4–8 hrs (graph, edge logic) | ~5–10% (state) | 3+ tools, production |
| Tool-Call Deduplication | Tool Error, Cross-tool Loops | 2–4 hrs (cache + hash layer) | Negative (saves calls) | High-volume, MCP |
| Cost-Aware Termination | All, especially Reasoning Loops | 1–2 hrs (budget + prompt) | Minimal (prompt tweak) | Production must-have |
Token Cost Table by Loop Type and Model
Each cell lists Claude 3.5 Sonnet / GPT-4o / Gemini 1.5 Pro, in that order.
| Loop Type | 10 Iterations | 50 Iterations | 100 Iterations |
|---|---|---|---|
| Tool Error Loop | ~$0.08 / ~$0.06 / ~$0.03 | ~$0.40 / ~$0.30 / ~$0.15 | ~$0.80 / ~$0.60 / ~$0.30 |
| Reasoning Loop | ~$1.20 / ~$0.90 / ~$0.45 | ~$25 / ~$18 / ~$9 | ~$90 / ~$65 / ~$30 |
| Cross-Tool Loop | ~$2.50 / ~$2.00 / ~$1.00 | ~$60 / ~$45 / ~$22 | unlimited* |
*Estimates as of March 2026, public price lists. No hard limit = theoretically unlimited.
Recommendations by Use Case:
- Single tool, simple logic: Hard limit is enough (recursion + token budget)
- Multi-tool, clear order: State machine with LangGraph
- Multi-tool, dynamic planning: Structured output exit signal + deduplication
- High-volume production: Cost-aware termination is mandatory; layer in the others below it
- Every production agent: Use all 5 mechanisms (defense-in-depth)
Why all 5?
In a 4-stage multi-agent system with 95% accuracy per stage, total system reliability drops to just 81%. Each mechanism stops a different kind of loop. They're not interchangeable.
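The 81% figure follows from multiplying the per-stage accuracies, assuming the stages fail independently and every stage has to succeed:

```latex
R_{\text{system}} = \prod_{i=1}^{4} r_i = 0.95^{4} \approx 0.8145 \approx 81\%
```

Under the same assumption, each additional 95% stage multiplies the total by another 0.95, so longer chains degrade quickly.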
And here's an uncomfortable truth from multi-agent research:
"Researchers planted a single bad actor inside a group of LLM agents. The whole network failed to reach consensus. This is the Byzantine Generals Problem [...] the practical implication is uncomfortable for anyone building multi-agent systems."
– @rryssf_ on X
Loop prevention isn't just about saving money. It's about protecting the integrity of your whole system.
Checklist: Is Your Agent Ready for Production?
- Hard limits set (recursion & token budget)
- Structured output for all agent responses
- State machine or explicit control flow for 3+ tools
- Deduplication layer for tool calls
- Cost-aware prompting and real-time budget tracking
- Fallback handlers for all hard stops
- Regression tests for agent logic
- Audit trail for all state transitions
If you can't tick every box, you're not ready for production, and you're risking much more than your next cloud bill.
The Bottom Line: Build for Loops, Not Against Them
Further reading: [Debugging AI Agents in Production](https://the platform/blog/ai-agent-debugging-production), for when your loop prevention kicks in but you have no idea why. Or: [When a Pipeline Beats an Agent](https://the platform/blog/ai-pipeline-vs-agent-unterschied), because sometimes the right fix isn't a better agent, but no agent at all.
Author: Georg Singer
You're not building against loops. You're building for resilience.
Make your next AI agent bulletproof before you find out what infinite really costs.