Building AI Agents With Tool Use — From Zero to Working Code

See how AI agents think in loops, call tools, and make decisions, with a full execution trace, concrete tool definitions, and a clear architecture you can build today.

An LLM that can call functions is an agent.

That’s it. That’s the whole concept. A regular LLM takes text in and gives text out. An AI agent takes text in, decides which tools to use, calls them, reads the results, and generates an answer from real data. No magic. Just a loop with function calls.


1. The Loop: How Agents Think

Every agent runs the same cycle: Think about what to do. Act by calling a tool. Observe the result. Decide if you’re done or need another loop. This is the ReAct pattern — and it’s the foundation of every agent framework.

The Agent Loop: Think → Act → Observe → Repeat

Every AI agent follows this cycle. The loop is the intelligence.

The loop, repeated until done: THINK (plan the next action) → ACT (call a tool) → OBSERVE (check the result) → DECIDE (done, or loop again?). One full pass looks like this:

1. Think: "I need to look up the user's order status. I'll call the orders API."
2. Act: Calls get_order_status(order_id="12345")
3. Observe: API returns "shipped, tracking: UPS1234, arrives Thursday"
4. Decide: "I have what I need. Format the answer and respond."
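The whole cycle fits in a few lines of Python. This is a minimal sketch: call_llm here is a hard-coded stand-in for a real model client, and the get_order_status tool returns canned data — in production both would be real.

```python
# Minimal ReAct-style loop. call_llm is a scripted stand-in for a real
# model client; TOOLS maps tool names to plain Python functions.
TOOLS = {
    "get_order_status": lambda order_id: {"status": "shipped", "tracking": "UPS1234"},
}

def call_llm(messages):
    # Stand-in "model": if a tool result is already in the history, answer;
    # otherwise request the order-status tool. A real LLM makes this choice.
    if any(m["role"] == "tool" for m in messages):
        return {"type": "final", "content": "Your order shipped (tracking UPS1234)."}
    return {"type": "tool", "name": "get_order_status",
            "arguments": {"order_id": "12345"}}

def run_agent(user_message, max_iterations=5):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_iterations):              # DECIDE: loop or stop
        action = call_llm(messages)              # THINK: plan the next step
        if action["type"] == "final":
            return action["content"]             # done: return the answer
        result = TOOLS[action["name"]](**action["arguments"])  # ACT
        messages.append({"role": "tool", "name": action["name"],
                         "content": str(result)})              # OBSERVE
    return "Stopped: hit the iteration limit."
```

Swap in a real LLM call and real tool functions and the structure stays exactly the same: the loop is the agent.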

Each pass through the loop makes the answer more accurate. A simple question might take one loop. A complex task might take five. The agent decides when it’s done.


2. The Tool Box: What Agents Can Call

Tools are just functions with descriptions. You define them. The LLM reads the descriptions and picks the right one based on the user’s question. The four most common tool types are below.

The Tool Box: What Agents Can Call

Tools are just functions. The agent picks which one to use based on the task.

Search Tool: RAG, docs, knowledge base

Searches your vector database or knowledge base. Returns relevant chunks. The most common tool type — almost every agent has one.

API Tool: REST calls, external services

Calls external APIs — order lookup, weather, CRM, payments. The agent gets real-time data it couldn't possibly know from training.

Code Execution: Python, JavaScript, calculations

Runs code in a sandboxed environment. For math, data processing, chart generation — anything the LLM can write but shouldn't hallucinate.

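A code-execution tool can be sketched as a subprocess with a hard timeout. This is illustrative only: a real sandbox needs process isolation (containers, seccomp, resource limits); a timeout alone is not a sandbox.

```python
import subprocess
import sys
import tempfile

def run_python(code: str, timeout_s: int = 5) -> str:
    # Execute model-written code in a separate interpreter process.
    # NOTE: a timeout is NOT real isolation; production sandboxes add
    # containers/jails on top of this.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, "-I", path],   # -I: isolated mode, no user site dirs
            capture_output=True, text=True, timeout=timeout_s,
        )
        return result.stdout if result.returncode == 0 else result.stderr
    except subprocess.TimeoutExpired:
        return "Error: execution timed out."
```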
Database Query: SQL, read-only access

Runs read-only SQL against your database. The agent generates the query from natural language. Always enforce read-only permissions.

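One way to enforce the read-only rule in code, as a sketch. The real protection is a read-only database role; the safe_query helper below is an illustrative defense-in-depth check, not a complete SQL validator.

```python
# Fail fast on model-generated SQL that isn't a plain SELECT.
# Defense in depth: the database user should ALSO be read-only.
WRITE_KEYWORDS = {"insert", "update", "delete", "drop", "alter", "truncate", "grant"}

def safe_query(sql: str) -> str:
    tokens = [t.strip("();,") for t in sql.strip().lower().split()]
    if not tokens or tokens[0] != "select":
        raise PermissionError("Only SELECT statements are allowed.")
    if any(t in WRITE_KEYWORDS for t in tokens):
        raise PermissionError("Query contains a write keyword.")
    return sql  # hand off to a read-only connection
```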

The key insight: tool descriptions matter more than tool code. If the description is vague, the agent picks the wrong tool. Write descriptions like you’re explaining it to a smart coworker who’s never seen the codebase.
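Here is what a well-described tool might look like. The field layout follows the common JSON-Schema function-calling convention used by major LLM APIs; the example tool and its wording are our own:

```python
# A tool is just a function plus a description the model can read.
# The description tells the agent WHEN to use it, not just what it does.
get_order_status_tool = {
    "name": "get_order_status",
    "description": (
        "Look up the current status of a customer order. "
        "Use this whenever the user asks where their order is, "
        "when it will arrive, or whether it has shipped. "
        "Do NOT use it for returns or refunds."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "The order ID, e.g. 'ORD-7891'. Ask the user if it is missing.",
            },
        },
        "required": ["order_id"],
    },
}
```

Notice that the description covers both when to use the tool and when not to. That boundary is what keeps the agent from reaching for the wrong tool.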


3. Watch an Agent Execute: Full Trace

Here’s a real execution trace of an agent handling “What’s the status of my order?” Every thought, every tool call, every decision — timestamped and logged.

Agent Trace: Watch It Think

Real execution trace showing every thought, tool call, and decision.

0ms THINK: User wants to know their order status. I need the order ID. Let me check the conversation — they said order #ORD-7891.
120ms TOOL CALL: get_order_status(order_id="ORD-7891")
340ms TOOL RESULT: status: "in_transit", carrier: "UPS", tracking: "1Z999AA...", eta: "2026-05-15"
360ms THINK: Got the status. It's in transit with UPS. ETA is May 15th. Let me also check if there are any delivery exceptions.
380ms TOOL CALL: get_tracking_details(tracking="1Z999AA...")
580ms TOOL RESULT: exceptions: none, last_scan: "Memphis, TN — 6:42am"
600ms DECIDE: I have everything I need. No exceptions. Will format a clear response with status, tracking link, and ETA.
620ms RESPOND: Your order ORD-7891 is in transit via UPS. It was last scanned in Memphis, TN this morning. Expected delivery: May 15th. No delays detected.

Totals: 620ms total time · 2 tool calls · 2 think steps · $0.003 cost

Notice the two tool calls happened back-to-back. The agent didn’t wait for a human — it decided on its own that it needed tracking details after getting the order status. That’s the intelligence. Not any single tool call, but the sequence of decisions.


4. The Architecture: Three Layers

Every production agent has three layers. The brain (LLM) decides. The hands (tools) execute. The guardrails (safety) prevent damage. Skip any layer and you’ll regret it.

The Architecture: 3 Layers of Every Agent

Brain picks the action. Hands do the work. Guardrails keep it safe.

🧠 The Brain — LLM + System Prompt. Decides what to do next: a system prompt with role and rules, the conversation history, tool definitions (function schemas), and the ReAct reasoning pattern.

🔧 The Hands — Tools + APIs. Executes actions in the real world: 🔍 search, 🌐 API calls, 💻 code execution, 🗄️ database queries, 📧 email, 📁 files.

🛡️ The Guardrails — Safety + Limits. Prevents damage and runaway costs: max loop iterations (prevent infinite loops), a token budget per request, a tool permission allowlist, human-in-the-loop for destructive actions, and output validation with content filtering.

The guardrails layer is non-negotiable. Without it, an agent can loop forever (burning tokens), call destructive APIs, or generate harmful output. Max iterations, token budgets, and tool allowlists are the minimum.
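Those minimums can be wired straight into the loop as a check that runs on every pass. A sketch: the limits, the tool names, and the check_guardrails helper are all illustrative, not from any particular framework.

```python
# Hard limits checked on every pass through the agent loop.
MAX_ITERATIONS = 8
TOKEN_BUDGET = 20_000
ALLOWED_TOOLS = {"search_docs", "get_order_status"}   # allowlist, not blocklist
DESTRUCTIVE_TOOLS = {"issue_refund"}                   # require human sign-off

def check_guardrails(iteration, tokens_used, tool_name):
    if iteration >= MAX_ITERATIONS:
        raise RuntimeError("Guardrail: max loop iterations reached.")
    if tokens_used > TOKEN_BUDGET:
        raise RuntimeError("Guardrail: token budget exhausted.")
    if tool_name is not None and tool_name not in ALLOWED_TOOLS | DESTRUCTIVE_TOOLS:
        raise RuntimeError(f"Guardrail: tool '{tool_name}' is not allowlisted.")
    if tool_name in DESTRUCTIVE_TOOLS:
        return "needs_human_approval"                  # pause the loop, ask a human
    return "ok"
```

Calling this at the top of every loop iteration means a runaway agent fails loudly and cheaply instead of silently burning tokens.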


5. When Agents Are Worth It — And When They’re Not

Agents add latency and cost. Every tool call is another API round-trip. Every think step burns tokens. If the task doesn’t need external data or multiple steps, a direct LLM call is better.

Agent vs Direct LLM — When Agents Win

Agents add overhead. Here's when they're worth it — and when they're not.

Where the agent wins:
- Multi-step data lookup: Direct 25% vs Agent 94%
- Complex calculations: Direct 42% vs Agent 98%
- Real-time data access: Direct 0% vs Agent 91%

Where the direct LLM wins:
- Simple Q&A: Direct 92% vs Agent 90%
- Creative writing: Direct 88% vs Agent 85%
- Text summarization: Direct 95% vs Agent 93%

The rule: If the task needs external data or multiple steps, use an agent. If the task is pure language (writing, summarizing, translating), a direct LLM call is faster, cheaper, and just as good.

The sweet spot: tasks where the agent needs to look things up, combine data from multiple sources, or take actions. Order tracking, data analysis, multi-step workflows — that’s where agents shine. For simple Q&A or writing tasks, skip the agent overhead.