# Agent Mode

## Overview

Agent mode provides an automatic tool-calling loop. Instead of returning a single completion, the LLM can request one or more tool calls — function invocations that Prompty executes on its behalf. The results are appended to the conversation and the LLM is called again. This cycle repeats until the model produces a final text response (or a safety limit is hit).

This lets you build agents that can query databases, call APIs, search files, or perform any action you expose as a Python function — all driven by the LLM’s reasoning.
```mermaid
flowchart TD
    A["Send Messages to LLM"] --> B["Receive Response"]
    B --> C{"has tool_calls?\n(finish_reason)"}
    C -- Yes --> D["Execute Tool Functions"]
    D --> E["Append Tool Results"]
    E -.-> A
    C -- No --> F["Return Final Response"]
    G["Error Paths"] ~~~ H
    H["Bad JSON in tool args"] --> I["Send error string to LLM"]
    J["Tool function throws"] --> K["Send error string to LLM"]
    L["Max iterations exceeded"] --> M["Raise ValueError"]
    style A fill:#3b82f6,stroke:#1d4ed8,color:#fff
    style B fill:#3b82f6,stroke:#1d4ed8,color:#fff
    style C fill:#f59e0b,stroke:#d97706,color:#fff
    style D fill:#10b981,stroke:#059669,color:#fff
    style E fill:#10b981,stroke:#059669,color:#fff
    style F fill:#1d4ed8,stroke:#1e3a8a,color:#fff
    style G fill:none,stroke:none,color:#6b7280
    style H fill:#fef2f2,stroke:#ef4444,color:#ef4444
    style I fill:#fef2f2,stroke:#ef4444,color:#ef4444
    style J fill:#fef2f2,stroke:#ef4444,color:#ef4444
    style K fill:#fef2f2,stroke:#ef4444,color:#ef4444
    style L fill:#fef2f2,stroke:#ef4444,color:#ef4444
    style M fill:#fef2f2,stroke:#ef4444,color:#ef4444
```
## Basic Usage

Define one or more Python functions, then pass them to `execute_agent()` along with a loaded agent. The executor calls the LLM, dispatches any tool requests to your functions, and loops until the model is done.
```python
import prompty

# 1. Define tool functions
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # In reality this would call an API
    return f"72°F and sunny in {city}"

def get_time(timezone: str) -> str:
    """Get the current time in a timezone."""
    return f"3:42 PM in {timezone}"

# 2. Load the agent prompt
agent = prompty.load("agent.prompty")

# 3. Run the agent loop
result = prompty.execute_agent(
    agent,
    inputs={"question": "What's the weather in Seattle?"},
    tools={"get_weather": get_weather, "get_time": get_time},
    max_iterations=10,
)

print(result)  # "It's currently 72°F and sunny in Seattle!"
```
## The .prompty File

Agent prompts declare their tools in the frontmatter using `FunctionTool` entries. The LLM sees these as available functions it can call.

```
---
name: weather-agent
description: An agent that can check weather and time
model:
  id: gpt-4o
  provider: openai
  apiType: chat
  connection:
    kind: key
    endpoint: ${env:OPENAI_API_ENDPOINT:https://api.openai.com/v1}
    apiKey: ${env:OPENAI_API_KEY}
  options:
    temperature: 0
inputSchema:
  properties:
    question:
      kind: string
      description: The user's question
      default: What's the weather?
tools:
  - name: get_weather
    kind: function
    description: Get the current weather for a city
    parameters:
      properties:
        - name: city
          kind: string
          description: City name, e.g. "Seattle"
          required: true
    strict: true
  - name: get_time
    kind: function
    description: Get the current time in a timezone
    parameters:
      properties:
        - name: timezone
          kind: string
          description: IANA timezone, e.g. "America/Los_Angeles"
          required: true
template:
  format:
    kind: jinja2
  parser:
    kind: prompty
---
system:
You are a helpful assistant with access to weather and time tools.
Answer the user's question using the available tools.

user:
{{question}}
```
## Async Agent Mode

For async applications, use `execute_agent_async()`. Your tool functions can be either sync or async — the executor detects coroutine functions automatically and awaits them.

```python
import asyncio
import prompty

async def get_weather(city: str) -> str:
    """Async weather lookup."""
    # Imagine an async HTTP call here
    return f"72°F and sunny in {city}"

async def main():
    agent = await prompty.load_async("agent.prompty")
    result = await prompty.execute_agent_async(
        agent,
        inputs={"question": "Weather in Tokyo?"},
        tools={"get_weather": get_weather},
        max_iterations=10,
    )
    print(result)

asyncio.run(main())
```
## Error Recovery

The agent loop is designed to be resilient. Instead of crashing on tool execution errors, it feeds error information back to the LLM so the model can retry or adjust its approach.
### Bad JSON in Tool Arguments

If the LLM returns malformed JSON in a tool call’s `arguments` field, Prompty catches the `json.JSONDecodeError` and sends the error string back as the tool result. The model typically corrects the JSON on the next attempt.

```
tool message → "Error parsing arguments: Expecting ',' delimiter: line 1 column 42"
```
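This recovery path amounts to a guarded `json.loads`. A minimal sketch — the `parse_tool_args` helper is illustrative, not part of Prompty’s API:

```python
import json

def parse_tool_args(raw: str):
    """Parse model-supplied tool arguments, degrading to an error string."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        # The error string becomes the tool result the model sees next turn
        return f"Error parsing arguments: {e}"
```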
### Tool Function Throws an Exception

If your tool function raises any exception, Prompty catches it and sends the error message back to the LLM as the tool result. This lets the model decide whether to retry with different arguments or inform the user.

```
tool message → "Error executing get_weather: ConnectionTimeout: API unreachable"
```
### Missing Tool Name

If the LLM requests a tool that doesn’t exist in the `tools` dict, an error message is returned instead of crashing:

```
tool message → "Error: tool 'unknown_tool' not found in tools dict"
```
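Both the throwing-tool and missing-tool cases boil down to a guarded dispatch. A sketch with a hypothetical `dispatch_tool` helper, not Prompty’s actual internals:

```python
def dispatch_tool(tools: dict, name: str, args: dict) -> str:
    """Run one tool call, converting every failure into an error string."""
    fn = tools.get(name)
    if fn is None:
        # Unknown tool: report it instead of crashing the loop
        return f"Error: tool '{name}' not found in tools dict"
    try:
        return str(fn(**args))
    except Exception as e:
        # Tool raised: feed the error back so the model can retry
        return f"Error executing {name}: {type(e).__name__}: {e}"
```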
### Max Iterations Exceeded

If the loop runs for more than `max_iterations` cycles without the model producing a final response, a `ValueError` is raised. This prevents infinite loops when the model gets stuck in a tool-calling cycle.

```python
try:
    result = prompty.execute_agent(agent, inputs, tools, max_iterations=5)
except ValueError as e:
    print(e)  # "Agent loop exceeded max_iterations (5)"
```
## TypeScript Example

```typescript
import { load, executeAgent } from "prompty";

function getWeather(city: string): string {
  return `72°F and sunny in ${city}`;
}

const agent = await load("agent.prompty");
const result = await executeAgent(agent, {
  inputs: { question: "What's the weather in London?" },
  tools: { get_weather: getWeather },
  maxIterations: 10,
});

console.log(result);
```
## How It Works Internally

Under the hood, the agent loop in the executor follows these steps:

1. **Collect the full response** — the agent loop works with both streaming and non-streaming requests. When streaming is enabled, the loop consumes the stream and accumulates tool calls from the streamed chunks. When streaming is off, it reads tool calls directly from the response. Either way, tool calls are fully collected before any are executed.
2. **Call the LLM** — send the current message list plus tool definitions via the chat completions API.
3. **Check `finish_reason`** — if the response’s `finish_reason` is `"tool_calls"`, the model wants to invoke tools. If it’s `"stop"`, the model is done.
4. **Extract tool calls** — each tool call has an `id`, a `function.name`, and `function.arguments` (a JSON string).
5. **Look up & execute** — for each tool call, find the matching function in the `tools` dict (or `agent.metadata["tool_functions"]`), parse the arguments, and call the function.
6. **Append results** — add the assistant’s tool-call message and one `tool`-role message per call result back to the conversation.
7. **Repeat** — go back to step 2 with the updated message list.
8. **Return** — when the model produces a final response (no tool calls), pass it through the processor and return the result.
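The streaming side of step 1 deserves a closer look: tool-call deltas arrive split across chunks and must be stitched together by index before execution. A sketch, using plain dicts in place of real provider response objects:

```python
def accumulate_tool_calls(chunks):
    """Merge streamed tool-call deltas (dict-shaped chunks for illustration)."""
    calls = {}
    for chunk in chunks:
        for delta in chunk.get("tool_calls") or []:
            entry = calls.setdefault(
                delta["index"], {"id": "", "name": "", "arguments": ""}
            )
            if delta.get("id"):
                entry["id"] = delta["id"]
            fn = delta.get("function") or {}
            # Name and argument fragments are concatenated across chunks
            entry["name"] += fn.get("name", "")
            entry["arguments"] += fn.get("arguments", "")
    return [calls[i] for i in sorted(calls)]
```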
```python
# Simplified pseudocode of the agent loop
messages = prepare(agent, inputs)
for i in range(max_iterations):
    response = client.chat.completions.create(
        model=agent.model.id,
        messages=messages,
        tools=tool_definitions,
    )
    if response.finish_reason != "tool_calls":
        return process(response)

    # Execute each tool call
    messages.append(response.message)
    for tool_call in response.tool_calls:
        result = tools[tool_call.function.name](**tool_call.args)
        messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})

raise ValueError(f"Agent loop exceeded max_iterations ({max_iterations})")
```
## Best Practices

- **Keep tool descriptions clear and concise.** The LLM uses the `description` field to decide when to call a tool. Vague descriptions lead to incorrect or missed tool calls.
- **Use `strict: true` on `FunctionTool`.** This enables OpenAI’s structured output mode for tool parameters, ensuring the model produces valid JSON matching your schema. It requires all parameters to be `required` and adds `additionalProperties: false` automatically.
- **Set a reasonable `max_iterations`.** Most tool-using conversations complete in 2–5 iterations. Setting the limit too high risks runaway costs; setting it too low may cut off legitimate multi-step reasoning.
- **Return structured strings from tools.** The LLM processes your tool’s return value as text. Returning well-formatted data (JSON, key-value pairs) helps the model extract information accurately.
- **Test with mocked tools first.** Use simple stub functions that return hardcoded data while developing your prompt. Switch to real implementations once the agent’s reasoning flow is solid.
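For the last tip, stubs can be as simple as functions returning canned strings; `stub_weather` and `stub_time` below are hypothetical stand-ins for real implementations:

```python
def stub_weather(city: str) -> str:
    """Deterministic stand-in for a real weather API during development."""
    return f"65°F and cloudy in {city}"

def stub_time(timezone: str) -> str:
    """Deterministic stand-in for a real clock lookup."""
    return f"12:00 PM in {timezone}"

# Pass the stubs wherever the real tools dict would go, e.g.:
# tools = {"get_weather": stub_weather, "get_time": stub_time}
```

Because the stubs are deterministic, any change in the agent’s behavior between runs comes from the prompt or the model, not from the tools.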