Skip to content

Runtime Controls

Runtime controls are options passed to turn() / TurnAsync(). They are not .prompty frontmatter fields. This keeps the prompt portable while letting the host application decide policy, budgets, cancellation, observability, and live steering.

The prompt is prepared before these per-iteration controls run. Steering, compaction, and tool result messages mutate the in-memory message array for the current turn; they do not re-render the original template.

During each loop iteration Prompty applies controls in this order:

  1. Check cancellation.
  2. Drain steering messages.
  3. Trim messages to the context budget.
  4. Apply compaction if messages were trimmed and a compaction strategy exists.
  5. Run input guardrails.
  6. Check cancellation again before the LLM call.
  7. Call the LLM, with retry policy.
  8. Run output guardrails on model text.
  9. Run tool guardrails for each requested tool call.
  10. Execute tools, serially or in parallel.
  11. Append provider-formatted tool result messages.
  • Events & Cancellation covers observability callbacks, token/status updates, and cooperative stop behavior.
  • Context & Compaction covers trimming the message array and replacing dropped turns with summaries.
  • Guardrails covers input, output, and tool policy hooks.
  • Steering covers injecting guidance between loop iterations.
  • Tool Execution covers serial vs. parallel tool dispatch, tool failures, max iterations, and LLM retries.
from prompty import turn
from prompty.core import CancellationToken, Guardrails
token = CancellationToken()
result = turn(
agent,
inputs={"question": question},
tools=tools,
on_event=lambda event_type, data: print(event_type, data),
cancel=token,
context_budget=50_000,
compaction=lambda dropped: summarize_locally(dropped),
guardrails=Guardrails(
input=lambda messages: check_input(messages),
output=lambda message: check_output(message),
tool=lambda name, args: check_tool(name, args),
),
steering=steering,
parallel_tool_calls=True,
max_llm_retries=3,
)

Start small:

  1. Use events first so you can observe the loop.
  2. Add cancellation if the turn can take more than a few seconds.
  3. Add context budget and compaction when conversations can grow.
  4. Add guardrails when host policy needs to enforce allow/deny/rewrite behavior.
  5. Add steering only when a user or operator needs to intervene mid-turn.