Guardrails

Guardrails are host-enforced policy hooks. They run inside the agent loop, outside the model prompt, so they are appropriate for behavior that must be enforced by your application rather than merely requested in instructions.

| Hook   | Runs                                                   | Can do                        |
|--------|--------------------------------------------------------|-------------------------------|
| Input  | After steering and context trimming, before the LLM call | allow, deny, rewrite messages |
| Output | After model text, before returning or continuing       | allow, deny, rewrite output   |
| Tool   | Before each requested tool executes                    | allow, deny, rewrite arguments |

In an agent turn, guardrails can run repeatedly:

prepare/thread expansion
→ steering
→ context trimming/compaction
→ input guardrail
→ LLM call
→ process response
→ tool guardrail(s)
→ tool result messages
→ repeat
→ final output guardrail

Input and output denials fail the turn. In current runtimes, a denied tool call is not executed. The runtime appends a synthetic tool-result message, for example "Tool denied by guardrail: ...", so the model can recover or choose another path.

Rewrite payload shape depends on the hook. Input rewrites replace the message array, tool rewrites replace the parsed argument object, and output rewrites replace the value returned to the caller. Some runtimes expose these shapes as plain objects, while C# uses List<Message> for input rewrites and Dictionary<string, object?> for tool rewrites.
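Concretely, the three rewrite payloads might look like this in Python (illustrative values only; the field names and shapes follow this page's description):

```python
# Input hook: a replacement message array.
input_rewrite = [
    {"role": "user", "content": "Summarize this report. [REDACTED]"}
]

# Tool hook: a replacement parsed-argument object.
tool_rewrite = {"path": "/tmp/safe.txt", "mode": "read"}

# Output hook: the replacement value returned to the caller.
output_rewrite = "Here is the sanitized answer."
```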

Typical uses:

  • redact secrets before a model call;
  • block requests that violate app policy;
  • constrain high-impact tools;
  • rewrite risky tool arguments;
  • validate final output before returning it to users.

| Use                 | When                                                                  |
|---------------------|-----------------------------------------------------------------------|
| Prompt instructions | Behavior is advisory and can be model-mediated                        |
| Guardrails          | Policy must be enforced by the host application at runtime            |
| Tool authorization  | Access control, tenant/resource permissions, writes, and side effects |

Guardrails are runtime options passed to turn() / TurnAsync(). Do not declare them in .prompty frontmatter.

```python
from prompty import turn
from prompty.core import GuardrailResult, Guardrails

def check_tool(name: str, args: dict) -> GuardrailResult:
    # Deny any attempt to call the delete_file tool; allow everything else.
    if name == "delete_file":
        return GuardrailResult(allowed=False, reason="File deletion is disabled")
    return GuardrailResult(allowed=True)

result = turn(
    agent,
    inputs={"question": question},
    tools=tools,
    guardrails=Guardrails(tool=check_tool),  # passed at runtime, not in frontmatter
)
```

Prefer rewrite when the request can be made safe without changing user intent:

  • remove a secret from an input message;
  • clamp a tool argument to an allowed range;
  • normalize an output shape.

Prefer deny when continuing would be misleading or unsafe:

  • user asks for an operation they are not allowed to perform;
  • a tool call would mutate protected state;
  • output violates a hard policy.

Rewrite values must match the hook:

  • input rewrite: a replacement message list;
  • output rewrite: replacement assistant content or final result;
  • tool rewrite: a replacement argument object/dictionary, not a different tool name.

Guardrails are not declared in .prompty frontmatter. The prompt can describe desired behavior, but the host application passes guardrails at runtime. This lets the same prompt run under different policies in development, test, and production.
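For example, the host could select a policy per environment while the prompt stays unchanged. This is a sketch: the environment names, tool names, and the permissive/strict split are assumptions, and boolean policy functions stand in for the real guardrail callables.

```python
# Stand-in policy functions; real code would return GuardrailResult objects.
def allow_all(name: str, args: dict) -> bool:
    return True

def read_only(name: str, args: dict) -> bool:
    # Stricter policy: block any tool that mutates state.
    return name not in {"delete_file", "write_file"}

# Same prompt, different runtime policy per environment.
POLICIES = {
    "development": allow_all,
    "test": read_only,
    "production": read_only,
}

def guardrails_for(env: str):
    return POLICIES[env]
```

The prompt file never changes; only the guardrails passed at the turn() call site differ between environments.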