Skip to content

Context & Compaction

Agent turns can grow as the model calls tools and Prompty appends results. Within a single turn() call, a context budget lets the runtime trim the in-memory message array before each model call in the agent loop.

When the estimated message size exceeds the budget, Prompty tries to trim older non-system messages. If at least one message is dropped, it:

  1. keeps leading system messages;
  2. keeps the newest conversation/tool messages;
  3. drops older non-system messages;
  4. inserts a summary message for the dropped messages.

The budget is character-based in the current runtimes. Treat it as a practical guardrail, not a tokenizer-perfect substitute for provider-specific limits. Non-text parts and tool metadata are estimated, so leave headroom below provider context limits.

Without custom compaction, Prompty generates a simple summary from dropped messages and inserts it near the front of the conversation. This preserves a trace of what happened while freeing context for recent work.

Custom compaction replaces the default summary. It can be:

  • a function that receives dropped messages and returns a summary; or
  • a .prompty summarizer prompt that receives formatted dropped messages.

Compaction only runs when trimming drops at least one message. If compaction returns an empty summary or fails, Prompty keeps the built-in summary.

result = turn(
agent,
inputs={"question": question},
tools=tools,
context_budget=50_000,
compaction=lambda dropped: summarize_dropped_messages(dropped),
)

A compaction prompt should summarize operationally useful state, not every token. Good summaries include:

  • user goals and constraints;
  • decisions already made;
  • tool calls that changed state;
  • unresolved questions;
  • identifiers needed later.

Avoid summarizing transient tool output that will not matter later. Preserve tool calls and results that changed external state. Summarize or omit read-only diagnostic output unless it affects later decisions.

Context trimming happens inside one turn() call. Conversation history across external user turns is still the host application’s responsibility, usually via a kind: thread input. When the next external turn starts, Prompty prepares the prompt again and expands that thread input.

On the next external user turn, load prior messages from your store and pass them as the conversation / kind: thread input. Prompty does not automatically remember previous turn() calls.

Do not use compaction as a persistence layer. Use compaction mainly for long tool loops or large intermediate tool results within one turn.