§3 Tracing

Tracing is the cross-cutting observability layer that wraps every pipeline stage. It is not optional: conformant implementations MUST implement the core tracing infrastructure. Backends other than the .tracy file backend are SHOULD or MAY requirements.

The tracer registry manages named tracing backends. It MUST be thread-safe.

Operations:

| Operation | Description |
| --- | --- |
| add(name, factory) | Register a named backend factory |
| remove(name) | Unregister a named backend |
| clear() | Remove all registered backends |
| start(spanName) → emitter | Open ALL registered backends, return a fan-out emitter |

Factory signature:

factory: (spanName: string) → ContextManager<(key: string, value: any) → void>

A factory receives a span name and returns a context manager. On enter, the context manager yields an emitter function (key, value) → void. On exit, the span ends.

Fan-out: start(spanName) MUST invoke every registered factory and return a composite emitter that forwards each (key, value) call to every active backend. If an individual backend raises an error, that error MUST NOT prevent the other backends from receiving the emission.

Thread safety: The registry MUST be safe for concurrent add/remove/start calls from multiple threads or async tasks.
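
The registry operations and the fan-out rule can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name `TracerRegistry` is hypothetical, `start` is modelled as a context manager that yields the composite emitter, and the choice to skip a backend that fails to open is an assumption.

```python
import contextlib
import threading
from typing import Any, Callable, ContextManager, Dict

Emitter = Callable[[str, Any], None]
Factory = Callable[[str], ContextManager[Emitter]]


class TracerRegistry:
    """Hypothetical thread-safe registry of named backend factories."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._factories: Dict[str, Factory] = {}

    def add(self, name: str, factory: Factory) -> None:
        with self._lock:
            self._factories[name] = factory

    def remove(self, name: str) -> None:
        with self._lock:
            self._factories.pop(name, None)

    def clear(self) -> None:
        with self._lock:
            self._factories.clear()

    @contextlib.contextmanager
    def start(self, span_name: str):
        # Snapshot under the lock so concurrent add/remove calls cannot
        # change the set of backends while this span is being opened.
        with self._lock:
            factories = list(self._factories.values())
        with contextlib.ExitStack() as stack:
            emitters = []
            for factory in factories:
                try:
                    emitters.append(stack.enter_context(factory(span_name)))
                except Exception:
                    pass  # assumption: a backend that fails to open is skipped

            def emit(key: str, value: Any) -> None:
                for emitter in emitters:
                    try:
                        emitter(key, value)
                    except Exception:
                        pass  # one failing backend must not starve the others

            yield emit
        # Leaving the ExitStack exits every backend context, ending the span.
```

With no factories registered, `start` still yields an emitter; it simply forwards to nothing.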

The @trace decorator (or equivalent wrapper mechanism) instruments a function with automatic span creation, input capture, and result capture.

Algorithm:

function trace(fn, options?):
    return wrapped_fn(*args, **kwargs):
        1. signature ← module_name + "." + qualified_name(fn)
        2. bound_args ← bind function parameters to (args, kwargs)
        3. Remove 'self' from bound_args (if present)
        4. If options.ignore_params specified:
               Remove those parameter names from bound_args
        5. emitter ← Tracer.start(signature)
        6. emitter("signature", signature)
        7. emitter("inputs", {name: to_dict(value) for name, value in bound_args})
        8. try:
               result ← call fn(*args, **kwargs)
               emitter("result", to_dict(result))
               return result
           catch error:
               emitter("result", {
                   "exception": type_name(error),
                   "message": str(error),
                   "traceback": format_traceback(error)
               })
               re-raise error
           finally:
               end span (exit context manager)

Requirements:

  • MUST work on async functions (awaiting the result within the span).
  • SHOULD work on sync functions.
  • SHOULD support custom attributes via decorator arguments or options.
  • MUST NOT alter the return value or exception behavior of the wrapped function.
  • If no backends are registered, the decorator MUST still call the original function (tracing becomes a no-op, not a failure).
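
A Python sketch of the decorator algorithm above, covering both async and sync functions. To stay self-contained it uses two stand-ins that are not part of the specification: `start_span` replaces the registry's start operation, and `EVENTS` is an in-memory sink. Serialization via to_dict and redaction are omitted here; a real implementation would apply both before emitting.

```python
import asyncio
import contextlib
import functools
import inspect
import traceback

EVENTS = []  # hypothetical in-memory sink standing in for real backends


@contextlib.contextmanager
def start_span(name):
    """Stand-in for the registry's start(); yields an emitter."""
    yield lambda key, value: EVENTS.append((name, key, value))


def _bind_inputs(fn, args, kwargs, ignore_params):
    # Map positional and keyword arguments onto parameter names.
    bound = inspect.signature(fn).bind(*args, **kwargs)
    skip = {"self", *ignore_params}
    return {k: v for k, v in bound.arguments.items() if k not in skip}


def _error_payload(error):
    return {
        "exception": type(error).__name__,
        "message": str(error),
        "traceback": "".join(
            traceback.format_exception(type(error), error, error.__traceback__)
        ),
    }


def trace(fn=None, *, ignore_params=()):
    if fn is None:  # used as @trace(ignore_params=[...])
        return functools.partial(trace, ignore_params=ignore_params)
    signature = f"{fn.__module__}.{fn.__qualname__}"

    if inspect.iscoroutinefunction(fn):
        @functools.wraps(fn)
        async def async_wrapper(*args, **kwargs):
            with start_span(signature) as emit:
                emit("signature", signature)
                emit("inputs", _bind_inputs(fn, args, kwargs, ignore_params))
                try:
                    result = await fn(*args, **kwargs)  # awaited inside the span
                except Exception as error:
                    emit("result", _error_payload(error))
                    raise  # exception behavior is unchanged
                emit("result", result)
                return result
        return async_wrapper

    @functools.wraps(fn)
    def sync_wrapper(*args, **kwargs):
        with start_span(signature) as emit:
            emit("signature", signature)
            emit("inputs", _bind_inputs(fn, args, kwargs, ignore_params))
            try:
                result = fn(*args, **kwargs)
            except Exception as error:
                emit("result", _error_payload(error))
                raise
            emit("result", result)
            return result
    return sync_wrapper


@trace(ignore_params=["api_key"])
async def greet(name, api_key=None):
    return f"hello {name}"


print(asyncio.run(greet("ada", api_key="not-recorded")))  # prints "hello ada"
```

Leaving the `with` block plays the role of the `finally` step: the span ends whether the call returns or raises.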

Before emitting values to backends, all values MUST be serialized to a JSON-safe representation.

Conversion rules (applied recursively):

| Input Type | Output |
| --- | --- |
| string | Passthrough |
| int, float, bool | Passthrough |
| null / None | null |
| datetime | ISO 8601 string (e.g., "2026-04-04T12:00:00Z") |
| dataclass / model | Dict representation (field → value) |
| Path / file path | String representation |
| list / array | Recursive to_dict on each element |
| dict / map | Recursive to_dict on each value |
| Everything else | str(value) (string coercion) |
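
The conversion rules could look like this in Python. It is a sketch under a few assumptions that the table does not settle: tuples are treated like lists, dict keys are coerced to strings, and a UTC offset of "+00:00" is rewritten as "Z" to match the example format.

```python
import dataclasses
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


def to_dict(value: Any) -> Any:
    """Recursively convert a value to a JSON-safe representation."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return value  # passthrough
    if isinstance(value, datetime):
        return value.isoformat().replace("+00:00", "Z")
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        return {
            field.name: to_dict(getattr(value, field.name))
            for field in dataclasses.fields(value)
        }
    if isinstance(value, Path):
        return str(value)
    if isinstance(value, (list, tuple)):  # assumption: tuples count as arrays
        return [to_dict(item) for item in value]
    if isinstance(value, dict):
        return {str(key): to_dict(item) for key, item in value.items()}
    return str(value)  # everything else: string coercion


stamp = datetime(2026, 4, 4, 12, 0, 0, tzinfo=timezone.utc)
print(to_dict({"when": stamp, "file": Path("data.json")}))
# prints {'when': '2026-04-04T12:00:00Z', 'file': 'data.json'}
```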

Before emitting values to any backend, implementations MUST redact sensitive data.

Sensitive key detection: A key is sensitive if it contains any of the following substrings (case-insensitive match):

  • secret
  • password
  • api_key
  • apikey
  • token
  • auth
  • credential
  • cookie

Redacted value: "[REDACTED]" (or language-idiomatic equivalent).

Scope: Redaction applies recursively to all dict keys at every nesting level in the serialized inputs and result values. Redaction MUST occur before the value reaches any backend — including the .tracy file backend.
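
A minimal sketch of recursive redaction over already-serialized values. The function names `is_sensitive` and `redact` are illustrative, not mandated by the specification.

```python
from typing import Any

SENSITIVE_SUBSTRINGS = (
    "secret", "password", "api_key", "apikey",
    "token", "auth", "credential", "cookie",
)
REDACTED = "[REDACTED]"


def is_sensitive(key: str) -> bool:
    """Case-insensitive substring match against the sensitive list."""
    lowered = key.lower()
    return any(part in lowered for part in SENSITIVE_SUBSTRINGS)


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive dict entries replaced."""
    if isinstance(value, dict):
        return {
            key: REDACTED if is_sensitive(str(key)) else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


print(redact({"user": "ada", "opts": {"API_KEY": "abc", "retries": 3}}))
# prints {'user': 'ada', 'opts': {'API_KEY': '[REDACTED]', 'retries': 3}}
```

One consequence of plain substring matching is worth checking against your implementation: keys such as prompt_tokens and author also match (through token and auth), so the sketch redacts them too.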

Token usage information MUST be propagated from child spans to parent spans.

Algorithm:

function hoist_usage(span):
    If span.result contains a "usage" field:
        usage ← span.result.usage
        propagate to parent span as "__usage":
            __usage.prompt_tokens += usage.prompt_tokens (or usage.input_tokens)
            __usage.completion_tokens += usage.completion_tokens (or usage.output_tokens)
            __usage.total_tokens += usage.total_tokens
    For each child in span.__frames:
        hoist_usage(child)  // recursive: deepest spans propagate first

Usage hoisting enables the root span to report aggregate token consumption across the entire pipeline execution, including agent loops with multiple LLM calls.
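
One way to realize this over a finished span tree is a post-order walk: children are totalled first, so each span's `__usage` covers its own call plus all descendants. This is an interpretation of "deepest spans propagate first", not the only valid one. Falling back to prompt plus completion when total_tokens is absent is an assumption.

```python
from typing import Any, Dict

USAGE_KEYS = ("prompt_tokens", "completion_tokens", "total_tokens")


def _normalize(usage: Dict[str, Any]) -> Dict[str, int]:
    """Map either naming scheme onto the three __usage counters."""
    prompt = usage.get("prompt_tokens", usage.get("input_tokens", 0))
    completion = usage.get("completion_tokens", usage.get("output_tokens", 0))
    # Assumption: derive the total when the provider does not report it.
    total = usage.get("total_tokens", prompt + completion)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": total,
    }


def hoist_usage(span: Dict[str, Any]) -> Dict[str, int]:
    """Set span['__usage'] to its own usage plus all descendants' usage."""
    totals = dict.fromkeys(USAGE_KEYS, 0)

    result = span.get("result")
    if isinstance(result, dict) and isinstance(result.get("usage"), dict):
        for key, count in _normalize(result["usage"]).items():
            totals[key] += count

    for child in span.get("__frames", []):
        # Recurse first, then fold the child's aggregate into this span.
        for key, count in hoist_usage(child).items():
            totals[key] += count

    if any(totals.values()):
        span["__usage"] = totals
    return totals
```

Calling `hoist_usage(root)` once on the root span annotates every span that has usage beneath it, including intermediate agent-loop spans.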

§3.6.1 .tracy File Backend — MUST Implement

Every conformant implementation MUST provide a .tracy file backend that writes structured JSON traces to disk.

File naming: <sanitized_span_name>.<YYYYMMDD.HHMMSS>.tracy

  • sanitized_span_name: the root span name with filesystem-unsafe characters replaced by underscores.
  • Timestamp: UTC time when the root span ends.

The file is written only when the root span ends; until then, intermediate spans accumulate in memory.
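
The naming rule can be sketched as below. Which characters count as filesystem-unsafe is not defined above, so the allow-list (letters, digits, dot, underscore, hyphen) is an assumption, as is the helper name `tracy_filename`.

```python
import re
from datetime import datetime, timezone


def tracy_filename(span_name: str, ended_at: datetime) -> str:
    """Build <sanitized_span_name>.<YYYYMMDD.HHMMSS>.tracy.

    ended_at must be timezone-aware; it is converted to UTC.
    """
    # Assumption: anything outside this allow-list is treated as unsafe.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", span_name)
    stamp = ended_at.astimezone(timezone.utc).strftime("%Y%m%d.%H%M%S")
    return f"{safe}.{stamp}.tracy"


ended = datetime(2026, 4, 4, 12, 0, 0, tzinfo=timezone.utc)
print(tracy_filename("my/pipeline: run", ended))
# prints my_pipeline__run.20260404.120000.tracy
```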

JSON schema:

{
  "runtime": "<language_name>",
  "version": "<prompty_library_version>",
  "trace": {
    "name": "<span_name>",
    "__time": {
      "start": "<ISO8601_UTC>",
      "end": "<ISO8601_UTC>",
      "duration": "<milliseconds_as_number>"
    },
    "signature": "<module.function_name>",
    "inputs": { "<param_name>": "<serialized_value>", "..." : "..." },
    "result": "<serialized_value>",
    "__frames": [
      {
        "name": "<child_span_name>",
        "__time": { "..." : "..." },
        "signature": "...",
        "inputs": { "..." : "..." },
        "result": "...",
        "__frames": []
      }
    ],
    "__usage": {
      "prompt_tokens": 0,
      "completion_tokens": 0,
      "total_tokens": 0
    }
  }
}

Field definitions:

| Field | Type | Description |
| --- | --- | --- |
| runtime | string | Language/runtime name (e.g., "python", "csharp", "javascript") |
| version | string | Prompty library version (e.g., "2.0.0") |
| name | string | Span name |
| __time | object | Timing information |
| signature | string | Fully-qualified function signature |
| inputs | object | Serialized input parameters (redacted) |
| result | any | Serialized return value (redacted) |
| __frames | array | Child spans (recursive, same structure) |
| __usage | object | Aggregated token usage (hoisted from children) |

The __ prefix on internal fields (__time, __frames, __usage) distinguishes them from user-emitted keys.
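
To illustrate how nested spans might accumulate in memory and become the document above, here is a sketch of a collector built around a span stack. `FrameCollector` and its attributes are hypothetical names, and the single shared stack means this version is not safe for concurrent threads or tasks. Usage hoisting and the file write are left out.

```python
import contextlib
from datetime import datetime, timezone


class FrameCollector:
    """Hypothetical in-memory collector; single-threaded for brevity."""

    def __init__(self, runtime: str, version: str) -> None:
        self.runtime = runtime
        self.version = version
        self._stack = []     # currently open spans, root first
        self.documents = []  # finished root traces, ready to serialize

    @contextlib.contextmanager
    def span(self, name: str):
        frame = {"name": name}
        started = datetime.now(timezone.utc)
        self._stack.append(frame)

        def emit(key, value):
            frame[key] = value  # user-emitted keys sit beside the __ fields

        try:
            yield emit
        finally:
            ended = datetime.now(timezone.utc)
            frame["__time"] = {
                "start": started.isoformat(),
                "end": ended.isoformat(),
                "duration": round((ended - started).total_seconds() * 1000, 3),
            }
            self._stack.pop()
            if self._stack:
                # Child span: attach to the parent that is still open.
                self._stack[-1].setdefault("__frames", []).append(frame)
            else:
                # Root span ended: the whole tree is now complete.
                self.documents.append(
                    {"runtime": self.runtime, "version": self.version, "trace": frame}
                )
```

Only the root span's exit produces a document, which is the point at which a file backend would serialize it to disk.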

§3.6.2 Console Tracer — SHOULD Implement

Implementations SHOULD provide a console tracer that prints span start/end events to stderr (or the platform-equivalent diagnostic output).

Output format (RECOMMENDED):

[prompty] ▶ <span_name>
[prompty] ◀ <span_name> (<duration_ms>ms)
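
A console backend in the recommended format might be as small as the factory below. The optional `stream` parameter is an addition for testability and is not part of the factory signature; with its default the function still takes just a span name.

```python
import contextlib
import sys
import time


@contextlib.contextmanager
def console_tracer(span_name, stream=None):
    """Factory that prints span start/end lines, defaulting to stderr."""
    out = stream if stream is not None else sys.stderr
    print(f"[prompty] ▶ {span_name}", file=out)
    started = time.perf_counter()
    try:
        # This sketch ignores emitted key/value pairs.
        yield lambda key, value: None
    finally:
        duration_ms = (time.perf_counter() - started) * 1000
        print(f"[prompty] ◀ {span_name} ({duration_ms:.0f}ms)", file=out)
```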

§3.6.3 OpenTelemetry Backend — SHOULD Implement

Implementations SHOULD provide an OpenTelemetry tracing backend that conforms to the OpenTelemetry GenAI Semantic Conventions.

Span attributes mapping:

| Prompty Concept | OTel Attribute | Value |
| --- | --- | --- |
| API type | gen_ai.operation.name | "chat", "embeddings", "invoke_agent", "execute_tool" |
| Provider name | gen_ai.provider.name | "openai", "microsoft.foundry", "anthropic" |
| Model ID | gen_ai.request.model | From agent.model.id |
| Temperature | gen_ai.request.temperature | From agent.model.options.temperature |
| Max tokens | gen_ai.request.max_tokens | From agent.model.options.maxOutputTokens |
| Input token count | gen_ai.usage.input_tokens | From response usage |
| Output token count | gen_ai.usage.output_tokens | From response usage |
| Finish reasons | gen_ai.response.finish_reasons | From response choices |
| Response ID | gen_ai.response.id | From response |
| Response model | gen_ai.response.model | From response |

Span events (opt-in — implementations MAY emit these):

| Event Name | Content |
| --- | --- |
| gen_ai.input.messages | Messages sent to LLM |
| gen_ai.output.messages | Messages received from LLM |
| gen_ai.system_instructions | System message content |
| gen_ai.tool.definitions | Tool definitions sent to LLM |

Span kinds:

  • CLIENT for remote LLM API calls (executor).
  • INTERNAL for local processing (renderer, parser, processor).

Agent and tool spans:

  • Agent loop spans: gen_ai.operation.name = "invoke_agent", with gen_ai.agent.name.
  • Tool execution spans: gen_ai.operation.name = "execute_tool".

Error handling: On error, set the span status to ERROR and record the error.type attribute.
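
The attribute mapping can be expressed without any OpenTelemetry dependency as a function that builds the attribute dictionary. The dict shapes used for `agent` and `response` are assumptions made for illustration (the response resembles a chat-completions payload), and `genai_attributes` is a hypothetical helper; a real backend would pass the result to its span's attribute-setting API.

```python
from typing import Any, Dict, Optional


def genai_attributes(
    operation: str,
    provider: str,
    agent: Dict[str, Any],
    response: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build gen_ai.* attributes; entries with no value are omitted."""
    model = agent.get("model", {})
    options = model.get("options", {})
    attrs: Dict[str, Any] = {
        "gen_ai.operation.name": operation,
        "gen_ai.provider.name": provider,
        "gen_ai.request.model": model.get("id"),
        "gen_ai.request.temperature": options.get("temperature"),
        "gen_ai.request.max_tokens": options.get("maxOutputTokens"),
    }
    if response:
        usage = response.get("usage", {})
        attrs.update({
            # Assumption: accept either usage naming scheme.
            "gen_ai.usage.input_tokens": usage.get(
                "input_tokens", usage.get("prompt_tokens")
            ),
            "gen_ai.usage.output_tokens": usage.get(
                "output_tokens", usage.get("completion_tokens")
            ),
            "gen_ai.response.finish_reasons": [
                choice.get("finish_reason")
                for choice in response.get("choices", [])
            ],
            "gen_ai.response.id": response.get("id"),
            "gen_ai.response.model": response.get("model"),
        })
    return {key: value for key, value in attrs.items() if value is not None}
```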

Users add custom backends by implementing the factory interface and calling add(name, factory) on the tracer registry.

Example use cases:

  • Database logging (write spans to a SQL or NoSQL store)
  • Webhook notifications (POST span data to an HTTP endpoint)
  • Custom dashboards (stream span data to a visualization tool)
  • Cost tracking (aggregate __usage across runs)

Backends MUST NOT block the pipeline — if a backend is slow, it SHOULD buffer or flush asynchronously. A failing backend MUST NOT cause the pipeline to fail.
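
One possible way to meet the non-blocking requirement is to hand emissions to a bounded queue that a worker thread drains. In this sketch `BufferedBackend`, `sink`, and `max_pending` are hypothetical names, and dropping emissions when the buffer is full is a design assumption rather than something the specification prescribes.

```python
import contextlib
import queue
import threading


class BufferedBackend:
    """Hypothetical backend that forwards emissions off the calling thread."""

    def __init__(self, sink, max_pending: int = 1000) -> None:
        self._sink = sink  # callable(span_name, key, value); may be slow
        self._queue = queue.Queue(maxsize=max_pending)
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def _drain(self) -> None:
        while True:
            item = self._queue.get()
            try:
                self._sink(*item)
            except Exception:
                pass  # a failing sink must not take the pipeline down
            finally:
                self._queue.task_done()

    @contextlib.contextmanager
    def factory(self, span_name: str):
        def emit(key, value):
            try:
                self._queue.put_nowait((span_name, key, value))
            except queue.Full:
                pass  # assumption: drop rather than block the caller

        yield emit

    def flush(self) -> None:
        """Block until everything queued so far has been handled."""
        self._queue.join()
```

The bound method `backend.factory` matches the factory interface, so it could be registered with add(name, factory) like any other backend.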