§3 Tracing
Tracing is the cross-cutting observability layer that wraps every pipeline stage.
It is not optional: conformant implementations MUST implement the core tracing
infrastructure. Backends other than the .tracy file backend are SHOULD or MAY.
§3.1 Tracer Registry
The tracer registry manages named tracing backends. It MUST be thread-safe.
Operations:
| Operation | Description |
|---|---|
| add(name, factory) | Register a named backend factory |
| remove(name) | Unregister a named backend |
| clear() | Remove all registered backends |
| start(spanName) → emitter | Open ALL registered backends, return a fan-out emitter |
Factory signature:

    factory: (spanName: string) → ContextManager<(key: string, value: any) → void>

A factory receives a span name and returns a context manager. On enter, the context
manager yields an emitter function (key, value) → void. On exit, the span ends.
Fan-out: start(spanName) MUST invoke every registered factory and return a
composite emitter that forwards every (key, value) call to all active backends
simultaneously. If any individual backend raises an error, it MUST NOT prevent other
backends from receiving the emission.
Thread safety: The registry MUST be safe for concurrent add/remove/start calls
from multiple threads or async tasks.
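The registry operations and the fan-out rule can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the class name `TracerRegistry`, the helpers `list_backend` and `broken_backend`, and the choice to expose `start` as a context manager are all assumptions made for the example.

```python
import contextlib
import threading
from typing import Any, Callable, ContextManager, Dict, Iterator, List

Emitter = Callable[[str, Any], None]
Factory = Callable[[str], ContextManager[Emitter]]


class TracerRegistry:
    """Hypothetical registry; names are illustrative, not part of the spec."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._factories: Dict[str, Factory] = {}

    def add(self, name: str, factory: Factory) -> None:
        with self._lock:
            self._factories[name] = factory

    def remove(self, name: str) -> None:
        with self._lock:
            self._factories.pop(name, None)

    def clear(self) -> None:
        with self._lock:
            self._factories.clear()

    @contextlib.contextmanager
    def start(self, span_name: str) -> Iterator[Emitter]:
        # Snapshot under the lock so concurrent add/remove calls cannot
        # mutate the dict while it is being iterated.
        with self._lock:
            factories = list(self._factories.values())

        with contextlib.ExitStack() as stack:
            emitters: List[Emitter] = []
            for factory in factories:
                try:
                    emitters.append(stack.enter_context(factory(span_name)))
                except Exception:
                    continue  # assumption: a backend that fails to open is skipped

            def fan_out(key: str, value: Any) -> None:
                for emit in emitters:
                    try:
                        emit(key, value)
                    except Exception:
                        pass  # one failing backend must not starve the others

            yield fan_out  # leaving the with-block ends the span in every backend


def list_backend(sink: list) -> Factory:
    """Illustrative backend that records events in a list."""
    @contextlib.contextmanager
    def factory(span_name: str) -> Iterator[Emitter]:
        sink.append(("start", span_name))
        yield lambda key, value: sink.append((key, value))
        sink.append(("end", span_name))
    return factory


@contextlib.contextmanager
def broken_backend(span_name: str) -> Iterator[Emitter]:
    """Illustrative backend whose emitter always fails."""
    def emit(key: str, value: Any) -> None:
        raise RuntimeError("backend down")
    yield emit


registry = TracerRegistry()
events: list = []
registry.add("broken", broken_backend)
registry.add("memory", list_backend(events))
with registry.start("demo") as emit:
    emit("inputs", {"q": "hi"})
```

In this sketch the in-memory backend still receives the `inputs` emission even though the other registered backend raises on every call.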
§3.2 The @trace Decorator
The @trace decorator (or equivalent wrapper mechanism) instruments a function with
automatic span creation, input capture, and result capture.
Algorithm:

    function trace(fn, options?):
        return wrapped_fn(*args, **kwargs):
            1. signature ← module_name + "." + qualified_name(fn)
            2. bound_args ← bind function parameters to (args, kwargs)
            3. Remove 'self' from bound_args (if present)
            4. If options.ignore_params specified:
                   Remove those parameter names from bound_args
            5. emitter ← Tracer.start(signature)
            6. emitter("signature", signature)
            7. emitter("inputs", {name: to_dict(value) for name, value in bound_args})
            8. try:
                   result ← call fn(*args, **kwargs)
                   emitter("result", to_dict(result))
                   return result
               catch error:
                   emitter("result", {
                       "exception": type_name(error),
                       "message": str(error),
                       "traceback": format_traceback(error)
                   })
                   re-raise error
               finally:
                   end span (exit context manager)

Requirements:
- MUST work on async functions (awaiting the result within the span).
- SHOULD work on sync functions.
- SHOULD support custom attributes via decorator arguments or options.
- MUST NOT alter the return value or exception behavior of the wrapped function.
- If no backends are registered, the decorator MUST still call the original function (tracing becomes a no-op, not a failure).
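A Python rendering of the algorithm above might look like the sketch below. It covers both sync and async functions. `start_span`, `SPANS`, and the simplified `to_dict` are stand-ins invented for this example so that it runs on its own; a real implementation would call the tracer registry and the full serializer instead.

```python
import contextlib
import functools
import inspect
import traceback
from typing import Any, Callable, Iterator

SPANS: list = []  # stand-in sink so the sketch runs without a real backend


@contextlib.contextmanager
def start_span(name: str) -> Iterator[Callable[[str, Any], None]]:
    """Stand-in for Tracer.start(name): records emissions in SPANS."""
    record: dict = {"name": name}
    try:
        yield record.__setitem__  # emitter(key, value) stores the pair
    finally:
        SPANS.append(record)  # the span ends when the context manager exits


def to_dict(value: Any) -> Any:
    """Placeholder serializer: JSON-safe scalars pass through, the rest become str."""
    return value if isinstance(value, (str, int, float, bool, type(None))) else str(value)


def trace(fn=None, *, ignore_params=()):
    if fn is None:  # used as @trace(ignore_params=[...])
        return functools.partial(trace, ignore_params=ignore_params)

    signature = f"{fn.__module__}.{fn.__qualname__}"
    sig = inspect.signature(fn)

    def emit_inputs(emitter, args, kwargs):
        bound = sig.bind(*args, **kwargs)
        inputs = {name: to_dict(value) for name, value in bound.arguments.items()
                  if name != "self" and name not in ignore_params}
        emitter("signature", signature)
        emitter("inputs", inputs)

    def emit_error(emitter, error):
        emitter("result", {
            "exception": type(error).__name__,
            "message": str(error),
            "traceback": "".join(
                traceback.format_exception(type(error), error, error.__traceback__)),
        })

    if inspect.iscoroutinefunction(fn):
        @functools.wraps(fn)
        async def async_wrapper(*args, **kwargs):
            with start_span(signature) as emitter:
                emit_inputs(emitter, args, kwargs)
                try:
                    result = await fn(*args, **kwargs)  # awaited inside the span
                except Exception as error:
                    emit_error(emitter, error)
                    raise  # exception behavior is unchanged
                emitter("result", to_dict(result))
                return result
        return async_wrapper

    @functools.wraps(fn)
    def sync_wrapper(*args, **kwargs):
        with start_span(signature) as emitter:
            emit_inputs(emitter, args, kwargs)
            try:
                result = fn(*args, **kwargs)
            except Exception as error:
                emit_error(emitter, error)
                raise
            emitter("result", to_dict(result))
            return result
    return sync_wrapper


@trace(ignore_params=["api_key"])
def greet(name: str, api_key: str = "") -> str:
    return f"hello {name}"


greeting = greet("ada", api_key="k")
```

After the call, the last entry in `SPANS` holds the signature, the inputs without `api_key`, and the result, while `greeting` is the unmodified return value.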
§3.3 Serialization (to_dict)
Before emitting values to backends, all values MUST be serialized to a JSON-safe representation.
Conversion rules (applied recursively):
| Input Type | Output |
|---|---|
| string | Passthrough |
| int, float, bool | Passthrough |
| null / None | null |
| datetime | ISO 8601 string (e.g., "2026-04-04T12:00:00Z") |
| dataclass / model | Dict representation (field → value) |
| Path / file path | String representation |
| list / array | Recursive to_dict on each element |
| dict / map | Recursive to_dict on each value |
| Everything else | str(value) (string coercion) |
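The conversion table could be implemented in Python roughly as shown below. Several details are assumptions that the table does not settle: naive datetimes are emitted without a zone suffix, "model" is read as any object with a Pydantic-v2-style `model_dump()` method, tuples are treated like lists, and dict keys are coerced to strings. The `Run` dataclass exists only for the demonstration.

```python
import dataclasses
from datetime import datetime, timezone
from pathlib import PurePath, PurePosixPath
from typing import Any


def to_dict(value: Any) -> Any:
    """Recursively convert a value to a JSON-safe form (sketch of the table above)."""
    if value is None or isinstance(value, (str, int, float)):
        return value  # bool is a subclass of int, so it passes through here too
    if isinstance(value, datetime):
        if value.tzinfo is None:
            return value.isoformat()  # assumption: naive datetimes are emitted as-is
        return value.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        return {f.name: to_dict(getattr(value, f.name)) for f in dataclasses.fields(value)}
    if callable(getattr(value, "model_dump", None)):
        return to_dict(value.model_dump())  # assumption: Pydantic-v2-style model
    if isinstance(value, PurePath):
        return str(value)
    if isinstance(value, (list, tuple)):
        return [to_dict(item) for item in value]
    if isinstance(value, dict):
        # Assumption: keys are coerced to str so the output is valid JSON.
        return {str(key): to_dict(item) for key, item in value.items()}
    return str(value)  # everything else: string coercion


@dataclasses.dataclass
class Run:
    path: PurePath
    started: datetime
    tags: list


run = Run(
    PurePosixPath("a/b.txt"),
    datetime(2026, 4, 4, 12, 0, tzinfo=timezone.utc),
    [1, None, {2}],  # the set falls through to the str() rule
)
serialized = to_dict(run)
# serialized == {"path": "a/b.txt", "started": "2026-04-04T12:00:00Z", "tags": [1, None, "{2}"]}
```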
§3.4 Redaction
Before emitting values to any backend, implementations MUST redact sensitive data.
Sensitive key detection: A key is sensitive if it contains any of the following substrings (case-insensitive match):
- secret
- password
- api_key
- apikey
- token
- auth
- credential
- cookie
Redacted value: "[REDACTED]" (or language-idiomatic equivalent).
Scope: Redaction applies recursively to all dict keys at every nesting level in the
serialized inputs and result values. Redaction MUST occur before the value reaches
any backend — including the .tracy file backend.
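A minimal Python sketch of the redaction pass, assuming it runs on values that have already been serialized to dicts, lists, and scalars. The function names `is_sensitive` and `redact` and the sample `inputs` are illustrative.

```python
from typing import Any

SENSITIVE_SUBSTRINGS = (
    "secret", "password", "api_key", "apikey",
    "token", "auth", "credential", "cookie",
)
REDACTED = "[REDACTED]"


def is_sensitive(key: str) -> bool:
    """Case-insensitive substring match against the sensitive list."""
    lowered = key.lower()
    return any(fragment in lowered for fragment in SENSITIVE_SUBSTRINGS)


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive dict entries replaced at every level."""
    if isinstance(value, dict):
        return {
            key: REDACTED if is_sensitive(str(key)) else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


inputs = {
    "prompt": "hi",
    "API_KEY": "sk-123",
    "options": {
        "temperature": 0.2,
        "max_tokens": 10,
        "headers": [{"Authorization": "Bearer x"}],
    },
}
safe_inputs = redact(inputs)
```

Because the rule is a substring match, it is broad: in this example `max_tokens` is redacted along with `API_KEY` and `Authorization`, since it contains `token`.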
§3.5 Usage Hoisting
Token usage information MUST be propagated from child spans to parent spans.
Algorithm:

    function hoist_usage(span):
        If span.result contains a "usage" field:
            usage ← span.result.usage
            propagate to parent span as "__usage":
                __usage.prompt_tokens     += usage.prompt_tokens (or usage.input_tokens)
                __usage.completion_tokens += usage.completion_tokens (or usage.output_tokens)
                __usage.total_tokens      += usage.total_tokens
        For each child in span.__frames:
            hoist_usage(child)   // recursive: deepest spans propagate first

Usage hoisting enables the root span to report aggregate token consumption across the entire pipeline execution, including agent loops with multiple LLM calls.
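One way to read the hoisting algorithm is as a bottom-up pass over the span tree, sketched here in Python on plain dicts. The sketch makes three assumptions that the pseudocode leaves open: children are processed before their parent, a span passes up both its own `result.usage` and whatever was hoisted into it (so totals reach the root), and a missing `total_tokens` falls back to prompt plus completion. The span names in the example tree are made up.

```python
from typing import Any, Dict

KEYS = ("prompt_tokens", "completion_tokens", "total_tokens")


def normalize(usage: Dict[str, Any]) -> Dict[str, int]:
    """Map either naming scheme onto the three __usage counters."""
    prompt = usage.get("prompt_tokens", usage.get("input_tokens", 0))
    completion = usage.get("completion_tokens", usage.get("output_tokens", 0))
    # Assumption: when total_tokens is absent, fall back to prompt + completion.
    total = usage.get("total_tokens", prompt + completion)
    return {"prompt_tokens": prompt, "completion_tokens": completion, "total_tokens": total}


def hoist_usage(span: Dict[str, Any]) -> Dict[str, int]:
    """Set span["__usage"] from its descendants and return what the span passes up."""
    hoisted = dict.fromkeys(KEYS, 0)
    for child in span.get("__frames", []):
        passed_up = hoist_usage(child)  # children first, so the deepest spans propagate first
        for key in KEYS:
            hoisted[key] += passed_up[key]
    if any(hoisted.values()):
        span["__usage"] = dict(hoisted)
    result = span.get("result")
    if isinstance(result, dict) and isinstance(result.get("usage"), dict):
        own = normalize(result["usage"])
        for key in KEYS:
            hoisted[key] += own[key]
    return hoisted


root = {
    "name": "pipeline",
    "result": "done",
    "__frames": [{
        "name": "agent",
        "result": "ok",
        "__frames": [
            {"name": "llm1", "__frames": [], "result": {
                "usage": {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}}},
            {"name": "llm2", "__frames": [], "result": {
                "usage": {"input_tokens": 7, "output_tokens": 3}}},
        ],
    }],
}
hoist_usage(root)
# root["__usage"] == {"prompt_tokens": 17, "completion_tokens": 8, "total_tokens": 25}
```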
§3.6 Built-in Backends
§3.6.1 .tracy File Backend — MUST Implement
Every conformant implementation MUST provide a .tracy file backend that writes
structured JSON traces to disk.
File naming: <sanitized_span_name>.<YYYYMMDD.HHMMSS>.tracy
- sanitized_span_name: the root span name with filesystem-unsafe characters replaced by underscores.
- Timestamp: UTC time when the root span ends.

The file is written only when the root span ends; intermediate spans accumulate in memory.
JSON schema:

    {
      "runtime": "<language_name>",
      "version": "<prompty_library_version>",
      "trace": {
        "name": "<span_name>",
        "__time": {
          "start": "<ISO8601_UTC>",
          "end": "<ISO8601_UTC>",
          "duration": "<milliseconds_as_number>"
        },
        "signature": "<module.function_name>",
        "inputs": {
          "<param_name>": "<serialized_value>",
          "...": "..."
        },
        "result": "<serialized_value>",
        "__frames": [
          {
            "name": "<child_span_name>",
            "__time": { "...": "..." },
            "signature": "...",
            "inputs": { "...": "..." },
            "result": "...",
            "__frames": []
          }
        ],
        "__usage": {
          "prompt_tokens": 0,
          "completion_tokens": 0,
          "total_tokens": 0
        }
      }
    }

Field definitions:
| Field | Type | Description |
|---|---|---|
| runtime | string | Language/runtime name (e.g., "python", "csharp", "javascript") |
| version | string | Prompty library version (e.g., "2.0.0") |
| name | string | Span name |
| __time | object | Timing information |
| signature | string | Fully-qualified function signature |
| inputs | object | Serialized input parameters (redacted) |
| result | any | Serialized return value (redacted) |
| __frames | array | Child spans (recursive, same structure) |
| __usage | object | Aggregated token usage (hoisted from children) |
The __ prefix on internal fields (__time, __frames, __usage) distinguishes them
from user-emitted keys.
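The file-naming rule and the top-level envelope for §3.6.1 can be sketched in Python as below. The set of characters treated as filesystem-unsafe is an assumption (the text does not enumerate them), and `tracy_filename`, `write_tracy`, and the placeholder version `"0.0.0"` are illustrative names and values.

```python
import json
import re
from datetime import datetime, timezone
from pathlib import Path


def tracy_filename(span_name: str, ended_at: datetime) -> str:
    """Build <sanitized_span_name>.<YYYYMMDD.HHMMSS>.tracy."""
    # Assumption: any character outside [A-Za-z0-9._-] is treated as unsafe.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", span_name)
    # ended_at should be timezone-aware; it is converted to UTC for the stamp.
    stamp = ended_at.astimezone(timezone.utc).strftime("%Y%m%d.%H%M%S")
    return f"{safe}.{stamp}.tracy"


def write_tracy(root_span: dict, ended_at: datetime, directory: Path,
                runtime: str = "python", version: str = "0.0.0") -> Path:
    """Write the finished root span, wrapped in the runtime/version envelope."""
    document = {"runtime": runtime, "version": version, "trace": root_span}
    path = directory / tracy_filename(root_span["name"], ended_at)
    path.write_text(json.dumps(document, indent=2), encoding="utf-8")
    return path


name = tracy_filename(
    "my app/chat:run",
    datetime(2026, 4, 4, 12, 0, 0, tzinfo=timezone.utc),
)
# name == "my_app_chat_run.20260404.120000.tracy"
```

A backend built on these helpers would call `write_tracy` once, from the exit path of the root span, after all child spans have been collected into `__frames`.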
§3.6.2 Console Tracer — SHOULD Implement
Implementations SHOULD provide a console tracer that prints span start/end events to stderr (or the platform-equivalent diagnostic output).
Output format (RECOMMENDED):

    [prompty] ▶ <span_name>
    [prompty] ◀ <span_name> (<duration_ms>ms)

§3.6.3 OpenTelemetry Backend — SHOULD Implement
Implementations SHOULD provide an OpenTelemetry tracing backend that conforms to the OpenTelemetry GenAI Semantic Conventions.
Span attributes mapping:
| Prompty Concept | OTel Attribute | Value |
|---|---|---|
| API type | gen_ai.operation.name | "chat", "embeddings", "invoke_agent", "execute_tool" |
| Provider name | gen_ai.provider.name | "openai", "microsoft.foundry", "anthropic" |
| Model ID | gen_ai.request.model | From agent.model.id |
| Temperature | gen_ai.request.temperature | From agent.model.options.temperature |
| Max tokens | gen_ai.request.max_tokens | From agent.model.options.maxOutputTokens |
| Input token count | gen_ai.usage.input_tokens | From response usage |
| Output token count | gen_ai.usage.output_tokens | From response usage |
| Finish reasons | gen_ai.response.finish_reasons | From response choices |
| Response ID | gen_ai.response.id | From response |
| Response model | gen_ai.response.model | From response |
Span events (opt-in — implementations MAY emit these):
| Event Name | Content |
|---|---|
| gen_ai.input.messages | Messages sent to LLM |
| gen_ai.output.messages | Messages received from LLM |
| gen_ai.system_instructions | System message content |
| gen_ai.tool.definitions | Tool definitions sent to LLM |
Span kinds:
- CLIENT for remote LLM API calls (executor).
- INTERNAL for local processing (renderer, parser, processor).
Agent and tool spans:
- Agent loop spans: gen_ai.operation.name = "invoke_agent", with gen_ai.agent.name.
- Tool execution spans: gen_ai.operation.name = "execute_tool".
Error handling: On error, set the span status to ERROR and record the error.type attribute.
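The span attribute mapping in §3.6.3 can be illustrated without any OpenTelemetry dependency by building the attribute dictionary from plain dicts, as in the Python sketch below. The shapes of `agent` and `response` (an OpenAI-style `choices` list and a `usage` dict using either naming scheme) are assumptions, as are the function name `genai_attributes` and the sample values. Setting the attributes on a real span is left out.

```python
from typing import Any, Dict


def genai_attributes(agent: Dict[str, Any], response: Dict[str, Any],
                     operation: str = "chat", provider: str = "openai") -> Dict[str, Any]:
    """Collect GenAI span attributes from an agent definition and a response."""
    model = agent.get("model", {})
    options = model.get("options", {})
    usage = response.get("usage", {})
    finish_reasons = [
        choice["finish_reason"]
        for choice in response.get("choices", [])
        if choice.get("finish_reason")
    ]
    attributes = {
        "gen_ai.operation.name": operation,
        "gen_ai.provider.name": provider,
        "gen_ai.request.model": model.get("id"),
        "gen_ai.request.temperature": options.get("temperature"),
        "gen_ai.request.max_tokens": options.get("maxOutputTokens"),
        "gen_ai.usage.input_tokens": usage.get("input_tokens", usage.get("prompt_tokens")),
        "gen_ai.usage.output_tokens": usage.get("output_tokens", usage.get("completion_tokens")),
        "gen_ai.response.finish_reasons": finish_reasons or None,
        "gen_ai.response.id": response.get("id"),
        "gen_ai.response.model": response.get("model"),
    }
    # Assumption: unset values are omitted rather than emitted as null.
    return {key: value for key, value in attributes.items() if value is not None}


agent = {"model": {"id": "example-model",
                   "options": {"temperature": 0.2, "maxOutputTokens": 256}}}
response = {
    "id": "resp_1",
    "model": "example-model-2026",
    "usage": {"prompt_tokens": 12, "completion_tokens": 4},
    "choices": [{"finish_reason": "stop"}],
}
attributes = genai_attributes(agent, response)
```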
§3.7 Extensibility
Users add custom backends by implementing the factory interface and calling
add(name, factory) on the tracer registry.
Example use cases:
- Database logging (write spans to a SQL or NoSQL store)
- Webhook notifications (POST span data to an HTTP endpoint)
- Custom dashboards (stream span data to a visualization tool)
- Cost tracking (aggregate __usage across runs)
Backends MUST NOT block the pipeline — if a backend is slow, it SHOULD buffer or flush asynchronously. A failing backend MUST NOT cause the pipeline to fail.
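A custom backend that respects the non-blocking rule could buffer finished spans on a queue and hand them to a slow sink from a worker thread. The Python sketch below shows one such design. `BufferedBackend`, its `flush` method, and the `received` list are illustrative, and the factory follows the context-manager interface from §3.1.

```python
import contextlib
import queue
import threading
from typing import Any, Callable, Iterator


class BufferedBackend:
    """Hypothetical backend: spans are queued and delivered on a worker thread."""

    def __init__(self, sink: Callable[[dict], None]) -> None:
        self._sink = sink
        self._queue: queue.Queue = queue.Queue()  # unbounded, so put() never blocks
        threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self) -> None:
        while True:
            span = self._queue.get()
            try:
                self._sink(span)
            except Exception:
                pass  # a failing sink must not take the pipeline down
            finally:
                self._queue.task_done()

    @contextlib.contextmanager
    def factory(self, span_name: str) -> Iterator[Callable[[str, Any], None]]:
        span: dict = {"name": span_name}
        try:
            yield span.__setitem__  # emitter(key, value) stores the pair
        finally:
            self._queue.put(span)  # hand off without waiting for the sink

    def flush(self) -> None:
        """Block until every queued span has been handed to the sink."""
        self._queue.join()


received: list = []
backend = BufferedBackend(received.append)
with backend.factory("demo") as emit:
    emit("result", 42)
backend.flush()
# received == [{"name": "demo", "result": 42}]
```

Such a backend would be registered by passing `backend.factory` to `add(name, factory)` on the tracer registry. Calling `flush()` before process exit is a design choice of this sketch, needed because the worker is a daemon thread.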