# Streaming Responses
## What You’ll Build

A prompt that streams responses chunk by chunk, ideal for chat UIs where users see tokens appear in real time. Prompty wraps the raw SDK stream in a tracing-aware `PromptyStream`, so you get full observability without losing any data.
## Step 1: Enable Streaming

Set `stream: true` in the model’s `additionalProperties`, either in the `.prompty` file or at runtime.
### In the .prompty File

```
---
name: streaming-chat
description: A chat prompt with streaming enabled
model:
  id: gpt-4o-mini
  provider: openai
  apiType: chat
  connection:
    kind: key
    apiKey: ${env:OPENAI_API_KEY}
  options:
    temperature: 0.7
    additionalProperties:
      stream: true
inputSchema:
  properties:
    - name: question
      kind: string
      default: Tell me a joke
template:
  format:
    kind: jinja2
  parser:
    kind: prompty
---
system:
You are a helpful assistant.

user:
{{question}}
```

### At Runtime

If your `.prompty` file doesn’t have streaming enabled, you can toggle it before execution:
```python
from prompty import load

agent = load("chat.prompty")
agent.model.options.additionalProperties["stream"] = True
```

## Step 2: Consume the Stream

Use `run()` with `raw=True` to get the unprocessed `PromptyStream`, then pass it to `process()`, which yields text chunks.
Python (synchronous):

```python
from prompty import load, run, process

agent = load("chat.prompty")
agent.model.options.additionalProperties["stream"] = True

# run() with raw=True returns the PromptyStream
stream = run(agent, inputs={"question": "Tell me a joke"}, raw=True)

# process() yields text chunks
for chunk in process(agent, stream):
    print(chunk, end="", flush=True)
print()  # newline after stream completes
```

Python (async):

```python
import asyncio

from prompty import load_async, run_async, process_async

async def main():
    agent = await load_async("chat.prompty")
    agent.model.options.additionalProperties["stream"] = True

    stream = await run_async(
        agent, inputs={"question": "Tell me a joke"}, raw=True
    )

    async for chunk in process_async(agent, stream):
        print(chunk, end="", flush=True)
    print()

asyncio.run(main())
```

TypeScript (the prompty import is aliased so it doesn’t shadow Node’s global `process`):

```typescript
// Alias the prompty processor so Node's global `process` stays usable.
import { load, run, process as processStream } from "prompty";

const agent = await load("chat.prompty");
agent.model.options.additionalProperties.stream = true;

const stream = await run(agent, { question: "Tell me a joke" }, { raw: true });

for await (const chunk of processStream(agent, stream)) {
  process.stdout.write(chunk);
}
console.log();
```

Each chunk is a string: the processor extracts `delta.content` from the raw API response objects, so you never handle the wire format yourself.
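To picture that extraction, here is a minimal sketch using dict-shaped stand-ins for the provider’s chunk objects. This is illustrative only: the real SDK yields typed objects, and Prompty’s processor does more than this (see the table below).

```python
# Hypothetical stand-in chunks, shaped like OpenAI-style chat
# completion stream events (the real SDK yields typed objects).
raw_chunks = [
    {"choices": [{"delta": {"content": "Why did "}}]},
    {"choices": [{"delta": {"content": "the robot paint?"}}]},
    {"choices": [{"delta": {}}]},  # heartbeat: no content, skipped
]

def extract_text(chunks):
    """Yield only the delta.content strings, skipping empty deltas."""
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            yield delta["content"]

print("".join(extract_text(raw_chunks)))
```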
## How Streaming + Tracing Works

A common concern with streaming is losing observability: if chunks are consumed lazily, when does the trace fire? Prompty’s `PromptyStream` wrapper solves this:

- The executor wraps the raw SDK iterator in a `PromptyStream`.
- As you iterate, each chunk is forwarded to your code and appended to an internal accumulator.
- When the iterator is exhausted (`StopIteration`), the wrapper flushes the complete accumulated response to the active tracer.
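The yield-and-accumulate pattern can be sketched as a small iterator wrapper. This is a sketch of the idea, not Prompty’s actual implementation; the class name and tracer callback are made up:

```python
class TracingStream:
    """Sketch of a PromptyStream-like wrapper: forwards each chunk to
    the caller, accumulates it, and flushes everything to a tracer
    callback once the underlying iterator is exhausted."""

    def __init__(self, name, iterator, on_complete):
        self.name = name
        self._iterator = iter(iterator)
        self._accumulated = []
        self._on_complete = on_complete  # e.g. the active tracer
        self._flushed = False

    def __iter__(self):
        return self

    def __next__(self):
        try:
            chunk = next(self._iterator)
        except StopIteration:
            if not self._flushed:  # flush exactly once, on exhaustion
                self._flushed = True
                self._on_complete(self.name, self._accumulated)
            raise
        self._accumulated.append(chunk)  # keep a copy for the trace
        return chunk

# The caller sees every chunk; the tracer sees the complete response.
traced = []
stream = TracingStream("demo", ["a", "b", "c"],
                       lambda name, acc: traced.append((name, acc)))
print(list(stream))  # chunks pass through unchanged
print(traced)        # full accumulated response, flushed once
```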
```
iterate chunk 1 → yield + accumulate
iterate chunk 2 → yield + accumulate
iterate chunk 3 → yield + accumulate
...
StopIteration   → flush accumulated data to tracer ✓
```

## What the Processor Handles

The streaming processor does more than forward raw chunks:
| Scenario | Behavior |
|---|---|
| Content deltas | `delta.content` strings are yielded directly to the caller |
| Tool-call deltas | Argument fragments are accumulated; a complete `ToolCall` is yielded when the stream ends |
| Refusal | If `delta.refusal` is present, the processor raises a `ValueError` |
| Empty / heartbeat chunks | Chunks with no content or tool-call data are silently skipped |
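The four behaviors in the table can be pictured as one dispatch loop. As with the earlier sketch, this uses dict-shaped stand-in chunks and a made-up `ToolCall` dataclass; it illustrates the behavior, not Prompty’s actual code:

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    # Illustrative stand-in for a tool-call result type.
    name: str = ""
    arguments: str = ""

def process_chunks(chunks):
    """Dispatch sketch: yield content deltas, accumulate tool-call
    fragments, raise on refusal, and skip empty heartbeat chunks."""
    tool_call = None
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        if delta.get("refusal"):
            raise ValueError(delta["refusal"])
        if delta.get("content"):
            yield delta["content"]  # content deltas: yielded directly
        elif delta.get("tool_calls"):
            frag = delta["tool_calls"][0]["function"]
            tool_call = tool_call or ToolCall()
            tool_call.name += frag.get("name", "")
            tool_call.arguments += frag.get("arguments", "")
        # chunks with neither content nor tool-call data are skipped
    if tool_call is not None:
        yield tool_call  # complete call, only once the stream ends

chunks = [
    {"choices": [{"delta": {"tool_calls": [
        {"function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"function": {"arguments": '{"city": "Pa'}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"function": {"arguments": 'ris"}'}}]}}]},
    {"choices": [{"delta": {}}]},  # heartbeat, skipped
]
print(list(process_chunks(chunks)))
```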
## Complete Example: Streaming Chat App

Here’s a self-contained example you can copy and run:
`stream-demo.prompty`:

```
---
name: stream-demo
model:
  id: gpt-4o-mini
  provider: openai
  apiType: chat
  connection:
    kind: key
    apiKey: ${env:OPENAI_API_KEY}
  options:
    temperature: 0.9
    additionalProperties:
      stream: true
inputSchema:
  properties:
    - name: topic
      kind: string
      default: space exploration
template:
  format:
    kind: jinja2
  parser:
    kind: prompty
---
system:
You are a creative storyteller.

user:
Write a short story about {{topic}}.
```

```python
from prompty import load, run, process

agent = load("stream-demo.prompty")
stream = run(agent, inputs={"topic": "a robot learning to paint"}, raw=True)

print("Story: ", end="")
for chunk in process(agent, stream):
    print(chunk, end="", flush=True)
print("\n--- Done ---")
```

## Further Reading

- Streaming concept: architecture details on `PromptyStream` and `AsyncPromptyStream`
- Tracing: how traces capture streaming data