
Tutorial: Prompts in Production

How to take a .prompty file from local development to production: environment configuration, connection management, tracing, error handling, testing, versioning, and performance optimization.


Never hardcode secrets. Use ${env:VAR} references in your .prompty file and resolve them from the environment at load time.

chat.prompty
---
name: production-chat
model:
  id: gpt-4o
  provider: openai
  connection:
    kind: key
    endpoint: ${env:OPENAI_ENDPOINT:https://api.openai.com/v1}
    apiKey: ${env:OPENAI_API_KEY}
  options:
    temperature: 0.7
    maxOutputTokens: 1024
---
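Conceptually, `${env:VAR}` and `${env:VAR:default}` references are substituted from the process environment when the file is loaded. The sketch below illustrates that resolution rule; the regex and the `resolve_env_refs` helper are illustrative, not part of the prompty API:

```python
import os
import re

# Matches ${env:VAR} or ${env:VAR:default}; the default may contain
# colons and slashes (e.g. a URL). Illustrative, not the library's parser.
_ENV_REF = re.compile(r"\$\{env:([A-Z0-9_]+)(?::([^}]*))?\}")

def resolve_env_refs(text: str) -> str:
    """Replace ${env:VAR[:default]} references with environment values."""
    def _sub(match: re.Match) -> str:
        var, default = match.group(1), match.group(2)
        value = os.environ.get(var, default)
        if value is None:
            raise KeyError(f"Missing environment variable: {var}")
        return value
    return _ENV_REF.sub(_sub, text)
```

A missing variable with no default fails loudly at load time, which is usually what you want in production rather than a silent empty string.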

For local development, create a .env file (add it to .gitignore):

.env
OPENAI_API_KEY=sk-your-key-here
OPENAI_ENDPOINT=https://api.openai.com/v1

For CI/CD, set environment variables in your pipeline configuration:

.github/workflows/deploy.yml
env:
  OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
  OPENAI_ENDPOINT: ${{ secrets.OPENAI_ENDPOINT }}

Use different connection strategies for development vs production. In dev, use kind: key with environment variables. In production, use kind: reference with a pre-configured SDK client that supports managed identity.

chat.prompty
model:
  connection:
    kind: key
    endpoint: ${env:OPENAI_ENDPOINT:https://api.openai.com/v1}
    apiKey: ${env:OPENAI_API_KEY}
chat.prompty
model:
  connection:
    kind: reference
    name: prod-client

Register the client at application startup:

import os

import prompty
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_ENDPOINT"],
    azure_ad_token_provider=get_bearer_token_provider(
        DefaultAzureCredential(),
        "https://cognitiveservices.azure.com/.default",
    ),
)
prompty.register_connection("prod-client", client=client)

Log traces to the console during development:

from prompty import Tracer
from prompty.tracing.tracer import console_tracer

Tracer.add("console", console_tracer)

In production, export spans via OpenTelemetry:

from prompty import Tracer
from prompty.tracing.otel import otel_tracer

# Sends spans to your OTel collector (Jaeger, Azure Monitor, etc.)
Tracer.add("otel", otel_tracer())

Install the dependency:

Terminal window
pip install prompty[otel]

Wrap invoke() calls to handle common production failures: rate limits, timeouts, and authentication errors.

import prompty
from openai import RateLimitError, APITimeoutError, AuthenticationError

def safe_invoke(path: str, inputs: dict) -> str | None:
    try:
        return prompty.invoke(path, inputs=inputs)
    except AuthenticationError:
        print("Check your API key or connection config")
        return None
    except RateLimitError:
        print("Rate limited — retry with backoff")
        return None
    except APITimeoutError:
        print("Request timed out — retry or increase timeout")
        return None
    except FileNotFoundError:
        print(f"Prompt file not found: {path}")
        return None
    except ValueError as e:
        print(f"Invalid prompt configuration: {e}")
        return None
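For rate limits and timeouts, logging and returning None is rarely enough; production code usually retries with exponential backoff. A generic sketch (the `retry_with_backoff` helper and its parameters are illustrative, not part of prompty):

```python
import random
import time

def retry_with_backoff(fn, *, retries=3, base_delay=1.0, retryable=(Exception,)):
    """Call fn(), retrying listed exceptions with exponential backoff + jitter."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except retryable:
            if attempt == retries:
                raise  # exhausted the budget; let the caller handle it
            # 1s, 2s, 4s, ... plus a small random jitter to avoid thundering herds
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

With the exception types from the snippet above, usage might look like `retry_with_backoff(lambda: prompty.invoke(path, inputs=inputs), retryable=(RateLimitError, APITimeoutError))`.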

Test loading, rendering, and parsing without making API calls:

from prompty import load, prepare

def test_prompt_loads():
    agent = load("prompts/chat.prompty")
    assert agent.name == "production-chat"
    assert agent.model.id == "gpt-4o"

def test_messages_prepared():
    messages = prepare("prompts/chat.prompty", inputs={"question": "Hello"})
    assert messages[0].role == "system"
    assert "Hello" in messages[-1].text

Integration Tests — Gated by Environment


Only run when API keys are available:

import os

import pytest
from prompty import invoke

skip_no_key = pytest.mark.skipif(
    not os.environ.get("OPENAI_API_KEY"), reason="OPENAI_API_KEY not set"
)

@skip_no_key
def test_live_chat():
    result = invoke("prompts/chat.prompty", inputs={"question": "Say hi"})
    assert isinstance(result, str) and len(result) > 0

Treat .prompty files as first-class source artifacts:

  • Store in git alongside your application code
  • Use meaningful commit messages — “Tighten system prompt to reduce hallucination” is better than “update prompt”
  • Review prompt changes in PRs — template wording affects behavior as much as code
  • Tag releases that include prompt changes so you can roll back
Terminal window
git add prompts/chat.prompty
git commit -m "feat(prompts): add citation requirement to system prompt

Instructs the model to always cite sources when answering
factual questions. Reduces hallucination rate in eval suite."

Stream tokens to the client as they arrive instead of waiting for the full response:

from prompty import load, prepare, run, process

agent = load("chat.prompty")
agent.model.options.additionalProperties["stream"] = True

messages = prepare(agent, inputs={"question": "Write a summary"})
stream = run(agent, messages, raw=True)

for chunk in process(agent, stream):
    print(chunk, end="", flush=True)
  • Cache prepared messages — if the same inputs are used repeatedly, cache the output of prepare() and call run() directly
  • Cache LLM responses — for deterministic prompts (temperature: 0), cache the full response keyed by the rendered messages
  • Use seed in model options for reproducible outputs when caching
Deterministic config for caching
model:
  options:
    temperature: 0
    seed: 42
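The response-cache idea above can be sketched as an in-memory dictionary keyed by a hash of the rendered messages. The `cache_key` and `cached_run` helpers are illustrative; a production system would more likely use Redis or similar, with TTLs and size limits:

```python
import hashlib
import json

_cache: dict[str, str] = {}

def cache_key(messages: list[dict]) -> str:
    """Stable key derived from the rendered messages (sorted for determinism)."""
    payload = json.dumps(messages, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_run(messages: list[dict], run_fn) -> str:
    """Return a cached response for identical messages; call run_fn on a miss."""
    key = cache_key(messages)
    if key not in _cache:
        _cache[key] = run_fn(messages)
    return _cache[key]
```

This is only safe for deterministic configurations (temperature 0, or a fixed seed); with sampling enabled, identical messages legitimately produce different responses.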