Tutorial: Prompts in Production
What you’ll learn

How to take a .prompty file from local development to production-ready:
environment configuration, connection management, tracing, error handling,
testing, versioning, and performance optimization.
1. Environment Configuration
Never hardcode secrets. Use ${env:VAR} references in your .prompty file and
resolve them from the environment at load time.
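The ${env:VAR:default} form falls back to a default when the variable is unset. As an illustration only (not the runtime's actual code), resolution boils down to a small substitution pass:

```python
import os
import re

# Illustrative resolver for ${env:VAR} and ${env:VAR:default} references.
# The real runtime performs this substitution itself; this sketch just
# demonstrates the semantics, including the default fallback.
_ENV_REF = re.compile(r"\$\{env:([A-Za-z0-9_]+)(?::([^}]*))?\}")

def resolve_env_refs(text: str) -> str:
    def _replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = os.environ.get(name, default)
        if value is None:
            raise KeyError(f"missing environment variable: {name}")
        return value
    return _ENV_REF.sub(_replace, text)

os.environ["OPENAI_API_KEY"] = "sk-test"
os.environ.pop("OPENAI_ENDPOINT", None)
print(resolve_env_refs("apiKey: ${env:OPENAI_API_KEY}"))
# With OPENAI_ENDPOINT unset, the default after the second colon is used:
print(resolve_env_refs("endpoint: ${env:OPENAI_ENDPOINT:https://api.openai.com/v1}"))
```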
```yaml
---
name: production-chat
model:
  id: gpt-4o
  provider: openai
  connection:
    kind: key
    endpoint: ${env:OPENAI_ENDPOINT:https://api.openai.com/v1}
    apiKey: ${env:OPENAI_API_KEY}
  options:
    temperature: 0.7
    maxOutputTokens: 1024
---
```

For local development, create a .env file (add it to .gitignore):

```
OPENAI_API_KEY=sk-your-key-here
OPENAI_ENDPOINT=https://api.openai.com/v1
```

For CI/CD, set environment variables in your pipeline configuration:

```yaml
env:
  OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
  OPENAI_ENDPOINT: ${{ secrets.OPENAI_ENDPOINT }}
```

2. Connection Management
Use different connection strategies for development vs production. In dev, use
kind: key with environment variables. In production, use kind: reference
with a pre-configured SDK client that supports managed identity.
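Conceptually, kind: reference is just a lookup in a name-to-client registry populated at startup. A minimal sketch of that idea (hypothetical helper names, not the SDK's internals):

```python
# Hypothetical sketch of the registry behind "kind: reference".
_connections: dict[str, object] = {}

def register_connection(name: str, client: object) -> None:
    """Store a pre-configured SDK client under a name."""
    _connections[name] = client

def resolve_connection(name: str) -> object:
    """Look up the client a prompt's `name:` field refers to."""
    try:
        return _connections[name]
    except KeyError:
        raise LookupError(f"no connection registered under {name!r}") from None

# At startup:   register_connection("prod-client", client)
# At invoke time the runtime resolves the prompt's connection name to a client.
```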
Development — API Key

```yaml
model:
  connection:
    kind: key
    endpoint: ${env:OPENAI_ENDPOINT:https://api.openai.com/v1}
    apiKey: ${env:OPENAI_API_KEY}
```

Production — Reference Connection

```yaml
model:
  connection:
    kind: reference
    name: prod-client
```

Register the client at application startup:

```python
import os

import prompty
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_ENDPOINT"],
    azure_ad_token_provider=get_bearer_token_provider(
        DefaultAzureCredential(),
        "https://cognitiveservices.azure.com/.default",
    ),
)
prompty.register_connection("prod-client", client=client)
```

```typescript
import { AzureOpenAI } from "openai";
import { DefaultAzureCredential, getBearerTokenProvider } from "@azure/identity";
import { registerConnection } from "@prompty/core";

const credential = new DefaultAzureCredential();
const client = new AzureOpenAI({
  endpoint: process.env.AZURE_ENDPOINT!,
  azureADTokenProvider: getBearerTokenProvider(
    credential,
    "https://cognitiveservices.azure.com/.default"
  ),
});
registerConnection("prod-client", client);
```

```csharp
using Azure.Identity;
using Prompty.Core;

var client = new AzureOpenAI(
    new Uri(Environment.GetEnvironmentVariable("AZURE_ENDPOINT")!),
    new DefaultAzureCredential());
ConnectionRegistry.Register("prod-client", client);
```

3. Enable Tracing
Development — Console Tracer

```python
from prompty import Tracer
from prompty.tracing.tracer import console_tracer

Tracer.add("console", console_tracer)
```

```typescript
import { Tracer, consoleTracer } from "@prompty/core";

Tracer.add("console", consoleTracer);
```

```csharp
using Prompty.Core.Tracing;

Tracer.Add("console", ConsoleTracer.Factory);
```

Production — OpenTelemetry

```python
from prompty import Tracer
from prompty.tracing.otel import otel_tracer

# Sends spans to your OTel collector (Jaeger, Azure Monitor, etc.)
Tracer.add("otel", otel_tracer())
```

Install the dependency:

```shell
pip install prompty[otel]
```

```typescript
import { Tracer } from "@prompty/core";
import { otelTracer } from "@prompty/core/tracing/otel";

Tracer.add("otel", otelTracer());
```

Install the dependency:

```shell
npm install @opentelemetry/api
```

```csharp
using Prompty.Core.Tracing;

OTelTracer.Register();
```

4. Error Handling
Section titled “4. Error Handling”Wrap invoke() calls to handle common production failures: rate limits,
timeouts, and authentication errors.
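For the rate-limit and timeout cases, the usual remedy is retrying with exponential backoff. A generic sketch (TransientError is a stand-in for retryable exceptions such as openai.RateLimitError; wiring it to prompty.invoke is an assumption, not part of the library):

```python
import random
import time

class TransientError(Exception):
    """Stand-in for retryable errors (rate limits, timeouts)."""

def with_backoff(fn, retries: int = 4, base_delay: float = 1.0):
    """Call fn(), retrying transient failures with exponential backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except TransientError:
            if attempt == retries - 1:
                raise  # out of retries: surface the error to the caller
            # Delays of 1s, 2s, 4s, ... plus jitter to avoid retry stampedes
            time.sleep(base_delay * 2**attempt + random.uniform(0, base_delay))
```

Usage would look like `with_backoff(lambda: prompty.invoke(path, inputs=inputs))`, with the provider's retryable exceptions caught in place of TransientError.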
```python
import prompty
from openai import APITimeoutError, AuthenticationError, RateLimitError

def safe_invoke(path: str, inputs: dict) -> str | None:
    try:
        return prompty.invoke(path, inputs=inputs)
    except AuthenticationError:
        print("Check your API key or connection config")
        return None
    except RateLimitError:
        print("Rate limited — retry with backoff")
        return None
    except APITimeoutError:
        print("Request timed out — retry or increase timeout")
        return None
    except FileNotFoundError:
        print(f"Prompt file not found: {path}")
        return None
    except ValueError as e:
        print(f"Invalid prompt configuration: {e}")
        return None
```

```typescript
import { invoke } from "@prompty/core";

async function safeInvoke(path: string, inputs: Record<string, unknown>) {
  try {
    return await invoke(path, inputs);
  } catch (err: any) {
    if (err?.status === 429) {
      console.error("Rate limited — retry with backoff");
    } else if (err?.status === 401) {
      console.error("Check your API key or connection config");
    } else if (err?.code === "ETIMEDOUT") {
      console.error("Request timed out — retry or increase timeout");
    } else {
      console.error("Prompt error:", err.message);
    }
    return null;
  }
}
```

```csharp
using Prompty.Core;

async Task<string?> SafeInvokeAsync(string path, Dictionary<string, object?> inputs)
{
    try
    {
        var result = await Pipeline.InvokeAsync(path, inputs);
        return result?.ToString();
    }
    catch (HttpRequestException ex) when (ex.StatusCode == System.Net.HttpStatusCode.TooManyRequests)
    {
        Console.Error.WriteLine("Rate limited — retry with backoff");
        return null;
    }
    catch (HttpRequestException ex) when (ex.StatusCode == System.Net.HttpStatusCode.Unauthorized)
    {
        Console.Error.WriteLine("Check your API key or connection config");
        return null;
    }
    catch (TaskCanceledException)
    {
        Console.Error.WriteLine("Request timed out");
        return null;
    }
}
```

5. Testing Strategy
Unit Tests — Mock the LLM

Test loading, rendering, and parsing without making API calls:

```python
from prompty import load, prepare

def test_prompt_loads():
    agent = load("prompts/chat.prompty")
    assert agent.name == "production-chat"
    assert agent.model.id == "gpt-4o"

def test_messages_prepared():
    messages = prepare("prompts/chat.prompty", inputs={"question": "Hello"})
    assert messages[0].role == "system"
    assert "Hello" in messages[-1].text
```

```typescript
import { load, prepare } from "@prompty/core";

test("prompt loads correctly", async () => {
  const agent = await load("prompts/chat.prompty");
  expect(agent.name).toBe("production-chat");
  expect(agent.model.id).toBe("gpt-4o");
});

test("messages are prepared", async () => {
  const messages = await prepare("prompts/chat.prompty", { question: "Hello" });
  expect(messages[0].role).toBe("system");
});
```

```csharp
using Prompty.Core;
using Xunit;

public class PromptTests
{
    [Fact]
    public async Task PromptLoadsCorrectly()
    {
        var agent = await PromptyLoader.LoadAsync("Prompts/chat.prompty");
        Assert.Equal("production-chat", agent.Name);
    }

    [Fact]
    public async Task MessagesPrepared()
    {
        var agent = await PromptyLoader.LoadAsync("Prompts/chat.prompty");
        var msgs = await Pipeline.PrepareAsync(agent, new() { ["question"] = "Hello" });
        Assert.Equal("system", msgs[0].Role);
    }
}
```

Integration Tests — Gated by Environment

Only run when API keys are available:

```python
import os

import pytest
from prompty import invoke

skip_no_key = pytest.mark.skipif(
    not os.environ.get("OPENAI_API_KEY"),
    reason="OPENAI_API_KEY not set")

@skip_no_key
def test_live_chat():
    result = invoke("prompts/chat.prompty", inputs={"question": "Say hi"})
    assert isinstance(result, str) and len(result) > 0
```

```typescript
import { invoke } from "@prompty/core";

const hasKey = !!process.env.OPENAI_API_KEY;

describe.skipIf(!hasKey)("Integration", () => {
  it("returns a live response", async () => {
    const result = await invoke("prompts/chat.prompty", { question: "Say hi" });
    expect(typeof result).toBe("string");
  });
});
```

```csharp
using Xunit;

public class IntegrationTests
{
    [SkippableFact]
    public async Task LiveChatCompletion()
    {
        Skip.IfNot(!string.IsNullOrEmpty(
            Environment.GetEnvironmentVariable("OPENAI_API_KEY")), "No API key");
        var result = await Pipeline.InvokeAsync("Prompts/chat.prompty",
            new() { ["question"] = "Say hi" });
        Assert.NotNull(result);
    }
}
```

6. Versioning Prompts
Treat .prompty files as first-class source artifacts:

- Store them in git alongside your application code
- Use meaningful commit messages — “Tighten system prompt to reduce hallucination” is better than “update prompt”
- Review prompt changes in PRs — template wording affects behavior as much as code
- Tag releases that include prompt changes so you can roll back

```shell
git add prompts/chat.prompty
git commit -m "feat(prompts): add citation requirement to system prompt

Instructs the model to always cite sources when answering
factual questions. Reduces hallucination rate in eval suite."
```

7. Performance
Streaming for Long Responses

Stream tokens to the client as they arrive instead of waiting for the full response:

```python
from prompty import load, prepare, run, process

agent = load("chat.prompty")
agent.model.options.additionalProperties["stream"] = True

messages = prepare(agent, inputs={"question": "Write a summary"})
stream = run(agent, messages, raw=True)
for chunk in process(agent, stream):
    print(chunk, end="", flush=True)
```

```typescript
import { load, prepare, run, process as processResponse } from "@prompty/core";

const agent = await load("chat.prompty");
agent.model.options.additionalProperties.stream = true;

const messages = await prepare(agent, { question: "Write a summary" });
const stream = await run(agent, messages, { raw: true });
for await (const chunk of processResponse(agent, stream)) {
  process.stdout.write(chunk);
}
```

```csharp
using Prompty.Core;

var agent = PromptyLoader.Load("chat.prompty");
var messages = await Pipeline.PrepareAsync(agent, new() { ["question"] = "Write a summary" });
var raw = await Pipeline.RunAsync(agent, messages, raw: true);
if (raw is PromptyStream stream)
{
    await foreach (var chunk in stream)
        Console.Write(chunk);
}
```

Caching Strategies
- Cache prepared messages — if the same inputs are used repeatedly, cache the
  output of prepare() and call run() directly
- Cache LLM responses — for deterministic prompts (temperature: 0), cache the
  full response keyed by the rendered messages
- Use seed in model options for reproducible outputs when caching

```yaml
model:
  options:
    temperature: 0
    seed: 42
```
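Putting the last two bullets together, a response cache for deterministic prompts can key on a hash of the rendered messages plus the options that affect output. A sketch (call_model is a hypothetical stand-in for the actual run/process step, not a library function):

```python
import hashlib
import json

_response_cache: dict[str, str] = {}

def cache_key(messages: list, options: dict) -> str:
    # Canonical JSON so identical messages/options always hash identically
    payload = json.dumps({"messages": messages, "options": options}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_invoke(messages: list, options: dict, call_model) -> str:
    """Only call the model on a cache miss; sound when temperature is 0."""
    key = cache_key(messages, options)
    if key not in _response_cache:
        _response_cache[key] = call_model(messages)
    return _response_cache[key]
```

Note that anything affecting the output (model id, seed, temperature) belongs in the key; a cache that omits an option will serve stale responses when that option changes.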