
Tutorial: Build a Chat Assistant

A chat assistant that:

  1. Answers questions using OpenAI (gpt-4o-mini)
  2. Maintains conversation history across multiple turns
  3. Has a configurable system prompt you can tweak without changing code

By the end (~15 min) you’ll understand the .prompty file format, the load → prepare → run pipeline, and how thread inputs work.


Install Prompty with the Jinja2 and OpenAI extras:

```sh
pip install prompty[jinja2,openai]
```

Create a .env file in your project root with your OpenAI key:

```sh
OPENAI_API_KEY=sk-your-key-here
```

Create a file called assistant.prompty:

```markdown
---
name: chat-assistant
description: A friendly chat assistant
model:
  id: gpt-4o-mini
  provider: openai
  apiType: chat
  connection:
    kind: key
    apiKey: ${env:OPENAI_API_KEY}
  options:
    temperature: 0.7
    maxOutputTokens: 1024
inputs:
  - name: question
    kind: string
    default: What can you help me with?
---
system:
You are a friendly, helpful assistant. Keep answers concise — two or three
sentences at most — unless the user asks for more detail.

user:
{{question}}
```

Let’s break down each section:

| Section | What it does |
| --- | --- |
| `name` / `description` | Identity — shows up in traces and tooling |
| `model` | Which LLM to call, how to authenticate, and generation options |
| `model.connection` | `${env:OPENAI_API_KEY}` is resolved at load time from your `.env` |
| `inputs` | Declares the variables your template expects (with defaults) |
| `template` | Use Jinja2 for rendering and the built-in Prompty parser for role markers |
| Body (below `---`) | The actual prompt — `system:` and `user:` are role markers |
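The `${env:...}` substitution can be pictured as a small resolver that runs when the file is loaded. The following is an illustrative sketch only, not Prompty's actual implementation (which also handles `.env` loading and other reference kinds):

```python
import os
import re

def resolve_env_refs(value: str) -> str:
    """Replace ${env:VAR} placeholders with values from the environment.

    Illustrative sketch only -- not Prompty's real resolver.
    """
    def substitute(match: re.Match) -> str:
        var = match.group(1)
        if var not in os.environ:
            raise KeyError(f"environment variable {var!r} is not set")
        return os.environ[var]

    return re.sub(r"\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}", substitute, value)

os.environ["OPENAI_API_KEY"] = "sk-demo"
print(resolve_env_refs("apiKey: ${env:OPENAI_API_KEY}"))
# → apiKey: sk-demo
```

Because resolution happens at load time, a missing key fails fast instead of surfacing later as an authentication error mid-call.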

The quickest way — one function call that handles everything:

```python
import prompty

result = prompty.invoke(
    "assistant.prompty",
    inputs={"question": "What is Prompty?"},
)
print(result)
# → "Prompty is a markdown file format for LLM prompts..."
```

invoke() handles the full pipeline: load the file → render the template → parse role markers → call the LLM → process the response.


For more control, break the pipeline into individual steps:

```python
import prompty

# 1. Load — parse the .prompty file into a typed Prompty
agent = prompty.load("assistant.prompty")

# 2. Prepare — render the template + parse role markers → messages
messages = prompty.prepare(agent, inputs={"question": "Explain async/await"})
print(messages)
# [
#   Message(role="system", content="You are a friendly, helpful assistant..."),
#   Message(role="user", content="Explain async/await"),
# ]

# 3. Run — call the LLM + process the response → clean string
result = prompty.run(agent, messages)
print(result)
```

This is useful when you need to inspect or modify the messages before sending them to the LLM — for example, injecting extra context from a database.
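As an illustration, extra context could be spliced in just before the final user message. This sketch operates on plain role/content dicts for simplicity; the actual prepare() step returns typed Message objects, but the idea carries over. The context string here is a hypothetical stand-in for a database lookup:

```python
def inject_context(messages: list[dict], context: str) -> list[dict]:
    """Insert a context message immediately before the last user message."""
    enriched = list(messages)  # copy so the caller's list is untouched
    context_msg = {"role": "system", "content": f"Relevant context:\n{context}"}
    # Walk backwards to find the last user message
    for i in range(len(enriched) - 1, -1, -1):
        if enriched[i]["role"] == "user":
            enriched.insert(i, context_msg)
            break
    return enriched

messages = [
    {"role": "system", "content": "You are a friendly, helpful assistant."},
    {"role": "user", "content": "Explain async/await"},
]
enriched = inject_context(messages, "User prefers Python examples.")
print([m["role"] for m in enriched])
# → ['system', 'system', 'user']
```

The enriched list would then be passed to prompty.run() in place of the original messages.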


Right now each call is stateless. To build a real chat assistant you need multi-turn conversation. Prompty handles this with kind: thread inputs.

Update assistant.prompty to add a conversation input:

```markdown
---
name: chat-assistant
description: A friendly chat assistant with conversation history
model:
  id: gpt-4o-mini
  provider: openai
  apiType: chat
  connection:
    kind: key
    apiKey: ${env:OPENAI_API_KEY}
  options:
    temperature: 0.7
    maxOutputTokens: 1024
inputs:
  - name: question
    kind: string
    default: What can you help me with?
  - name: conversation
    kind: thread
---
system:
You are a friendly, helpful assistant. Keep answers concise — two or three
sentences at most — unless the user asks for more detail.

{{conversation}}

user:
{{question}}
```

The key changes: a new conversation input with kind: thread, and {{conversation}} placed in the body where previous messages should appear.

Now accumulate messages across turns:

```python
import prompty

history = []
while True:
    question = input("You: ")
    if question.lower() in ("quit", "exit"):
        break
    result = prompty.invoke(
        "assistant.prompty",
        inputs={"question": question, "conversation": history},
    )
    print(f"Assistant: {result}\n")
    # Append this turn to history for the next call
    history.append({"role": "user", "content": question})
    history.append({"role": "assistant", "content": result})
```

Each call now includes the full conversation history. The pipeline injects the conversation thread messages between the system prompt and the new user message, so the LLM sees the entire context.
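One caveat: unbounded history will eventually exceed the model's context window. A simple mitigation is to cap the history before each call. The helper below is a hypothetical addition, not part of Prompty itself:

```python
def trim_history(history: list[dict], max_turns: int = 10) -> list[dict]:
    """Keep only the last max_turns user/assistant pairs (2 messages each)."""
    return history[-(max_turns * 2):]

# Simulate 15 completed turns (30 messages)
history = []
for i in range(15):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history, max_turns=10)
print(len(trimmed))
# → 20
```

Dropping whole user/assistant pairs (rather than individual messages) keeps the thread coherent, since every answer stays paired with the question that prompted it. For longer sessions, summarizing older turns into a single message is a common refinement.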


Want to see what Prompty sends to the LLM? Register the console tracer at the top of your script:

```python
from prompty import Tracer
from prompty.tracing.tracer import console_tracer

Tracer.add("console", console_tracer)
# Now every invoke() call prints trace details to stdout
```

The console tracer logs each pipeline stage — you’ll see the rendered prompt, the parsed messages, the raw LLM response, and the processed result. It’s invaluable for debugging unexpected outputs.


✅ The .prompty file format — YAML frontmatter + markdown body
✅ The invoke() one-liner and the load → prepare → run pipeline
✅ Thread inputs (kind: thread) for multi-turn conversation
✅ Console tracing for debugging