Responses API

The Responses API is OpenAI’s newer API surface — an alternative to Chat Completions. It uses a different wire format: system messages become an instructions parameter, user and assistant messages become an input array, tools use a flat structure, and structured output is configured via text.format instead of response_format.

To use it, set apiType: responses in your .prompty file. No code changes are needed — the runtime handles the wire format conversion automatically.

flowchart LR
    subgraph CC["Chat Completions"]
        direction LR
        A1["messages array\n(system + user\n+ assistant)"] --> B1["chat.completions\n.create()"]
        B1 --> C1["choices[0]\n.message.content"]
    end

    subgraph RA["Responses API"]
        direction LR
        A2["instructions\n+ input array"] --> B2["responses\n.create()"]
        B2 --> C2["output[]\nitems"]
    end

    style CC fill:#dbeafe,stroke:#3b82f6,color:#1e40af
    style RA fill:#d1fae5,stroke:#10b981,color:#065f46
    style A1 fill:#bfdbfe,stroke:#3b82f6,color:#1e3a8a
    style B1 fill:#bfdbfe,stroke:#3b82f6,color:#1e3a8a
    style C1 fill:#bfdbfe,stroke:#3b82f6,color:#1e3a8a
    style A2 fill:#a7f3d0,stroke:#10b981,color:#065f46
    style B2 fill:#a7f3d0,stroke:#10b981,color:#065f46
    style C2 fill:#a7f3d0,stroke:#10b981,color:#065f46

Set apiType: responses in the model section of your .prompty frontmatter:

my-prompt.prompty
---
name: responses-example
model:
  id: gpt-4o
  provider: openai
  apiType: responses
  connection:
    kind: key
    apiKey: ${env:OPENAI_API_KEY}
  options:
    temperature: 0.7
    maxOutputTokens: 1000
inputSchema:
  properties:
    question:
      kind: string
      default: What is Prompty?
template:
  format:
    kind: jinja2
  parser:
    kind: prompty
---
system:
You are a helpful assistant.
user:
{{question}}

For Microsoft Foundry:

foundry-responses.prompty
---
name: foundry-responses
model:
  id: gpt-4o
  provider: foundry
  apiType: responses
  connection:
    kind: key
    endpoint: ${env:AZURE_AI_PROJECT_ENDPOINT}
    apiKey: ${env:AZURE_AI_PROJECT_KEY}
---
system:
You are a helpful assistant.
user:
{{question}}

The Responses API is transparent to your calling code. The same execute() and execute_async() functions work — the runtime dispatches to responses.create() based on the apiType in the .prompty file.

from prompty import execute, execute_async

# Sync
result = execute("my-prompt.prompty", inputs={"question": "Hello!"})
print(result)  # "Hi there! How can I help?"

# Async (inside an async function)
result = await execute_async("my-prompt.prompty", inputs={"question": "Hello!"})
print(result)

Under the hood, the runtime converts your messages to a different wire format when apiType: responses is set. You don’t need to handle this yourself — but understanding the differences helps when debugging.

| Aspect | Chat Completions | Responses API |
| --- | --- | --- |
| API call | `client.chat.completions.create()` | `client.responses.create()` |
| System messages | In `messages` array with `role: system` | Separate `instructions` parameter |
| User/assistant messages | In `messages` array | In `input` array |
| Tool definitions | Nested: `{type: "function", function: {name, parameters}}` | Flat: `{type: "function", name, parameters}` |
| Structured output | `response_format` parameter | `text.format` parameter |
| Max tokens option | `max_completion_tokens` | `max_output_tokens` |

| Aspect | Chat Completions | Responses API |
| --- | --- | --- |
| Response object | `object: "chat.completion"` | `object: "response"` |
| Content location | `choices[0].message.content` | `output[]` items or `output_text` |
| Tool calls | `choices[0].message.tool_calls` | `output[]` items with `type: "function_call"` |
| Finish indicator | `finish_reason: "stop"` | No `function_call` items in `output` |
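The response-side differences can be sketched with plain dicts. This is an illustrative sketch, not the runtime's code, and the sample payloads below are hand-written to match the field names in the table rather than captured from a live call:

```python
def extract_text(response: dict) -> str:
    """Pull the assistant text out of either response shape."""
    if response.get("object") == "chat.completion":
        # Chat Completions: text lives at choices[0].message.content
        return response["choices"][0]["message"]["content"]
    # Responses API: walk output[] items and join their text parts
    parts = []
    for item in response.get("output", []):
        if item.get("type") == "message":
            for content in item.get("content", []):
                if content.get("type") == "output_text":
                    parts.append(content["text"])
    return "".join(parts)

chat_resp = {
    "object": "chat.completion",
    "choices": [{"message": {"content": "Hi!"}, "finish_reason": "stop"}],
}
responses_resp = {
    "object": "response",
    "output": [
        {"type": "message", "content": [{"type": "output_text", "text": "Hi!"}]}
    ],
}
```

Both shapes yield the same text; only the path to it differs.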

Here’s what the runtime sends for a simple prompt with apiType: responses:

Responses API request
{
  "model": "gpt-4o",
  "instructions": "You are a helpful assistant.",
  "input": [
    { "role": "user", "content": "What is Prompty?" }
  ],
  "max_output_tokens": 1000,
  "temperature": 0.7
}

Compare with the equivalent Chat Completions request:

Chat Completions request
{
  "model": "gpt-4o",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is Prompty?" }
  ],
  "max_completion_tokens": 1000,
  "temperature": 0.7
}
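The mapping between the two request bodies can be expressed as a small transformation. The sketch below assumes the dict shapes shown in the two examples above; it is illustrative only, and the runtime's actual converter may differ:

```python
def chat_to_responses(request: dict) -> dict:
    """Convert a Chat Completions request body into the Responses API shape."""
    out = {"model": request["model"]}
    # System messages become the instructions parameter...
    system_parts = [m["content"] for m in request["messages"] if m["role"] == "system"]
    if system_parts:
        out["instructions"] = "\n".join(system_parts)
    # ...and the remaining messages move to the input array.
    out["input"] = [m for m in request["messages"] if m["role"] != "system"]
    # max_completion_tokens is renamed to max_output_tokens.
    if "max_completion_tokens" in request:
        out["max_output_tokens"] = request["max_completion_tokens"]
    if "temperature" in request:
        out["temperature"] = request["temperature"]
    # Tool definitions are flattened: the nested function object is hoisted.
    if "tools" in request:
        out["tools"] = [{"type": "function", **t["function"]} for t in request["tools"]]
    return out

chat_request = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Prompty?"},
    ],
    "max_completion_tokens": 1000,
    "temperature": 0.7,
}
responses_request = chat_to_responses(chat_request)
# responses_request now carries instructions, input, max_output_tokens, temperature
```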

Tool calling works with the Responses API through the same execute_agent() / executeAgent() functions used for Chat Completions. The agent loop automatically detects the Responses API response format and handles it correctly.

responses-agent.prompty
---
name: weather-agent
model:
  id: gpt-4o
  provider: openai
  apiType: responses
  connection:
    kind: key
    apiKey: ${env:OPENAI_API_KEY}
  options:
    temperature: 0
inputSchema:
  properties:
    question:
      kind: string
      default: What's the weather?
tools:
  - name: get_weather
    kind: function
    description: Get the current weather for a city
    parameters:
      properties:
        - name: city
          kind: string
          description: City name
          required: true
    strict: true
template:
  format:
    kind: jinja2
  parser:
    kind: prompty
---
system:
You are a helpful assistant with access to weather tools.
user:
{{question}}
import prompty

def get_weather(city: str) -> str:
    return f"72°F and sunny in {city}"

agent = prompty.load("responses-agent.prompty")
result = prompty.execute_agent(
    agent,
    inputs={"question": "What's the weather in Seattle?"},
    tools={"get_weather": get_weather},
    max_iterations=10,
)
print(result)  # "It's currently 72°F and sunny in Seattle!"

How Tool Calls Work with the Responses API

The Responses API uses a different format for tool calls than Chat Completions. The runtime handles this automatically, but here’s what happens under the hood:

flowchart TD
    A["Send instructions + input\nto responses.create()"] --> B["Receive response\nwith output[] items"]
    B --> C{"output contains\nfunction_call items?"}
    C -- Yes --> D["Execute tool functions"]
    D --> E["Build input with\nfunction_call +\nfunction_call_output items"]
    E -.-> A
    C -- No --> F["Extract output_text\nand return result"]

    style A fill:#3b82f6,stroke:#1d4ed8,color:#fff
    style B fill:#3b82f6,stroke:#1d4ed8,color:#fff
    style C fill:#f59e0b,stroke:#d97706,color:#fff
    style D fill:#10b981,stroke:#059669,color:#fff
    style E fill:#10b981,stroke:#059669,color:#fff
    style F fill:#1d4ed8,stroke:#1e3a8a,color:#fff

The wire format for tool interactions differs from Chat Completions:

| Step | Chat Completions | Responses API |
| --- | --- | --- |
| Tool call from LLM | `message.tool_calls[]` with `id`, `function.name`, `function.arguments` | `output[]` item with `type: "function_call"`, `call_id`, `name`, `arguments` |
| Tool result to LLM | Message with `role: "tool"` and `tool_call_id` | `function_call_output` input item with `call_id` and `output` |
| Context preservation | Assistant message with `tool_calls` | Original `function_call` items re-sent in `input` |
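One iteration of the Responses API loop step can be sketched as follows. This is a simplified illustration of the item shapes in the table, not the runtime's agent loop, and the helper name is hypothetical:

```python
import json

def build_followup_input(prior_input: list, output_items: list, tools: dict) -> list:
    """After executing tools, build the next input: re-send each function_call
    item followed by a matching function_call_output item."""
    next_input = list(prior_input)
    for item in output_items:
        if item.get("type") != "function_call":
            continue
        # Preserve context: the original function_call item is re-sent...
        next_input.append(item)
        # ...followed by a function_call_output carrying the tool's result,
        # correlated via call_id.
        args = json.loads(item["arguments"])
        result = tools[item["name"]](**args)
        next_input.append({
            "type": "function_call_output",
            "call_id": item["call_id"],
            "output": result,
        })
    return next_input

tools = {"get_weather": lambda city: f"72°F and sunny in {city}"}
output_items = [{
    "type": "function_call",
    "call_id": "call_1",
    "name": "get_weather",
    "arguments": '{"city": "Seattle"}',
}]
next_input = build_followup_input(
    [{"role": "user", "content": "What's the weather in Seattle?"}],
    output_items,
    tools,
)
# next_input: user message, function_call item, function_call_output item
```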

When outputSchema is defined, the runtime converts it to the Responses API’s text.format parameter (instead of Chat Completions’ response_format). The processor automatically parses the JSON response.

structured-responses.prompty
---
name: weather-report
model:
  id: gpt-4o
  provider: openai
  apiType: responses
  connection:
    kind: key
    apiKey: ${env:OPENAI_API_KEY}
outputSchema:
  properties:
    - name: city
      kind: string
      description: The city name
    - name: temperature
      kind: integer
      description: Temperature in Fahrenheit
    - name: conditions
      kind: string
      description: Current weather conditions
template:
  format:
    kind: jinja2
  parser:
    kind: prompty
---
system:
Return the current weather for the requested city.
user:
Weather in {{city}}?
from prompty import execute

result = execute("structured-responses.prompty", inputs={"city": "Seattle"})

# result is already a parsed dict
print(result["city"])         # "Seattle"
print(result["temperature"])  # 62
print(result["conditions"])   # "Partly cloudy"
print(type(result))           # <class 'dict'>

The runtime sends structured output as text.format instead of response_format:

Responses API structured output
{
  "model": "gpt-4o",
  "instructions": "Return the current weather for the requested city.",
  "input": [{ "role": "user", "content": "Weather in Seattle?" }],
  "text": {
    "format": {
      "type": "json_schema",
      "name": "weather_report",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "city": { "type": "string", "description": "The city name" },
          "temperature": { "type": "integer", "description": "Temperature in Fahrenheit" },
          "conditions": { "type": "string", "description": "Current weather conditions" }
        },
        "required": ["city", "temperature", "conditions"],
        "additionalProperties": false
      }
    }
  }
}
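The difference between the two structured-output parameters is mostly nesting: Chat Completions wraps the schema fields in a `json_schema` object, while the Responses API places them directly on `text.format`. A minimal sketch of the mapping, assuming the standard Chat Completions `response_format` shape (illustrative, not the runtime's converter):

```python
def response_format_to_text_format(response_format: dict) -> dict:
    """Map Chat Completions' response_format to the Responses API's text value.
    response_format nests the schema fields under "json_schema"; text.format
    carries the same fields directly."""
    js = response_format["json_schema"]
    return {
        "format": {
            "type": "json_schema",
            "name": js["name"],
            "strict": js["strict"],
            "schema": js["schema"],
        }
    }

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "weather_report",
        "strict": True,
        "schema": {"type": "object", "properties": {}, "additionalProperties": False},
    },
}
text_value = response_format_to_text_format(response_format)
# text_value matches the "text" object in the request above
```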

Not all providers support every API type. Here’s what’s available:

| Provider | `apiType: chat` | `apiType: responses` | `apiType: embedding` | `apiType: image` |
| --- | --- | --- | --- | --- |
| OpenAI | ✅ | ✅ | ✅ | ✅ |
| Microsoft Foundry | ✅ | ✅ | ✅ | ✅ |
| Anthropic | ✅ | ❌ | ❌ | ❌ |

When to Use Responses API vs Chat Completions
| Use Case | Recommendation |
| --- | --- |
| Maximum provider compatibility | `apiType: chat` (default) |
| OpenAI or Foundry only, want the latest API features | `apiType: responses` |
| Anthropic models | `apiType: chat` (only option) |
| Existing prompts that work fine | Keep `apiType: chat` |
| Starting a new project on OpenAI | Either works; `responses` is newer |