Voice Agent API

Events reference

Every client-to-server and server-to-client event for the Voice Agent API.

Every message exchanged over the Voice Agent API WebSocket, grouped by direction. You send session.update to configure the session, input.audio to stream microphone audio, tool.result to respond to tool calls, and session.resume to reconnect after a dropped connection. The server sends every other event. For how these fit together in a typical session, see the Overview event flow.

Client → Server

input.audio

Stream PCM16 audio to the agent.

{
  "type": "input.audio",
  "audio": "<base64-encoded PCM16>"
}

| Field | Type | Description |
| --- | --- | --- |
| audio | string | Base64-encoded PCM16 mono 24 kHz audio |

See Audio format for the full format specification.
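
As a rough sketch of how a client might build this event, the helper below base64-encodes raw PCM16 bytes and wraps them in an input.audio message. The function name make_input_audio_event is hypothetical and not part of the API; only the "type" and "audio" fields come from the reference above.

```python
import base64
import json

BYTES_PER_SAMPLE = 2  # PCM16 means 16-bit (2-byte) samples


def make_input_audio_event(pcm16: bytes) -> str:
    """Wrap raw PCM16 bytes in a serialized input.audio event.

    Hypothetical helper: only the "type" and "audio" fields are documented.
    """
    if len(pcm16) % BYTES_PER_SAMPLE != 0:
        raise ValueError("PCM16 data must contain a whole number of 16-bit samples")
    return json.dumps({
        "type": "input.audio",
        "audio": base64.b64encode(pcm16).decode("ascii"),
    })
```

With an open WebSocket, each microphone chunk could then be sent with something like `await ws.send(make_input_audio_event(chunk))`.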


session.update

Configure the session. Send immediately on WebSocket connect — before session.ready. Can also be sent mid-conversation to update any field.

{
  "type": "session.update",
  "session": {
    "system_prompt": "You are a concise assistant.",
    "greeting": "Hi — how can I help?",
    "input": {
      "format": { "encoding": "audio/pcm", "sample_rate": 24000 },
      "turn_detection": { "type": "server_vad", "vad_threshold": 0.5 }
    },
    "output": {
      "voice": "claire",
      "format": { "encoding": "audio/pcm", "sample_rate": 24000 }
    },
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    ]
  }
}

All fields are optional — only include what you want to set or change.

| Field | Type | Description |
| --- | --- | --- |
| session.system_prompt | string | Sets the agent’s personality and context |
| session.greeting | string | Spoken aloud at the start of the conversation |
| session.input.format | object | Input audio format (encoding, sample_rate). See Audio format |
| session.input.turn_detection | object | Turn detection configuration. See Session configuration |
| session.output.voice | string | The voice used for the agent’s speech. See Voices |
| session.output.format | object | Output audio format (encoding, sample_rate). See Audio format |
| session.tools | array | Tool definitions. See Tool calling |
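
Because every field is optional, a mid-conversation update can carry just the fields that change. The sketch below shows one way to build such a partial message. make_session_update is a hypothetical helper name, and the example sticks to a top-level field because this reference does not say how partially specified nested objects are merged.

```python
import json


def make_session_update(**changes) -> str:
    """Serialize a session.update carrying only the fields that were passed.

    Hypothetical helper: keyword arguments set to None are left out, so the
    server only sees the fields you actually want to set or change.
    """
    session = {key: value for key, value in changes.items() if value is not None}
    return json.dumps({"type": "session.update", "session": session})


# Change only the system prompt; all other settings are left untouched.
message = make_session_update(system_prompt="Answer in one sentence.")
```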

session.resume

Reconnect to an existing session using the session_id from a previous session.ready. Preserves conversation context across dropped connections.

{
  "type": "session.resume",
  "session_id": "sess_abc123"
}

Sessions are preserved for 30 seconds after each disconnection before expiring. If the session has expired, the server returns a session.error with code session_not_found. If the session_id belongs to a different API key, the code is session_forbidden. In either case, start a fresh connection without session.resume.

Example. Capture session_id from session.ready on the first connection, then send session.resume as the first message when reconnecting:

import json
import websockets

# URL and API_KEY are placeholders; define them for your own deployment.
session_id: str | None = None

async def connect():
    global session_id
    async with websockets.connect(URL, additional_headers={"Authorization": API_KEY}) as ws:
        # If we already have a session_id from a previous connection, resume it.
        if session_id:
            await ws.send(json.dumps({"type": "session.resume", "session_id": session_id}))
        else:
            # {...} stands in for your session settings; replace it with a real dict.
            await ws.send(json.dumps({"type": "session.update", "session": {...}}))

        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "session.ready":
                session_id = event["session_id"]  # save for next reconnect
            elif event["type"] == "session.error" and event["code"] in ("session_not_found", "session_forbidden"):
                session_id = None  # session cannot be resumed: start fresh next time
            # ... handle other events

# On disconnect, call connect() again within 30 seconds to resume.
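
The 30-second window can also be tracked on the client, so that a reconnect attempted too late skips session.resume. The following is a minimal sketch of that idea. ResumeTracker and its method names are hypothetical, and the server remains the authority on whether a session still exists.

```python
import json
import time

RESUME_WINDOW_SECONDS = 30.0  # preservation window stated in this reference


class ResumeTracker:
    """Hypothetical helper: decides which message to send first on connect."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.session_id = None
        self._disconnected_at = None

    def on_ready(self, session_id: str) -> None:
        # Call when session.ready arrives.
        self.session_id = session_id
        self._disconnected_at = None

    def on_disconnect(self) -> None:
        # Call when the WebSocket drops.
        self._disconnected_at = self._clock()

    def first_message(self, session_config: dict) -> str:
        """Return session.resume inside the window, else a fresh session.update."""
        expired = (
            self._disconnected_at is not None
            and self._clock() - self._disconnected_at > RESUME_WINDOW_SECONDS
        )
        if self.session_id and not expired:
            return json.dumps({"type": "session.resume", "session_id": self.session_id})
        self.session_id = None  # forget the stale id
        return json.dumps({"type": "session.update", "session": session_config})
```

The clock is injectable so the timing logic can be exercised without waiting; a session.error from the server should still clear the saved id, as in the example above.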

tool.result

Send a tool result back to the agent. Send this in the reply.done handler — not immediately in tool.call. See Tool calling.

{
  "type": "tool.result",
  "call_id": "call_abc123",
  "result": "{\"temp_c\": 22, \"description\": \"Sunny\"}"
}

| Field | Type | Description |
| --- | --- | --- |
| call_id | string | The call_id from the tool.call event |
| result | string | JSON string containing the tool result |
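
Note that result is a JSON string nested inside the JSON event, so the tool output is encoded twice. A small sketch of that double encoding follows; make_tool_result is a hypothetical helper name.

```python
import json


def make_tool_result(call_id: str, result) -> str:
    """Serialize a tool.result event (hypothetical helper).

    The "result" field is itself a JSON string, so the tool output is
    encoded once on its own and again as part of the outer event.
    """
    return json.dumps({
        "type": "tool.result",
        "call_id": call_id,
        "result": json.dumps(result),
    })
```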

Server → Client

session.ready

Session is established and ready to receive audio. Save session_id for reconnection. Start sending input.audio only after this event.

{
  "type": "session.ready",
  "session_id": "sess_abc123"
}
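
One way to respect the "only after session.ready" rule is to hold microphone chunks in a small buffer until the event arrives. The sketch below illustrates this. AudioGate is a hypothetical class, and buffering early audio (rather than dropping it) is a design choice assumed here, not something the API requires.

```python
from collections import deque


class AudioGate:
    """Hypothetical helper: holds outgoing audio until session.ready is seen."""

    def __init__(self, max_pending: int = 50):
        self._ready = False
        # Bounded buffer: the oldest chunks are dropped once it is full.
        self._pending = deque(maxlen=max_pending)

    def push(self, chunk: bytes) -> list[bytes]:
        """Return the chunks that may be sent now (empty before session.ready)."""
        if not self._ready:
            self._pending.append(chunk)
            return []
        return [chunk]

    def on_event(self, event: dict) -> list[bytes]:
        """Feed server events in; returns the backlog to send once ready."""
        if event.get("type") == "session.ready" and not self._ready:
            self._ready = True
            backlog = list(self._pending)
            self._pending.clear()
            return backlog
        return []
```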

session.updated

Sent after session.update is applied successfully.

{ "type": "session.updated" }

input.speech.started

Turn detection determined the user has started speaking.

{ "type": "input.speech.started" }

input.speech.stopped

Turn detection determined the user has stopped speaking.

{ "type": "input.speech.stopped" }

transcript.user.delta

Partial transcript of what the user is saying, updated in real time.

{
  "type": "transcript.user.delta",
  "text": "What's the weather in"
}

transcript.user

Final transcript of the user’s utterance.

{
  "type": "transcript.user",
  "text": "What's the weather in Tokyo?",
  "item_id": "item_abc123"
}

reply.started

Agent has begun generating a response.

{
  "type": "reply.started",
  "reply_id": "reply_abc123"
}

reply.audio

A chunk of the agent’s spoken response as base64 PCM16. Decode and play immediately.

{
  "type": "reply.audio",
  "data": "<base64-encoded PCM16>"
}

See Audio format for playback guidance.
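
As an illustration, decoding a reply.audio chunk and estimating its playback length might look like the sketch below. The helper names are hypothetical, and the 24000 Hz mono figure is an assumption that matches the session.update example in this reference; use whatever session.output.format you actually configured.

```python
import base64

SAMPLE_RATE_HZ = 24000  # assumption: mirrors the sample_rate in the session.update example
BYTES_PER_SAMPLE = 2    # PCM16 means 16-bit (2-byte) samples; mono is assumed


def decode_reply_audio(event: dict) -> bytes:
    """Return the raw PCM16 bytes carried by a reply.audio event."""
    return base64.b64decode(event["data"])


def duration_seconds(pcm16: bytes, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Playback length of a mono PCM16 chunk at the given sample rate."""
    return len(pcm16) / BYTES_PER_SAMPLE / sample_rate
```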


transcript.agent

Full text of the agent’s response, sent after all audio for the response has been delivered. If the agent was interrupted, interrupted is true and text contains only what was actually spoken before the interruption.

{
  "type": "transcript.agent",
  "text": "It's currently 22°C and sunny in Tokyo.",
  "reply_id": "reply_abc123",
  "item_id": "item_abc123",
  "interrupted": false
}

| Field | Type | Description |
| --- | --- | --- |
| text | string | What the agent said (trimmed to interruption point if interrupted) |
| reply_id | string | ID of the reply |
| item_id | string | Conversation item ID |
| interrupted | boolean | true if the user interrupted mid-response |
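
The final transcript events are enough to keep a running conversation log on the client. The sketch below shows one possible shape for that. Turn and ConversationLog are hypothetical names; only the event fields come from this reference.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    role: str            # "user" or "agent"
    text: str
    item_id: str
    interrupted: bool = False


@dataclass
class ConversationLog:
    """Hypothetical helper: collects final transcripts in arrival order."""

    turns: list[Turn] = field(default_factory=list)

    def on_event(self, event: dict) -> None:
        kind = event.get("type")
        if kind == "transcript.user":
            self.turns.append(Turn("user", event["text"], event["item_id"]))
        elif kind == "transcript.agent":
            # text is already trimmed to what was spoken if interrupted is true
            self.turns.append(Turn(
                "agent", event["text"], event["item_id"],
                event.get("interrupted", False),
            ))
```

Partial transcript.user.delta events are deliberately ignored here, so the log only ever contains final text.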

reply.done

Agent has finished speaking. The optional status field indicates why the reply ended.

{ "type": "reply.done" }
{ "type": "reply.done", "status": "interrupted" }

| Field | Type | Description |
| --- | --- | --- |
| status | string | "interrupted" if the user barged in; absent for normal completion |

tool.call

Agent wants to call a registered tool. args is a dict — ready to use directly.

{
  "type": "tool.call",
  "call_id": "call_abc123",
  "name": "get_weather",
  "args": { "city": "Tokyo" }
}

| Field | Type | Description |
| --- | --- | --- |
| call_id | string | Include this in tool.result |
| name | string | Tool name to call |
| args | object | Arguments as a dict; use directly |

See Tool calling for the full pattern.
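
Putting tool.call and tool.result together, a client might run the tool as soon as the call arrives but hold the result until reply.done, as this reference recommends. The sketch below is one way to arrange that. ToolRunner and its handler registry are hypothetical; see Tool calling for the documented pattern.

```python
import json


class ToolRunner:
    """Hypothetical helper: runs tools on tool.call, releases results on reply.done."""

    def __init__(self, handlers: dict):
        self._handlers = handlers          # maps tool name to a Python callable
        self._pending: list[str] = []      # serialized tool.result messages

    def on_event(self, event: dict) -> list[str]:
        """Return the tool.result messages that should be sent for this event."""
        kind = event.get("type")
        if kind == "tool.call":
            handler = self._handlers[event["name"]]  # KeyError if the tool is unknown
            result = handler(**event["args"])        # args is a dict: use it directly
            self._pending.append(json.dumps({
                "type": "tool.result",
                "call_id": event["call_id"],
                "result": json.dumps(result),        # result must be a JSON string
            }))
            return []
        if kind == "reply.done":
            ready, self._pending = self._pending, []
            if event.get("status") == "interrupted":
                return []  # interrupted: pending results are discarded
            return ready
        return []
```

For example, `ToolRunner({"get_weather": get_weather})` would call a user-supplied `get_weather(city=...)` function; each string returned from `on_event` can be passed straight to the WebSocket's send method.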


session.error

Session or protocol error.

{
  "type": "session.error",
  "code": "invalid_format",
  "message": "Invalid message format"
}

Also handle "error" (without the session. prefix) for connection-level errors.

Error codes:

| Code | Description |
| --- | --- |
| invalid_format | Malformed event (e.g. input.audio sent before session.ready) |
| session_not_found | The session_id in session.resume does not exist |
| session_forbidden | The session_id belongs to a different API key |
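
A client can turn these codes into a small decision function. The sketch below is only illustrative: next_action and the action labels it returns are hypothetical, and only the event types and error codes come from this reference.

```python
# Codes that mean the saved session_id cannot be resumed.
RESTART_CODES = {"session_not_found", "session_forbidden"}


def next_action(event: dict) -> str:
    """Map an error event to a hypothetical client-side action label."""
    kind = event.get("type")
    if kind == "error":
        # Connection-level error (no "session." prefix).
        return "report_connection_error"
    if kind != "session.error":
        return "not_an_error"
    if event.get("code") in RESTART_CODES:
        # Drop the saved session_id and connect without session.resume.
        return "start_fresh_session"
    # invalid_format and any other code: likely a client bug worth logging.
    return "log_and_continue"
```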

Interruptions

When the user speaks mid-response, the server stops the agent and emits:

  • reply.done with status: "interrupted"
  • transcript.agent with interrupted: true and text trimmed to what was actually spoken before the interruption

Discard any pending tool results — the agent is ready to listen again. To avoid playing stale audio after an interruption, flush your local output buffer. See Stopping playback on interruption.