Voice Agent WebSocket

Connect to the Voice Agent API to run a real-time voice conversation. The client streams PCM16 audio to the server and receives the agent’s spoken response (also PCM16), along with transcripts, tool calls, and lifecycle events.

After the WebSocket opens, send a session.update as your first message. You have two ways to configure the agent:

Stored agent. Send { "agent_id": "<id>" } as the only field in session to bind to a reusable agent created via the Agents REST API. The stored system_prompt, greeting, tools, input, and output are applied server-side.
Inline configuration. Omit agent_id and send system_prompt, greeting, tools, input, and output directly. Useful for one-off or fully dynamic agents.

The two modes are mutually exclusive. See Deploy your agent and Inline session configuration for details, or jump to the Voice Agent API overview for the full event flow and a runnable quickstart.
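As an illustration of the two configuration modes, the following minimal sketch builds a session.update payload and rejects any mix of agent_id with inline fields. The "type" key in the envelope and the helper name build_session_update are assumptions made for this example; only the session field and its members come from the description above.

```python
import json

# Fields the text lists for inline configuration.
INLINE_FIELDS = ("system_prompt", "greeting", "tools", "input", "output")


def build_session_update(agent_id=None, **inline):
    """Return a session.update message as a JSON string.

    Assumption: the envelope is {"type": "session.update", "session": {...}}.
    """
    unknown = set(inline) - set(INLINE_FIELDS)
    if unknown:
        raise ValueError(f"unknown session fields: {sorted(unknown)}")
    if agent_id is not None:
        # Stored agent: agent_id must be the only field in session.
        if inline:
            raise ValueError("agent_id cannot be combined with inline fields")
        session = {"agent_id": agent_id}
    else:
        # Inline configuration: pass the provided fields through as-is.
        session = dict(inline)
    return json.dumps({"type": "session.update", "session": session})
```

Calling build_session_update(agent_id="<id>") yields the stored-agent form, while build_session_update(system_prompt="...", greeting="...") yields the inline form. Validating on the client is a design choice here, so that a mixed payload fails before it reaches the server.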

WSS

Messages

ApiKey

type:string

required

Pass your API key as a Bearer token in the Authorization header on the WebSocket upgrade request. For browser apps (which can't set custom headers on WebSockets), generate a temporary token and pass it via the token query parameter instead. See Browser integration.

token

type:string

required

Temporary authentication token for client-side connections. Generate one with GET /v1/token on your server and pass it here so you don't expose your permanent API key in the browser. Each token is one-time use.
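The two authentication paths can be sketched as follows. The WebSocket URL below is a hypothetical placeholder, since this page does not state the endpoint, and both helper names are illustrative; only the Bearer header and the token query parameter come from the text.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

# Hypothetical placeholder: substitute the real WebSocket endpoint.
WS_URL = "wss://example.invalid/voice-agent"


def server_side_headers(api_key):
    """Headers for the upgrade request when the API key can stay secret."""
    return {"Authorization": f"Bearer {api_key}"}


def browser_url(temp_token, base_url=WS_URL):
    """Append the temporary token as the `token` query parameter."""
    parts = urlsplit(base_url)
    query = urlencode({"token": temp_token})
    if parts.query:
        # Preserve any query parameters already present on the URL.
        query = parts.query + "&" + query
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
```

Because each token is one-time use, a browser client would need a fresh token from your server for every new connection.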

Session Ready

type:object

Server confirms the session is established and ready for audio.

Session Updated

type:object

Server acknowledges that a session.update was applied successfully.

Session Ended

type:object

Final event emitted on every clean teardown, right before the WebSocket closes. Sent when the client sends session.end, the session hits max_session_duration_seconds, the server hits an unrecoverable error, or the 30-second grace window after a disconnect expires.

Session Error

type:object

Server reports a session- or protocol-level error.

User Started Speaking

type:object

Server signals that turn detection determined the user started speaking.

User Stopped Speaking

type:object

Server signals that turn detection determined the user stopped speaking.

User Transcript Delta

type:object

Partial transcript of the user's current utterance.

User Transcript

type:object

Final transcript of the user's utterance.

Reply Started

type:object

Agent has begun generating a reply.

Reply Audio Chunk

type:object

A chunk of the agent's spoken response as base64 PCM16.
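A minimal sketch of turning a base64 PCM16 payload back into samples for playback. It assumes little-endian, mono audio, which this page does not state; decode_pcm16 is an illustrative helper name.

```python
import base64
import struct


def decode_pcm16(b64_audio):
    """Decode base64 PCM16 into a list of signed 16-bit samples.

    Assumption: samples are little-endian and mono.
    """
    raw = base64.b64decode(b64_audio)
    if len(raw) % 2:
        raise ValueError("PCM16 payload must have an even number of bytes")
    # "<" = little-endian, "h" = signed 16-bit integer.
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))
```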

Agent Transcript

type:object

Text of the agent's response, sent after all reply audio has been delivered.

Reply Done

type:object

Agent has finished speaking. If the user interrupted the reply (barge-in), status is "interrupted". Send any accumulated tool.result messages when you receive this event.
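One way to hold tool results until the reply finishes is a small buffer that is flushed when Reply Done arrives. This is a sketch under assumptions: the type string "reply.done" and the call_id and result field names are not confirmed by this page; only tool.result and the "interrupted" status are.

```python
import json


class ToolResultBuffer:
    """Hold tool results until the reply finishes, then release them together."""

    def __init__(self):
        self._pending = []

    def add(self, call_id, result):
        # "call_id" and "result" are assumed field names.
        self._pending.append(
            {"type": "tool.result", "call_id": call_id, "result": result}
        )

    def flush(self):
        """Return queued messages as JSON strings and clear the buffer."""
        out = [json.dumps(m) for m in self._pending]
        self._pending.clear()
        return out


def on_event(event, buffer, send):
    """Send buffered tool results once the reply is done.

    Assumption: the Reply Done event has type "reply.done".
    Returns True if the reply was interrupted by the user.
    """
    if event.get("type") == "reply.done":
        for message in buffer.flush():
            send(message)
        return event.get("status") == "interrupted"
    return False
```

Here send stands in for whatever function writes a text frame to your WebSocket.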

Tool Call

type:object

Agent wants to invoke a registered tool.

Update Session

type:object

Client message to configure the session. Either bind to a stored agent by sending agent_id, or configure an inline agent with system_prompt, greeting, input, output, and tools. The two modes are mutually exclusive.

Resume Session

type:object

Client message to resume a previous session by session_id.

End Session

type:object

Client message to cleanly end the session. The server emits a final session.ended and closes the WebSocket; the session_id is dead immediately and cannot be resumed. Use this instead of just closing the socket to stop billing right away: closing the socket without session.end leaves the session resumable (and billable) for 30 seconds.
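The difference between ending and dropping a session can be sketched as a reconnect decision: resume only if the session was not ended cleanly and the 30-second grace window is still open. The type string "session.resume" and the assumption that session.end needs no other fields are guesses for illustration; session.end, session_id, and the 30-second window come from the text.

```python
import json
import time

GRACE_SECONDS = 30  # grace window stated in the text


def end_session_message():
    # Assumption: the message needs only a "type" field.
    return json.dumps({"type": "session.end"})


def reconnect_message(session_id, disconnected_at, ended_cleanly, now=None):
    """Return a resume message if the session may still be alive, else None.

    Assumption: the Resume Session message type is "session.resume".
    `disconnected_at` and `now` are time.monotonic() readings in seconds.
    """
    now = time.monotonic() if now is None else now
    if ended_cleanly or session_id is None:
        # After session.end the session_id cannot be resumed.
        return None
    if now - disconnected_at >= GRACE_SECONDS:
        # Grace window expired: start a new session instead.
        return None
    return json.dumps({"type": "session.resume", "session_id": session_id})
```

When reconnect_message returns None, the client would open a fresh connection and send session.update as usual.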

Input Audio Chunk

type:object

Client streams a chunk of PCM16 audio as base64.
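A minimal sketch of packaging microphone samples for this message. The type string "input.audio", the "audio" field name, and little-endian byte order are assumptions for illustration; the page only states that the payload is PCM16 encoded as base64.

```python
import base64
import json
import struct


def encode_audio_chunk(samples):
    """Pack int16 samples as PCM16 and wrap them in a client message.

    Assumptions: type "input.audio", field "audio", little-endian samples.
    """
    # Clamp to the signed 16-bit range so struct.pack cannot overflow.
    clipped = [max(-32768, min(32767, int(s))) for s in samples]
    raw = struct.pack(f"<{len(clipped)}h", *clipped)
    return json.dumps(
        {"type": "input.audio", "audio": base64.b64encode(raw).decode("ascii")}
    )
```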

Tool Result

type:object

Client returns the result of a tool invocation to the agent.

Reply Create

type:object

Client asks the agent to generate a reply now, optionally with one-shot instructions.
