Events reference
Every client-to-server and server-to-client event for the Voice Agent API.
Every message exchanged over the Voice Agent API WebSocket, grouped by direction. You send session.update to configure the session, input.audio to stream microphone audio, tool.result to respond to tool calls, and session.resume to reconnect after a dropped connection; every other event is streamed back by the server. For how these fit together in a typical session, see the Overview event flow.
Client → Server
input.audio
Stream PCM16 audio to the agent.
See Audio format for the full format specification.
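As a rough illustration, one audio chunk could be framed as an input.audio message like this. The "type" and "audio" field names, the base64 encoding, and the little-endian byte order are assumptions made for the sketch (chosen to mirror the base64 PCM16 carried by reply.audio); the Audio format page is the authority on the real schema.

```python
import base64
import json
import struct


def pcm16_bytes(samples):
    """Pack integer samples into little-endian signed 16-bit PCM."""
    # Clip to the int16 range so out-of-range values do not raise.
    clipped = [max(-32768, min(32767, int(s))) for s in samples]
    return struct.pack(f"<{len(clipped)}h", *clipped)


def input_audio_message(pcm):
    """Build one input.audio message as a JSON string.

    Assumption: the event name lives in "type" and the payload in a
    base64 "audio" field. Neither name is confirmed by this page.
    """
    return json.dumps({
        "type": "input.audio",
        "audio": base64.b64encode(pcm).decode("ascii"),
    })
```

Each string returned by input_audio_message would be sent as one WebSocket text frame, and only after session.ready has arrived.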
session.update
Configure the session. Send it immediately after the WebSocket connects, before session.ready arrives. It can also be sent mid-conversation to update any field.
All fields are optional — only include what you want to set or change.
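Because every field is optional, a small builder that drops unset values keeps updates minimal. This is only a sketch: the "type" key and the flat top-level layout are assumptions, and the field names in the usage note (instructions, voice) are hypothetical placeholders, not documented fields.

```python
import json


def session_update_message(**fields):
    """Build a session.update carrying only the fields that were set.

    Keyword arguments left as None are omitted, so the message never
    mentions fields that are not being changed.
    """
    message = {"type": "session.update"}  # "type" key is an assumption
    message.update({k: v for k, v in fields.items() if v is not None})
    return json.dumps(message)
```

For example, session_update_message(instructions="Be brief.", voice=None) produces a message that contains instructions but says nothing about voice.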
session.resume
Reconnect to an existing session using the session_id from a previous session.ready. Preserves conversation context across dropped connections.
A session is kept for 30 seconds after each disconnection and then expires. If the session has expired, the server returns a session.error with code session_not_found or session_forbidden. In that case, start a fresh connection without session.resume.
Example. Capture session_id from session.ready on the first connection, then send session.resume as the first message when reconnecting:
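One way to sketch this flow in Python, with the transport left out so the logic stays testable. The session_id field on session.ready and the code field on session.error come from this page; the "type" key and the SessionTracker class itself are assumptions for illustration.

```python
import json


class SessionTracker:
    """Remembers the latest session_id and picks the first message to send."""

    EXPIRED_CODES = {"session_not_found", "session_forbidden"}

    def __init__(self):
        self.session_id = None

    def on_event(self, event):
        """Feed every decoded server event through here."""
        if event.get("type") == "session.ready":
            self.session_id = event["session_id"]
        elif (event.get("type") == "session.error"
              and event.get("code") in self.EXPIRED_CODES):
            # Session expired: forget it so the next connection starts fresh.
            self.session_id = None

    def first_message(self, config):
        """Return the JSON to send right after the WebSocket opens."""
        if self.session_id is not None:
            return json.dumps({"type": "session.resume",
                               "session_id": self.session_id})
        # No resumable session: configure a new one instead.
        return json.dumps({"type": "session.update", **config})
```

On every (re)connect the client sends first_message(config); if the server answers with one of the expired-session codes, the tracker clears the stored id and the next connection opens with session.update instead.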
tool.result
Send a tool result back to the agent. Send this in the reply.done handler — not immediately in tool.call. See Tool calling.
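The deferral described above can be sketched as a small queue: run the tool when tool.call arrives, but hold the result until reply.done. The args field comes from this page; "type", "name", "call_id", and "result" are hypothetical field names used only for illustration, so see Tool calling for the real schema.

```python
import json


class ToolResultQueue:
    """Holds tool results until reply.done, then releases them."""

    def __init__(self, tools):
        self.tools = tools      # maps tool name -> callable
        self.pending = []

    def on_event(self, event):
        """Return the list of JSON messages to send for this event."""
        kind = event.get("type")
        if kind == "tool.call":
            # Run the tool now, but do not send anything yet.
            result = self.tools[event["name"]](**event["args"])
            self.pending.append({"type": "tool.result",
                                 "call_id": event["call_id"],
                                 "result": result})
            return []
        if kind == "reply.done":
            if event.get("status") == "interrupted":
                self.pending.clear()   # stale results are discarded
                return []
            ready, self.pending = self.pending, []
            return [json.dumps(message) for message in ready]
        return []
```

Discarding on an interrupted reply.done follows the guidance in the Interruptions section.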
Server → Client
session.ready
Session is established and ready to receive audio. Save session_id for reconnection. Start sending input.audio only after this event.
session.updated
Sent after session.update is applied successfully.
input.speech.started
Turn detection determined the user has started speaking.
input.speech.stopped
Turn detection determined the user has stopped speaking.
transcript.user.delta
Partial transcript of what the user is saying, updated in real time.
transcript.user
Final transcript of the user’s utterance.
reply.started
Agent has begun generating a response.
reply.audio
A chunk of the agent’s spoken response as base64 PCM16. Decode and play immediately.
See Audio format for playback guidance.
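A minimal decoding sketch for one chunk. The base64 PCM16 payload is stated above; the "audio" field name and little-endian byte order are assumptions, and the sample rate and channel layout are left to the Audio format page.

```python
import base64
import struct


def decode_reply_audio(event):
    """Turn a reply.audio event into a tuple of signed 16-bit samples."""
    raw = base64.b64decode(event["audio"])  # "audio" key is an assumption
    count = len(raw) // 2                   # two bytes per PCM16 sample
    # A trailing odd byte, which well-formed PCM16 should not contain,
    # is ignored here.
    return struct.unpack(f"<{count}h", raw[:count * 2])
```

The returned samples (or the raw decoded bytes) would then be handed to whatever audio output the client uses.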
transcript.agent
Full text of the agent’s response, sent after all audio for the response has been delivered. If the agent was interrupted, interrupted is true and text contains only what was actually spoken before the interruption.
reply.done
Agent has finished speaking. The optional status field indicates why the reply ended.
tool.call
Agent wants to call a registered tool. args is a dict — ready to use directly.
See Tool calling for the full pattern.
session.error
Session or protocol error.
Also handle "error" (without the session. prefix) for connection-level errors.
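One way to make sure both spellings reach the same code path is to register a single handler under both event names. The router below is a hypothetical helper, not part of the API, and it assumes the event name is carried in a "type" field.

```python
import json


class EventRouter:
    """Routes decoded server events to handlers keyed by event name."""

    def __init__(self):
        self.handlers = {}

    def on(self, *event_types):
        """Decorator registering one handler for several event names."""
        def register(fn):
            for event_type in event_types:
                self.handlers[event_type] = fn
            return fn
        return register

    def dispatch(self, raw):
        """Decode one message and call its handler, if any."""
        event = json.loads(raw)
        handler = self.handlers.get(event.get("type"))  # "type" key assumed
        return handler(event) if handler else None


router = EventRouter()


@router.on("session.error", "error")
def handle_error(event):
    # "code" appears on session.error per this page; its presence on the
    # bare "error" event is an assumption, hence .get().
    return (event["type"], event.get("code"))
```

Events without a registered handler fall through and return None, so unknown event types do not raise.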
Error codes:
Interruptions
When the user speaks mid-response, the server stops the agent and emits:
reply.done with status: "interrupted"
transcript.agent with interrupted: true and text trimmed to what was actually spoken before the interruption
Discard any pending tool results — the agent is ready to listen again. To avoid playing stale audio after an interruption, flush your local output buffer. See Stopping playback on interruption.
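Flushing local output on interruption can be sketched as follows. The status field on reply.done is described on this page; the "type" and "audio" field names and the PlaybackBuffer class are assumptions for illustration, and a real client would also need to stop audio already handed to the sound device.

```python
from collections import deque


class PlaybackBuffer:
    """Queues agent audio chunks and drops them when a reply is interrupted."""

    def __init__(self):
        self.chunks = deque()

    def on_event(self, event):
        kind = event.get("type")
        if kind == "reply.audio":
            self.chunks.append(event["audio"])
        elif kind == "reply.done" and event.get("status") == "interrupted":
            # Anything still queued was generated before the user spoke.
            self.chunks.clear()

    def next_chunk(self):
        """Return the next chunk to play, or None when the queue is empty."""
        return self.chunks.popleft() if self.chunks else None
```

A playback loop would call next_chunk repeatedly; after an interrupted reply.done it simply finds the queue empty.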