Events reference
Every client-to-server and server-to-client event for the Voice Agent API.
Every message exchanged over the Voice Agent API WebSocket, grouped by direction. You’ll send session.update to configure, input.audio to stream mic audio, tool.result to respond to tool calls, and session.resume to reconnect after a dropped connection. The server streams everything else back. For how these fit together in a typical session, see the Overview event flow.
Client → Server
input.audio
Stream PCM16 audio to the agent.
See Audio format for the full format specification.
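As a rough illustration of preparing one chunk, the sketch below converts float samples to PCM16 and wraps them in an input.audio message. The `audio` field name, base64 encoding on upload, and little-endian byte order are assumptions, not taken from this page; check Audio format for the actual payload shape.

```python
import base64
import json
import struct


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to signed 16-bit PCM bytes.

    ASSUMPTION: little-endian byte order.
    """
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)


def build_input_audio(samples):
    """Build a JSON input.audio message for one chunk of samples.

    ASSUMPTION: the payload carries base64 text in a field named "audio".
    """
    pcm = float_to_pcm16(samples)
    return json.dumps({
        "type": "input.audio",
        "audio": base64.b64encode(pcm).decode("ascii"),
    })
```

Out-of-range samples are clipped rather than wrapped, which avoids loud artifacts when the microphone signal overshoots.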
session.update
Configure the session. Send immediately on WebSocket connect (before session.ready). Can also be sent mid-conversation to update most fields. See Mutability after session.ready for which fields can change once the session is established.
All fields are optional. Include only what you want to set or change. After session.ready, only a subset of fields can be changed; changing greeting or session.output raises immutable_field.
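One way to apply the "include only what you want to change" rule on the client is to filter a set of pending changes before building the message. The sketch below is hypothetical: it treats field names as dotted paths, uses `None` to mean "leave unchanged", and refuses the two fields this page names as immutable after session.ready. It does not assume any particular wire layout for session.update, and `example_field` in the usage note is a made-up name.

```python
# Fields this page names as immutable once session.ready has been received.
IMMUTABLE_AFTER_READY = ("greeting", "session.output")


def filter_update(changes, session_ready):
    """Return only the changes that are safe to send in session.update.

    changes maps dotted field paths to new values. A value of None means
    "leave unchanged" (a convention of this sketch, not of the API).
    """
    kept = {path: value for path, value in changes.items() if value is not None}
    if session_ready:
        blocked = sorted(
            path for path in kept
            if any(path == f or path.startswith(f + ".") for f in IMMUTABLE_AFTER_READY)
        )
        if blocked:
            # Fail locally instead of waiting for the server's immutable_field error.
            raise ValueError("immutable after session.ready: " + ", ".join(blocked))
    return kept
```

For example, `filter_update({"greeting": "Hi", "example_field": None}, session_ready=False)` keeps only `greeting`, while the same call with `session_ready=True` raises before anything is sent.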
session.resume
Reconnect to an existing session using the session_id from a previous session.ready. Preserves conversation context across dropped connections.
Sessions are preserved for 30 seconds after each disconnection before expiring. If the session has expired, the server returns a session.error with code session_not_found or session_forbidden. In that case, start a fresh connection without session.resume.
Example. Capture session_id from session.ready on the first connection, then send session.resume as the first message when reconnecting:
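A minimal sketch of that flow, kept independent of any WebSocket library: the tracker stores session_id when session.ready arrives and produces the session.resume message for the next connection. It assumes session_id is a top-level field on both session.ready and session.resume; `SessionTracker` and its methods are hypothetical names.

```python
import json


class SessionTracker:
    """Remembers the last session_id so a reconnect can resume it."""

    def __init__(self):
        self.session_id = None

    def on_message(self, raw):
        """Feed every server message through here; returns the parsed event."""
        event = json.loads(raw)
        if event.get("type") == "session.ready":
            # ASSUMPTION: session_id is a top-level field of session.ready.
            self.session_id = event.get("session_id")
        elif (event.get("type") == "session.error"
              and event.get("code") in ("session_not_found", "session_forbidden")):
            # The session has expired; the next connection must start fresh.
            self.session_id = None
        return event

    def resume_message(self):
        """Return the first message to send on reconnect, or None to start fresh."""
        if self.session_id is None:
            return None
        return json.dumps({"type": "session.resume", "session_id": self.session_id})
```

When `resume_message()` returns `None`, the client would send its usual session.update instead.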
tool.result
Send a tool result back to the agent. Send it from the reply.done handler, not immediately from the tool.call handler. See Tool calling.
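The deferral described above could be sketched as a small buffer that records tool.call events and only produces tool.result messages once reply.done arrives. The `name`, `call_id`, and `result` field names are assumptions made for illustration (only `arguments` appears on this page); see Tool calling for the actual payloads.

```python
import json


class ToolResultBuffer:
    """Holds tool calls until reply.done, then emits their results."""

    def __init__(self, tools):
        self.tools = tools      # maps tool name -> Python callable
        self.pending = []       # tool.call events awaiting reply.done

    def handle(self, event):
        """Return the list of JSON messages to send in response to event."""
        if event["type"] == "tool.call":
            self.pending.append(event)   # do not answer yet
            return []
        if event["type"] == "reply.done":
            outgoing = []
            for call in self.pending:
                # ASSUMPTION: "name" and "call_id" identify the tool and the call.
                result = self.tools[call["name"]](**call["arguments"])
                outgoing.append(json.dumps({
                    "type": "tool.result",
                    "call_id": call["call_id"],
                    "result": result,
                }))
            self.pending = []
            return outgoing
        return []
```

Returning the messages instead of sending them keeps the sketch testable without a socket; the caller writes each returned string to the WebSocket.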
Server → Client
session.ready
Session is established and ready to receive audio. Save session_id for reconnection. Start sending input.audio only after this event.
session.updated
Sent after session.update is applied successfully.
input.speech.started
Turn detection determined the user has started speaking.
input.speech.stopped
Turn detection determined the user has stopped speaking.
transcript.user.delta
Partial transcript of what the user is saying, updating in real-time.
transcript.user
Final transcript of the user’s utterance.
reply.started
Agent has begun generating a response.
reply.audio
A chunk of the agent’s spoken response as base64 PCM16. Decode and play immediately.
See Audio format for playback guidance.
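Decoding a chunk might look like the sketch below, which turns the base64 payload into a list of signed 16-bit samples. The `audio` field name and little-endian byte order are assumptions; Audio format is the authority on both.

```python
import base64
import struct


def decode_reply_audio(event):
    """Decode one reply.audio event into a list of PCM16 sample values.

    ASSUMPTIONS: the base64 text lives in event["audio"], and samples are
    little-endian signed 16-bit integers.
    """
    raw = base64.b64decode(event["audio"])
    usable = len(raw) - (len(raw) % 2)   # ignore a trailing odd byte, if any
    return list(struct.unpack("<%dh" % (usable // 2), raw[:usable]))
```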
transcript.agent
Full text of the agent’s response, sent after all audio for the response has been delivered. If the agent was interrupted, interrupted is true and text contains only what was actually spoken before the interruption.
reply.done
Agent has finished speaking. The optional status field indicates why the reply ended.
tool.call
Agent wants to call a registered tool. arguments is a dict, ready to use as-is.
See Tool calling for the full pattern.
session.error
Session or protocol error. The payload always includes type, timestamp, code, and message. Some errors (like session.update validation failures) also include a param field naming the offending field.
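A small sketch of reading that payload on the client, using only the fields listed above. `SessionError` and `parse_session_error` are hypothetical names, and the timestamp is passed through untouched because its format is not specified here.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SessionError:
    code: str
    message: str
    timestamp: Any               # format not specified on this page; kept as received
    param: Optional[str] = None  # present only on some errors


def parse_session_error(event):
    """Convert a decoded session.error event (a dict) into a SessionError."""
    if event.get("type") != "session.error":
        raise ValueError("not a session.error event")
    return SessionError(
        code=event["code"],
        message=event["message"],
        timestamp=event["timestamp"],
        param=event.get("param"),
    )
```

Reading `param` with `.get` reflects that it is optional, while the other fields are indexed directly because the payload always includes them.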
Connection and handshake errors
Sent before or instead of session.ready. The WebSocket closes after these with the indicated close code.
Session resume errors
Sent when session.resume fails. The WebSocket closes after these.
Agent startup errors
Sent after the WebSocket is accepted but before session.ready.
Client message errors
Sent on the open socket when an inbound message is invalid. The session stays alive (except session_expired).
Live session errors
If the server cancels the session due to an internal error, the WebSocket closes with code 1011 without any session.error payload. In browsers, pre-handshake failures (like UNAUTHORIZED) surface as a close event with code 1006, so you won’t receive a session.error for them. Always fetch a fresh token immediately before each connection attempt.
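Because some closures arrive without a session.error, a client may want to classify the close event itself. The sketch below is one hypothetical way to do that; the returned labels are invented for illustration, and a 1006 close can also come from an ordinary network drop, so the label stays deliberately broad.

```python
def classify_close(close_code, saw_session_error):
    """Map a WebSocket close to a coarse reason label (labels are hypothetical)."""
    if saw_session_error:
        # The server explained the failure; use the session.error code instead.
        return "session_error"
    if close_code == 1011:
        # Server cancelled the session after an internal error, with no payload.
        return "server_internal_error"
    if close_code == 1006:
        # Abnormal closure: in browsers this includes pre-handshake failures
        # such as an expired token, so refresh the token before retrying.
        return "abnormal_or_pre_handshake"
    return "other"
```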
Interruptions
When the user speaks mid-response (barge-in), the server stops the agent and emits reply.done with status: "interrupted" and transcript.agent with interrupted: true. The decision is semantic: back-channels like “uh-huh” don’t trigger an interruption. See Turn detection and interruptions for how the model decides, and Handling interruptions for the client-side flush pattern.
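On the client, the flush can be as simple as discarding any audio that has been received but not yet played. The sketch below is a minimal, hypothetical version of that idea; it assumes reply.audio carries its payload in an `audio` field and leaves actual playback to the caller. Handling interruptions describes the recommended pattern.

```python
from collections import deque


class PlaybackQueue:
    """Buffers agent audio and drops it when the reply is interrupted."""

    def __init__(self):
        self.chunks = deque()

    def handle(self, event):
        kind = event.get("type")
        if kind == "reply.audio":
            # ASSUMPTION: the chunk payload is in event["audio"].
            self.chunks.append(event["audio"])
        elif kind == "reply.done" and event.get("status") == "interrupted":
            # Barge-in: discard everything that has not been played yet.
            self.chunks.clear()

    def next_chunk(self):
        """Return the next chunk to play, or None if the queue is empty."""
        return self.chunks.popleft() if self.chunks else None
```

A reply.done without the interrupted status leaves the queue alone, so the remaining audio still plays out.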