Connect to the Voice Agent API to run a real-time voice conversation. The client streams PCM16 audio to the server and receives the agent’s spoken response (also PCM16), along with transcripts, tool calls, and lifecycle events. See the Voice Agent API overview for the full event flow and a runnable quickstart.
WSS wss://agents.assemblyai.com/v1/ws
Authentication
Pass your API key as a Bearer token in the Authorization header on the WebSocket upgrade request. For browser apps (which can’t set custom headers on WebSockets), generate a temporary token and pass it via the token query parameter instead. See Browser integration.
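A minimal Python sketch of the two authentication options: a Bearer token header for server-side clients, or a one-time temporary token in the `token` query parameter for browsers. It only builds the URL and headers; pass them to whichever WebSocket client you use (the argument name for extra headers differs between libraries). The helper name `connection_params` is hypothetical.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

AGENT_WS_URL = "wss://agents.assemblyai.com/v1/ws"


def connection_params(
    api_key: Optional[str] = None, temp_token: Optional[str] = None
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for the WebSocket upgrade request.

    connection_params is a hypothetical helper, not part of any SDK.
    """
    if temp_token is not None:
        # Browser-style auth: one-time temporary token in the query string,
        # no custom headers (browsers can't set them on WebSockets).
        query = urlencode({"token": temp_token})
        return f"{AGENT_WS_URL}?{query}", {}
    if api_key is not None:
        # Server-side auth: permanent API key sent as a Bearer token.
        return AGENT_WS_URL, {"Authorization": f"Bearer {api_key}"}
    raise ValueError("provide either api_key or temp_token")
```

Keep the permanent-key branch on your server; only the temporary-token URL should ever reach browser code.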
Query parameters
Temporary authentication token for client-side connections. Generate one with GET /v1/token on your server and pass it here so you don’t expose your permanent API key in the browser. Each token is one-time use.
Messages sent by the client
Update Session
Client message to configure the session (system prompt, greeting, input, output, tools).
Allowed values: session.update.
Session configuration fields. All fields are optional; only include the ones you want to change.
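A sketch of building a session.update message that carries only the fields being changed, as recommended above. Assumptions not confirmed by this page: the message type sits under a "type" key, the configuration is nested under a "session" key, and `greeting` is spelled as shown (`system_prompt` is named elsewhere in this reference). Check the full session schema before relying on these names.

```python
import json
from typing import Any


def session_update(**fields: Any) -> str:
    """Build a session.update message containing only the fields being changed.

    ASSUMPTIONS: the "type" and "session" keys and the field names passed in
    are illustrative; verify them against the full session schema.
    """
    # Drop unset fields so the server keeps their current values.
    changed = {name: value for name, value in fields.items() if value is not None}
    return json.dumps({"type": "session.update", "session": changed})
```

For example, `session_update(system_prompt="You are a concise support agent.", greeting="Hi! How can I help?")` produces one message; wait for session.updated before assuming the change is live.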
Resume Session
Client message to resume a previous session by session_id.
Allowed values: session.resume.
The session_id from a previous session.ready event.
Input Audio Chunk
Client streams a chunk of PCM16 audio as base64.
Allowed values: input.audio.
Base64-encoded audio chunk in the configured input encoding.
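A sketch of turning raw PCM16 samples into input.audio messages. It assumes signed 16-bit little-endian mono samples (the usual PCM16 layout; the real sample rate and encoding come from your session's input configuration), a "type" key for the message type, and a hypothetical "audio" key for the base64 payload, since this page does not show that field's name.

```python
import base64
import json
import struct
from typing import List, Sequence

AUDIO_KEY = "audio"  # HYPOTHETICAL field name for the base64 payload; check the schema


def pcm16_bytes(samples: Sequence[int]) -> bytes:
    """Pack signed 16-bit samples as little-endian PCM16 bytes."""
    return struct.pack(f"<{len(samples)}h", *samples)


def input_audio_messages(pcm: bytes, chunk_bytes: int = 3200) -> List[str]:
    """Split PCM16 bytes into base64-encoded input.audio messages.

    chunk_bytes must be even so no 16-bit sample is split across chunks.
    3200 bytes is 100 ms of 16 kHz mono PCM16 (an assumed rate, not one
    stated in this reference).
    """
    if chunk_bytes <= 0 or chunk_bytes % 2:
        raise ValueError("chunk_bytes must be a positive even number")
    messages = []
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        payload = base64.b64encode(chunk).decode("ascii")
        messages.append(json.dumps({"type": "input.audio", AUDIO_KEY: payload}))
    return messages
```

Sending many small chunks at a steady pace, rather than one large buffer, keeps latency low for turn detection.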
Tool Result
Client returns the result of a tool invocation to the agent.
Allowed values: tool.result.
The call_id from the tool.call event you are responding to.
JSON-encoded string containing the tool result. Always a string, not a nested object: use json.dumps(...) (Python) or JSON.stringify(...) (JS) on the payload before sending.
Reply Create
Client asks the agent to generate a reply now, optionally with one-shot instructions.
Allowed values: reply.create.
Optional one-shot instructions the agent uses to compose this reply. They do not modify system_prompt. Useful for status updates during a hold-mode tool call. See Tool calling — execution modes.
Messages received from the server
Session Ready
Server confirms the session is established and ready for audio.
Allowed values: session.ready.
Unique identifier for this session (session_id). Save this to reconnect with session.resume.
Session Updated
Server acknowledges that a session.update was applied successfully.
Allowed values: session.updated.
Session Error
Server reports a session- or protocol-level error.
Sent as session.error for session/protocol errors and as error for connection-level errors. Allowed values: session.error, error.
Machine-readable error code. See error codes for the full table grouped by lifecycle stage. Allowed values: UNAUTHORIZED, FORBIDDEN, INTERNAL_ERROR, server_error, session_not_found, session_forbidden, session_expired, agent_init_failed, agent_timeout, invalid_format, invalid_audio, invalid_value, immutable_field, invalid_config.
Human-readable error description.
ISO-8601 timestamp of the error.
Name of the offending field when applicable (e.g. on session.update validation failures).
User Started Speaking
Server signals that turn detection determined the user started speaking.
Allowed values: input.speech.started.
User Stopped Speaking
Server signals that turn detection determined the user stopped speaking.
Allowed values: input.speech.stopped.
User Transcript Delta
Partial transcript of the user’s current utterance.
Allowed values: transcript.user.delta.
Partial transcript of what the user is saying.
User Transcript
Final transcript of the user’s utterance.
Allowed values: transcript.user.
Final transcript of the user’s utterance.
Conversation item ID.
Reply Started
Agent has begun generating a reply.
Allowed values: reply.started.
ID of this reply.
Reply Audio Chunk
A chunk of the agent’s spoken response as base64 PCM16.
Allowed values: reply.audio.
Base64-encoded audio chunk in the configured output encoding.
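A sketch of client-side handling for reply.audio: decode each base64 chunk and append it to a playback buffer that an audio device drains. The class and method names are hypothetical, and the caller is expected to extract the base64 string from the event (this page does not show that field's name). Dropping unplayed audio when input.speech.started arrives is an assumed barge-in policy, not behavior this reference prescribes.

```python
import base64
import sys
from array import array


class PlaybackBuffer:
    """Hypothetical helper that collects agent audio for a local audio player."""

    def __init__(self) -> None:
        self._pending = bytearray()

    def on_reply_audio(self, b64_chunk: str) -> None:
        # Each reply.audio chunk is base64 text wrapping raw audio bytes
        # (PCM16 in the configured output encoding).
        self._pending.extend(base64.b64decode(b64_chunk))

    def take(self, max_bytes: int) -> bytes:
        """Remove and return up to max_bytes (use an even number) for playback."""
        chunk = bytes(self._pending[:max_bytes])
        del self._pending[:max_bytes]
        return chunk

    def on_user_started_speaking(self) -> None:
        # ASSUMED barge-in policy: discard agent audio that has not played yet.
        self._pending.clear()

    @staticmethod
    def to_samples(pcm: bytes) -> array:
        """Decode PCM16 bytes (assumed little-endian) into signed 16-bit samples."""
        samples = array("h")
        samples.frombytes(pcm)
        if sys.byteorder == "big":
            samples.byteswap()  # array uses native order; PCM16 here is little-endian
        return samples
```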
Agent Transcript
Text of the agent’s response, sent after all reply audio has been delivered.
Allowed values: transcript.agent.
What the agent said. If interrupted, trimmed to the point of interruption.
ID of the reply this transcript belongs to.
Conversation item ID.
Whether the user interrupted the agent mid-response.
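A sketch of a conversation log built from the two final-transcript events, transcript.user and transcript.agent. This page describes their fields but does not show the key names, so "type", "text", "item_id" and "interrupted" are assumptions, and `ConversationLog` is a hypothetical class.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    role: str                  # "user" or "agent"
    text: str
    item_id: Optional[str]
    interrupted: bool = False  # only ever set for agent turns


class ConversationLog:
    """Hypothetical log of final transcripts (key names are ASSUMED)."""

    def __init__(self) -> None:
        self.turns: List[Turn] = []

    def handle(self, raw: str) -> None:
        event = json.loads(raw)
        kind = event.get("type")
        if kind == "transcript.user":
            self.turns.append(Turn("user", event["text"], event.get("item_id")))
        elif kind == "transcript.agent":
            # If the user barged in, the text is already trimmed to what was spoken.
            interrupted = bool(event.get("interrupted", False))
            self.turns.append(
                Turn("agent", event["text"], event.get("item_id"), interrupted)
            )
        # Partial transcript.user.delta events are ignored; this log keeps finals only.
```

Because the agent transcript is trimmed on interruption, the log reflects what the user actually heard rather than what the agent planned to say.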
Reply Done
Agent has finished speaking. If the user barged in, status is "interrupted". Send any accumulated tool.result events once you receive this event.
Allowed values: reply.done.
"completed" for normal completion, "interrupted" if the user barged in. Allowed values: completed, interrupted.
Tool Call
Agent wants to invoke a registered tool.
Allowed values: tool.call.
Call ID for this invocation. Include this value as call_id in the corresponding tool.result.
Name of the tool the agent is invoking.
Arguments to pass to the tool, as a dictionary.
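A sketch tying the lifecycle and tool-calling events together: store the session_id from session.ready for a later session.resume, raise on session.error or error, run the tool when tool.call arrives, hold the tool.result, and send everything accumulated when reply.done arrives. The names session_id and call_id come from this reference; the "type" key, the tool.call keys "name" and "arguments", the "result" key on tool.result, and the error keys "code" and "message" are assumptions. `AgentEventHandler` is hypothetical.

```python
import json
from typing import Any, Callable, Dict, List, Optional


class AgentEventHandler:
    """Hypothetical dispatcher: consumes server events, returns client messages to send."""

    def __init__(self, tools: Dict[str, Callable[..., Any]]) -> None:
        self.tools = tools
        self.session_id: Optional[str] = None
        self._pending_results: List[str] = []

    def handle(self, raw: str) -> List[str]:
        """Process one server event; return JSON messages the client should send now."""
        event = json.loads(raw)
        kind = event.get("type")  # ASSUMED discriminator key
        if kind == "session.ready":
            # Keep the ID so a dropped connection can send session.resume later.
            self.session_id = event["session_id"]
        elif kind in ("session.error", "error"):
            # "code" and "message" are ASSUMED key names.
            raise RuntimeError(f"{event.get('code')}: {event.get('message')}")
        elif kind == "tool.call":
            # "name" and "arguments" are ASSUMED key names; arguments is an object.
            # A KeyError here means the agent named a tool that was not registered.
            output = self.tools[event["name"]](**event["arguments"])
            self._pending_results.append(json.dumps({
                "type": "tool.result",
                "call_id": event["call_id"],
                # The result must be a JSON-encoded *string*, not a nested object.
                "result": json.dumps(output),  # "result" key is ASSUMED
            }))
        elif kind == "reply.done":
            # Flush accumulated tool results. This page does not say whether an
            # "interrupted" status changes that, so they are sent either way.
            ready, self._pending_results = self._pending_results, []
            return ready
        return []

    def resume_message(self) -> str:
        """Build a session.resume message from the saved session_id."""
        if self.session_id is None:
            raise RuntimeError("no session.ready received yet")
        return json.dumps({"type": "session.resume", "session_id": self.session_id})
```

Returning messages instead of sending them keeps the sketch independent of any particular WebSocket library; the caller writes each returned string to the socket.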