Skip to main content

Documentation Index

Fetch the complete documentation index at: https://assemblyai.com/docs/llms.txt

Use this file to discover all available pages before exploring further.

Connect to the Voice Agent API to run a real-time voice conversation. The client streams PCM16 audio to the server and receives the agent’s spoken response (also PCM16), along with transcripts, tool calls, and lifecycle events. See the Voice Agent API overview for the full event flow and a runnable quickstart.
WSSwss://agents.assemblyai.com/v1/ws

Authentication

Pass your API key as a Bearer token in the Authorization header on the WebSocket upgrade request. For browser apps (which can’t set custom headers on WebSockets), generate a temporary token and pass it via the token query parameter instead. See Browser integration.
Authorization
string
Pass your API key as a Bearer token in the Authorization header on the WebSocket upgrade request. For browser apps (which can’t set custom headers on WebSockets), generate a temporary token and pass it via the token query parameter instead. See Browser integration.

Query parameters

token
string
Temporary authentication token for client-side connections. Generate one with GET /v1/token on your server and pass it here so you don’t expose your permanent API key in the browser. Each token is one-time use.

Messages sent by the client

Update Session

Client message to configure the session (system prompt, greeting, input, output, tools).
type
string
required
Allowed values: session.update.
session
object
required
Session configuration fields. All fields are optional — only include the ones you want to change.
{
  "type": "session.update",
  "session": {
    "system_prompt": "You are a concise assistant.",
    "greeting": "Hi — how can I help?",
    "input": {
      "format": {
        "encoding": "audio/pcm"
      },
      "turn_detection": {
        "vad_threshold": 0.5
      }
    },
    "output": {
      "voice": "ivy",
      "format": {
        "encoding": "audio/pcm"
      },
      "volume": 100
    },
    "tools": [
      {
        "type": "function",
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string"
            }
          },
          "required": ["city"]
        }
      }
    ]
  }
}

Resume Session

Client message to resume a previous session by session_id.
type
string
required
Allowed values: session.resume.
session_id
string
required
The session_id from a previous session.ready event.
{
  "type": "session.resume",
  "session_id": "sess_abc123"
}

Input Audio Chunk

Client streams a chunk of PCM16 audio as base64.
type
string
required
Allowed values: input.audio.
audio
string
required
Base64-encoded audio chunk in the configured input encoding.
{
  "type": "input.audio",
  "audio": "EAAgADAAQAAwACAAEAAAAPD/4P/Q/8D/"
}

Tool Result

Client returns the result of a tool invocation to the agent.
type
string
required
Allowed values: tool.result.
call_id
string
required
The call_id from the tool.call event you are responding to.
result
string
required
JSON-encoded string containing the tool result. Always a string, not a nested object. Use json.dumps(...) (Python) or JSON.stringify(...) (JS) on the payload before sending.
{
  "type": "tool.result",
  "call_id": "call_abc123",
  "result": "{\"temp_c\": 22, \"description\": \"Sunny\"}"
}

Reply Create

Client asks the agent to generate a reply now, optionally with one-shot instructions.
type
string
required
Allowed values: reply.create.
instructions
string
Optional one-shot instructions the agent uses to compose this reply. Does not modify system_prompt. Useful for status updates during a hold-mode tool call. See Tool calling — execution modes.
{
  "type": "reply.create",
  "instructions": "Let the customer know we're still processing the transfer."
}

Messages received from the server

Session Ready

Server confirms the session is established and ready for audio.
type
string
required
Allowed values: session.ready.
session_id
string
required
Unique identifier for this session. Save this to reconnect with session.resume.
{
  "type": "session.ready",
  "session_id": "sess_abc123"
}

Session Updated

Server acknowledges that a session.update was applied successfully.
type
string
required
Allowed values: session.updated.
{
  "type": "session.updated"
}

Session Error

Server reports a session- or protocol-level error.
type
string
required
session.error for session/protocol errors and error for connection-level errors. Allowed values: session.error, error.
code
string
required
Machine-readable error code. See error codes for the full table grouped by lifecycle stage. Allowed values: UNAUTHORIZED, FORBIDDEN, INTERNAL_ERROR, server_error, session_not_found, session_forbidden, session_expired, agent_init_failed, agent_timeout, invalid_format, invalid_audio, invalid_value, immutable_field, invalid_config.
message
string
required
Human-readable error description.
timestamp
string
ISO-8601 timestamp of the error.
param
string
Name of the offending field when applicable (e.g. on session.update validation failures).
{
  "type": "session.error",
  "code": "invalid_format",
  "message": "Invalid message format"
}

User Started Speaking

Server signals that turn detection determined the user started speaking.
type
string
required
Allowed values: input.speech.started.
{
  "type": "input.speech.started"
}

User Stopped Speaking

Server signals that turn detection determined the user stopped speaking.
type
string
required
Allowed values: input.speech.stopped.
{
  "type": "input.speech.stopped"
}

User Transcript Delta

Partial transcript of the user’s current utterance.
type
string
required
Allowed values: transcript.user.delta.
text
string
required
Partial transcript of what the user is saying.
{
  "type": "transcript.user.delta",
  "text": "What's the weather in"
}

User Transcript

Final transcript of the user’s utterance.
type
string
required
Allowed values: transcript.user.
text
string
required
Final transcript of the user’s utterance.
item_id
string
required
Conversation item ID.
{
  "type": "transcript.user",
  "text": "What's the weather in Tokyo?",
  "item_id": "item_abc123"
}

Reply Started

Agent has begun generating a reply.
type
string
required
Allowed values: reply.started.
reply_id
string
required
ID of this reply.
{
  "type": "reply.started",
  "reply_id": "reply_abc123"
}

Reply Audio Chunk

A chunk of the agent’s spoken response as base64 PCM16.
type
string
required
Allowed values: reply.audio.
data
string
required
Base64-encoded audio chunk in the configured output encoding.
{
  "type": "reply.audio",
  "data": "EAAgADAAQAAwACAAEAAAAPD/4P/Q/8D/"
}

Agent Transcript

Text of the agent’s response, sent after all reply audio has been delivered.
type
string
required
Allowed values: transcript.agent.
text
string
required
What the agent said. If interrupted, trimmed to the point of interruption.
reply_id
string
required
ID of the reply this transcript belongs to.
item_id
string
required
Conversation item ID.
interrupted
boolean
required
Whether the user interrupted the agent mid-response.
{
  "type": "transcript.agent",
  "text": "It's currently 22°C and sunny in Tokyo.",
  "reply_id": "reply_abc123",
  "item_id": "item_abc123",
  "interrupted": false
}

Reply Done

Agent has finished speaking. If the user barged in, status is "interrupted". Send accumulated tool.result events on this event.
type
string
required
Allowed values: reply.done.
status
string
required
"completed" for normal completion, "interrupted" if the user barged in. Allowed values: completed, interrupted.
{
  "type": "reply.done"
}

Tool Call

Agent wants to invoke a registered tool.
type
string
required
Allowed values: tool.call.
call_id
string
required
Include this value in the corresponding tool.result.
name
string
required
Name of the tool the agent is invoking.
arguments
object
required
Arguments to pass to the tool, as a dictionary.
{
  "type": "tool.call",
  "call_id": "call_abc123",
  "name": "get_weather",
  "arguments": {
    "location": "Tokyo"
  }
}