agent_id. The same agent runs unchanged across every channel. Pick the transport that fits your product.
- Over the API: server-side or native clients connect directly to the WebSocket with your API key.
- From a browser: mint a short-lived token server-side; the browser connects with no key exposed.
- Over the phone: bridge Twilio phone calls to the agent with zero transcoding.
Bind to your agent
Every deployment uses the same realtime endpoint, wss://agents.assemblyai.com/v1/ws. To bind a session to your agent, send its agent_id in the first session.update. The agent's stored prompt, voice, and tools are loaded automatically, so you don't resend them.
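Here is a minimal sketch of that first message in Python. The envelope shape is an assumption and does not come from this page: a `type` field plus a nested `session` object. `build_bind_message` and the ID `agent_123` are hypothetical. Confirm the real payload in the WebSocket events reference.

```python
import json


def build_bind_message(agent_id: str) -> str:
    """Build the first session.update that binds the session to a stored agent.

    ASSUMED shape (not confirmed by this page):
        {"type": "session.update", "session": {"agent_id": ...}}
    Only agent_id is sent; the stored prompt, voice, and tools load server-side.
    """
    if not agent_id:
        raise ValueError("agent_id must be a non-empty string")
    message = {"type": "session.update", "session": {"agent_id": agent_id}}
    return json.dumps(message)


# Usage: send this string as the first text frame after the socket opens.
# "agent_123" is a placeholder, not a real agent ID.
print(build_bind_message("agent_123"))
```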
agent_id is mutually exclusive with inline session fields. When you bind to a stored agent, don't also send system_prompt, greeting, tools, input, or output; those are rejected. To override config per session instead, send those fields inline and omit agent_id. See Inline configuration.

Once you receive session.ready, stream microphone audio and play back the agent's audio. The event protocol is identical across all transports. See Send and play audio and the WebSocket events reference.
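The server rejects a session that mixes agent_id with inline fields, so a client can catch that mistake before sending anything. The sketch below assumes all session fields sit together in one dict. `check_session_fields` is a hypothetical helper, and the server remains the authority on what it rejects.

```python
# Inline fields that the server rejects when agent_id is also present.
INLINE_ONLY_FIELDS = frozenset({"system_prompt", "greeting", "tools", "input", "output"})


def check_session_fields(session: dict) -> None:
    """Raise ValueError if a session payload mixes agent_id with inline config.

    This is a client-side pre-flight check only; the server enforces the
    same rule, but failing locally gives a clearer error message.
    """
    if "agent_id" in session:
        # Iterating a dict yields its keys, so this intersects field names.
        conflicts = sorted(INLINE_ONLY_FIELDS.intersection(session))
        if conflicts:
            raise ValueError(
                "agent_id cannot be combined with inline fields: " + ", ".join(conflicts)
            )


check_session_fields({"agent_id": "agent_123"})  # OK: bound to a stored agent
check_session_fields({"system_prompt": "You are helpful.", "greeting": "Hi!"})  # OK: inline config
```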
Over the API
For server-side apps, backends, and native mobile/desktop clients, connect directly with your API key in the Authorization header. The raw key works (a Bearer prefix is also accepted).
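The sketch below shows the two header forms described above. `auth_headers` and `REALTIME_URL` are hypothetical names. The URL is the shared realtime endpoint that every transport uses. Attaching headers works differently in each WebSocket client library, so that step is left out.

```python
# Shared realtime endpoint used by every transport.
REALTIME_URL = "wss://agents.assemblyai.com/v1/ws"


def auth_headers(api_key: str, use_bearer: bool = False) -> dict[str, str]:
    """Return the Authorization header for a direct server-side connection.

    The raw key is accepted as-is; a "Bearer " prefix is also accepted.
    Keep this on the server: never ship the API key to a browser.
    """
    value = f"Bearer {api_key}" if use_bearer else api_key
    return {"Authorization": value}
```

Pass the returned dict to your WebSocket client's extra-headers option when opening `REALTIME_URL`. The option's name differs between libraries.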
With a bound agent, your client doesn't need to handle a tool.call or send a tool.result for tools that run on the server; it just streams audio. (Client-side function tools still use that round trip; see Add tools.)
From a browser
Browsers give you acoustic echo cancellation for free, which makes them the easiest place to start. The flow:
- Your server calls GET /v1/token with your API key to mint a short-lived token.
- The browser opens wss://agents.assemblyai.com/v1/ws?token=<token> (no key exposed).
- The browser sends the same session.update with agent_id and streams audio.
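The server half of this flow might look like the sketch below, which uses only the Python standard library. Two details are assumptions that this page does not state. First, the token endpoint is assumed to be served from `agents.assemblyai.com`. Second, its JSON response is assumed to carry the token in a `token` field. `mint_token` and `browser_ws_url` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: the token endpoint is on the same host as the WebSocket.
TOKEN_URL = "https://agents.assemblyai.com/v1/token"
WS_URL = "wss://agents.assemblyai.com/v1/ws"


def mint_token(api_key: str) -> str:
    """Server side: call GET /v1/token with the API key and return the token.

    ASSUMPTION: the response body is JSON with a "token" field.
    """
    request = urllib.request.Request(TOKEN_URL, headers={"Authorization": api_key})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["token"]


def browser_ws_url(token: str) -> str:
    """Build the URL the browser opens; it carries the token, never the API key."""
    return f"{WS_URL}?{urllib.parse.urlencode({'token': token})}"
```

Serve the result of `browser_ws_url(mint_token(api_key))` from your own authenticated endpoint. That way the browser receives only the short-lived token and never sees the key.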
Over the phone (Twilio)
Bridge inbound or outbound Twilio calls straight to your agent. Twilio streams G.711 μ-law audio, which the Voice Agent API accepts natively, with no transcoding. Point your Twilio media stream at the WebSocket, bind your agent_id, and the agent answers the call.
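On the Twilio side, each Media Streams message is a JSON event. `media` events carry base64-encoded μ-law audio in `media.payload`. The sketch below unwraps that payload without transcoding, using `twilio_media_to_mulaw` as a hypothetical helper. It does not show how the bytes are framed for the agent; see Connect to Twilio and Send and play audio for that.

```python
import base64
import json
from typing import Optional


def twilio_media_to_mulaw(raw_message: str) -> Optional[bytes]:
    """Extract raw G.711 mu-law bytes from one Twilio Media Streams message.

    Twilio sends JSON events such as "connected", "start", "media", and
    "stop"; only "media" events carry audio, base64-encoded in
    media.payload. The decoded bytes are returned untouched (no
    transcoding). Returns None for events that carry no audio.
    """
    message = json.loads(raw_message)
    if message.get("event") != "media":
        return None
    return base64.b64decode(message["media"]["payload"])
```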
See Connect to Twilio for the TwiML, inbound and outbound setup, and the audio bridge.
Next steps
- Send and play audio: encodings, sending input, and playing output with clean interruptions.
- WebSocket events: every event and payload, with the session flow diagram.
- Best practices: tune turn-taking, latency, and reliability once it works.
- Troubleshooting: a symptom-to-fix table for the common failures.