Once you’ve created an agent, you deploy it by opening a realtime connection and referencing its agent_id. The same agent runs unchanged across every channel. Pick the transport that fits your product.

Over the API

Server-side or native clients connect directly to the WebSocket with your API key.

From a browser

Mint a short-lived token server-side; the browser connects with no key exposed.

Over the phone

Bridge Twilio phone calls to the agent with zero transcoding.

Bind to your agent

Every deployment uses the same realtime endpoint:
wss://agents.assemblyai.com/v1/ws
Send your agent_id in the first session.update. The agent’s stored prompt, voice, and tools are loaded automatically, so you don’t resend them:
{
  "type": "session.update",
  "session": { "agent_id": "7ad24396-b822-4dca-871a-be9cc4781cf9" }
}
agent_id is mutually exclusive with inline session fields. When you bind to a stored agent, don’t also send system_prompt, greeting, tools, input, or output; those are rejected. To override config per session instead, send those fields inline and omit agent_id. See Inline configuration.
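The mutual-exclusivity rule can also be checked on the client before anything is sent. A minimal sketch, assuming a hypothetical helper named build_session_update; it only mirrors the rule stated above, and the server remains the authority on what it rejects:

```python
import json

def build_session_update(agent_id=None, **inline):
    """Build a session.update message as a JSON string (hypothetical helper).

    Pass either agent_id (stored agent) or inline session fields such as
    system_prompt or greeting, never both.
    """
    if agent_id is not None and inline:
        raise ValueError(
            f"agent_id cannot be combined with inline fields: {sorted(inline)}"
        )
    if agent_id is None and not inline:
        raise ValueError("provide agent_id or inline session fields")
    # Stored agent: send only the id. Inline config: send only the fields.
    session = {"agent_id": agent_id} if agent_id is not None else dict(inline)
    return json.dumps({"type": "session.update", "session": session})
```

Calling build_session_update(agent_id="...") produces the message shown above; adding greeting="hi" to the same call raises ValueError locally instead of waiting for the server to reject it.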
After session.ready, stream microphone audio and play back the agent’s audio. The event protocol is identical across all transports. See Send and play audio and the WebSocket events reference.

Over the API

For server-side apps, backends, and native mobile/desktop clients, connect directly with your API key in the Authorization header. The raw key works (a Bearer prefix is also accepted).
import asyncio
import base64
import json

import websockets

URL = "wss://agents.assemblyai.com/v1/ws"
AGENT_ID = "7ad24396-b822-4dca-871a-be9cc4781cf9"

async def main():
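    headers = {"Authorization": "YOUR_API_KEY"}
    # websockets 14+ names this keyword additional_headers;
    # older releases call it extra_headers.
    async with websockets.connect(URL, additional_headers=headers) as ws: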
        # Bind to the stored agent; no inline config needed.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"agent_id": AGENT_ID},
        }))

        async for raw in ws:
            event = json.loads(raw)
            t = event.get("type")
            if t == "session.ready":
                print("ready:", event.get("session_id"))
                # start streaming input.audio frames here
            elif t == "transcript.agent":
                print("agent:", event.get("text"))
            elif t == "reply.audio":
                pcm = base64.b64decode(event["data"])  # play this
            elif t in ("error", "session.error"):
                print("error:", event.get("message"))

asyncio.run(main())
This passes your raw API key over the connection, which is fine for servers and trusted native clients. Never ship your API key in browser or mobile client code. For client-side apps, use the browser integration token flow.
Because the agent’s HTTP tools run server-side, this client never receives a tool.call or sends a tool.result; it just streams audio. (Client-side function tools still use that round trip; see Add tools.)
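The client above leaves input streaming as a placeholder comment. The sketch below shows one way to slice captured audio into frames and wrap each one as an input.audio message. The audio field name and the default figures (24 kHz, 16-bit mono, 20 ms frames) are assumptions for illustration and are not taken from this page; Send and play audio has the actual payload shape and supported encodings.

```python
import base64
import json

def frame_pcm(pcm: bytes, sample_rate: int = 24000, frame_ms: int = 20,
              bytes_per_sample: int = 2):
    """Yield fixed-duration slices of mono PCM; the last slice may be shorter."""
    frame_bytes = sample_rate * frame_ms // 1000 * bytes_per_sample
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]

def input_audio_event(frame: bytes) -> str:
    """Wrap one frame as an input.audio message.

    ASSUMPTION: the base64 payload travels in a field named "audio".
    """
    return json.dumps({
        "type": "input.audio",
        "audio": base64.b64encode(frame).decode("ascii"),
    })
```

After session.ready, a sender would loop over frame_pcm(captured_bytes) and call ws.send(input_audio_event(frame)) for each frame, typically from a separate asyncio task so that receiving events is not blocked.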

From a browser

Browsers give you acoustic echo cancellation for free, which makes them the easiest place to start. The flow:
  1. Your server calls GET /v1/token with your API key to mint a short-lived token.
  2. The browser opens wss://agents.assemblyai.com/v1/ws?token=<token> (no key exposed).
  3. The browser sends the same session.update with agent_id and streams audio.
See Connect from a browser for the token endpoint, a full HTML quickstart, and browser audio constraints.
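The server side of steps 1 and 2 can be sketched with the standard library alone. Several details here are assumptions rather than facts from this page: that the token endpoint lives on the same host as the WebSocket, that it accepts the same Authorization header, and that the response JSON carries the token in a field named token. Connect from a browser documents the real contract.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://agents.assemblyai.com/v1/token"  # ASSUMED host for GET /v1/token
WS_URL = "wss://agents.assemblyai.com/v1/ws"

def token_request(api_key: str) -> urllib.request.Request:
    """Step 1: build the server-side GET /v1/token request."""
    return urllib.request.Request(
        TOKEN_URL, headers={"Authorization": api_key}, method="GET"
    )

def mint_token(api_key: str) -> str:
    """Send the request and return the short-lived token."""
    with urllib.request.urlopen(token_request(api_key), timeout=10) as resp:
        return json.load(resp)["token"]  # ASSUMED response field name

def browser_ws_url(token: str) -> str:
    """Step 2: the URL the browser opens; it contains no API key."""
    return f"{WS_URL}?{urllib.parse.urlencode({'token': token})}"
```

Your backend would expose mint_token behind its own authenticated route and hand only the resulting token, or the URL from browser_ws_url, to the page.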

Over the phone (Twilio)

Bridge inbound or outbound Twilio calls straight to your agent. Twilio streams G.711 μ-law audio, which the Voice Agent API accepts natively, with no transcoding. Point your Twilio media stream at the WebSocket, bind your agent_id, and the agent answers the call. See Connect to Twilio for the TwiML, inbound and outbound setup, and the audio bridge.
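To illustrate why no transcoding step is needed, here is a sketch of the receiving half of a bridge. It assumes Twilio's Media Streams message format, in which each JSON text frame has an event field and media frames carry base64 μ-law audio in media.payload. The helper name twilio_media_payload is hypothetical, and forwarding the bytes to the agent is left to Connect to Twilio.

```python
import base64
import json

def twilio_media_payload(message: str):
    """Return raw mu-law bytes from a Twilio 'media' message, else None."""
    data = json.loads(message)
    if data.get("event") != "media":
        return None  # connected / start / stop / mark frames carry no audio
    # The payload is already G.711 mu-law, so it is decoded from base64
    # but not resampled or re-encoded.
    return base64.b64decode(data["media"]["payload"])
```

The returned bytes can be passed along unchanged; only the base64 envelope differs between the two sides.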

Next steps

Send and play audio

Encodings, sending input, and playing output with clean interruptions.

WebSocket events

Every event and payload, with the session flow diagram.

Best practices

Tune turn-taking, latency, and reliability once it works.

Troubleshooting

Symptom-to-fix table for the common failures.