Deploy your agent

Once you’ve created an agent, you connect to it by opening a realtime WebSocket and referencing its agent_id. The same agent runs unchanged whether you connect from your own server, a browser, or a phone call.

The connection lifecycle

Every transport talks to the same realtime endpoint:

wss://agents.assemblyai.com/v1/ws

Each connection follows the same five steps:

1. Open the WebSocket (with your API key, or a browser token).
2. Bind to your agent by sending one session.update with its agent_id. The stored prompt, voice, and tools load automatically, so you don’t resend them.
3. Wait for session.ready, the signal that the agent is live.
4. Stream microphone audio as input.audio frames and play the agent’s reply.audio frames.
5. End cleanly with session.end when the conversation is over.
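
The messages exchanged in these five steps can be sketched as a few pure helpers. This is a minimal illustration, not an official SDK: the function names are hypothetical, and the field names (audio on input.audio, data on reply.audio) are taken from the client examples later on this page.

```python
import base64
import json
from typing import Optional


def bind_message(agent_id: str) -> str:
    """Step 2: bind the connection to a stored agent."""
    return json.dumps({"type": "session.update", "session": {"agent_id": agent_id}})


def audio_frame(pcm: bytes) -> str:
    """Step 4 (outbound): wrap raw PCM16 bytes in an input.audio frame."""
    encoded = base64.b64encode(pcm).decode("ascii")
    return json.dumps({"type": "input.audio", "audio": encoded})


def decode_reply_audio(raw: str) -> Optional[bytes]:
    """Step 4 (inbound): return PCM bytes from a reply.audio frame, else None."""
    msg = json.loads(raw)
    if msg.get("type") != "reply.audio":
        return None
    return base64.b64decode(msg["data"])


def end_message() -> str:
    """Step 5: ask the server to end the session."""
    return json.dumps({"type": "session.end"})
```

Keeping the message shapes in one place like this makes them easy to unit-test without opening a socket.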

Always send session.end before closing the WebSocket. If the client just closes the socket, the server holds the session open for a 30-second session.resume grace window, and that window is billable. session.end short-circuits the grace window, emits session.ended, and stops billing immediately.

ws.send(JSON.stringify({ type: "session.end" }));
// Wait for session.ended, then close the socket.

Send session.end on every intentional disconnect: “End call” button, page unload (beforeunload / pagehide), user hangup, and Ctrl+C in server clients. See Ending the session cleanly.
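
In a Python client, the same clean shutdown might look like the sketch below. It assumes ws is any object with async send() and recv() methods (for example a websockets connection). The helper name end_session and the 2-second default timeout are illustrative choices, not part of the API.

```python
import asyncio
import json


async def end_session(ws, timeout: float = 2.0) -> bool:
    """Send session.end, then wait up to `timeout` seconds for session.ended.

    Returns True if session.ended arrived in time, False on timeout.
    Connection errors raised by `ws` are left for the caller to handle.
    """
    await ws.send(json.dumps({"type": "session.end"}))
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            return False
        try:
            raw = await asyncio.wait_for(ws.recv(), timeout=remaining)
        except asyncio.TimeoutError:
            return False
        # Late reply.audio or transcript frames may arrive first; skip them.
        if json.loads(raw).get("type") == "session.ended":
            return True
```

Call it from every intentional exit path, then close the socket whether it returns True or False.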

Step 2 is the whole binding — send the agent_id and nothing else:

{
  "type": "session.update",
  "session": { "agent_id": "7ad24396-b822-4dca-871a-be9cc4781cf9" }
}

agent_id is mutually exclusive with inline session fields. When you bind to a stored agent, don’t also send system_prompt, greeting, tools, input, or output; those are rejected. To override config per session instead, send those fields inline and omit agent_id. See Inline configuration.
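
One way to enforce this rule on the client before anything reaches the server is a small payload builder. This is a sketch under the assumptions stated above: build_session_update is a hypothetical helper, and it is stricter than the server may be, rejecting any extra field when agent_id is set.

```python
def build_session_update(agent_id=None, **inline):
    """Build a session.update payload that is either bound or inline, never both.

    With agent_id set, any inline field (system_prompt, greeting, tools,
    input, output, ...) raises ValueError instead of being sent.
    """
    if agent_id is not None:
        if inline:
            names = ", ".join(sorted(inline))
            raise ValueError(f"agent_id cannot be combined with inline fields: {names}")
        session = {"agent_id": agent_id}
    else:
        session = dict(inline)
    return {"type": "session.update", "session": session}
```

For example, build_session_update(agent_id="...") yields the binding message shown above, while build_session_update(agent_id="...", greeting="Hi") fails fast locally.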

Connect from a server or native app

For server-side apps, backends, and native desktop clients, connect directly with your API key in the Authorization header (raw key; a Bearer prefix is also accepted). Set ASSEMBLYAI_API_KEY and AGENT_ID in your environment first. Both clients below are complete: they capture your microphone, stream it to the agent bound by agent_id, play the agent’s replies, flush playback on barge-in, and end the session cleanly on Ctrl+C.

# pip install websockets sounddevice numpy
import asyncio
import base64
import json
import os
import signal

import numpy as np
import sounddevice as sd
import websockets

URL = "wss://agents.assemblyai.com/v1/ws"
API_KEY = os.environ["ASSEMBLYAI_API_KEY"]
AGENT_ID = os.environ.get("AGENT_ID", "your-agent-id-here")

SAMPLE_RATE = 24000   # PCM16 mono @ 24kHz is the audio/pcm default
CHANNELS = 1
BLOCKSIZE = 1200      # frames per chunk = 50ms at 24kHz


async def main():
    # Auth is the raw API key in the Authorization header (a "Bearer " prefix is also accepted).
    async with websockets.connect(
        URL, additional_headers={"Authorization": API_KEY}
    ) as ws:
        # Bind this connection to a stored agent by id (no inline prompt/voice).
        await ws.send(json.dumps(
            {"type": "session.update", "session": {"agent_id": AGENT_ID}}
        ))

        # Block until the server confirms the session before streaming audio.
        while True:
            msg = json.loads(await ws.recv())
            if msg.get("type") == "session.ready":
                print(f"Session ready: {msg.get('session_id')}")
                break
            if msg.get("type") in ("session.error", "error"):
                print("Error:", msg.get("message", msg))
                return

        loop = asyncio.get_running_loop()
        mic_queue: asyncio.Queue[bytes] = asyncio.Queue()

        # Output stream for agent audio; kept open so we can flush on barge-in.
        out_stream = sd.OutputStream(
            samplerate=SAMPLE_RATE, channels=CHANNELS, dtype="int16"
        )
        out_stream.start()

        def mic_callback(indata, frames, time_info, status):
            # Runs on sounddevice's thread — hand bytes to the event loop safely.
            data = bytes(indata)
            loop.call_soon_threadsafe(mic_queue.put_nowait, data)

        in_stream = sd.InputStream(
            samplerate=SAMPLE_RATE,
            channels=CHANNELS,
            dtype="int16",
            blocksize=BLOCKSIZE,
            callback=mic_callback,
        )

        stop = asyncio.Event()

        async def send_audio():
            while not stop.is_set():
                chunk = await mic_queue.get()
                # Each mic chunk is base64-encoded PCM16 bytes.
                await ws.send(json.dumps({
                    "type": "input.audio",
                    "audio": base64.b64encode(chunk).decode("ascii"),
                }))

        async def receive():
            try:
                async for raw in ws:
                    msg = json.loads(raw)
                    mtype = msg.get("type")
                    if mtype == "reply.audio":
                        pcm = base64.b64decode(msg["data"])
                        out_stream.write(np.frombuffer(pcm, dtype=np.int16))
                    elif mtype == "transcript.user":
                        print("You:", msg.get("text", ""))
                    elif mtype == "transcript.agent":
                        print("Agent:", msg.get("text", ""))
                    elif mtype == "reply.done":
                        # On barge-in, drop queued playback so stale audio stops.
                        if msg.get("status") == "interrupted":
                            out_stream.abort()
                            out_stream.start()
                    elif mtype == "session.ended":
                        break
                    elif mtype in ("session.error", "error"):
                        print("Error:", msg.get("message", msg))
            finally:
                # Unblock shutdown on session.ended or if the socket closes early.
                stop.set()

        # Trip `stop` on Ctrl+C so we can end the session gracefully.
        loop.add_signal_handler(signal.SIGINT, stop.set)

        with in_stream:
            tasks = [asyncio.create_task(send_audio()),
                     asyncio.create_task(receive())]
            await stop.wait()
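            for t in tasks:
                t.cancel()
            # Let the cancelled tasks finish so nothing else is reading the socket.
            await asyncio.gather(*tasks, return_exceptions=True)

        async def wait_for_ended():
            # Skip late audio/transcript frames until session.ended arrives.
            async for raw in ws:
                if json.loads(raw).get("type") == "session.ended":
                    return

        # Politely end the session and give the server a moment to confirm.
        try:
            await ws.send(json.dumps({"type": "session.end"}))
            await asyncio.wait_for(wait_for_ended(), timeout=2.0)
        except (asyncio.TimeoutError, websockets.ConnectionClosed):
            pass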

        out_stream.stop()
        out_stream.close()


if __name__ == "__main__":
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        pass

// npm i ws mic speaker   (mic needs SoX on your PATH: brew install sox / apt install sox)
const WebSocket = require("ws");
const mic = require("mic");
const Speaker = require("speaker");

const URL = "wss://agents.assemblyai.com/v1/ws";
const API_KEY = process.env.ASSEMBLYAI_API_KEY;
const AGENT_ID = process.env.AGENT_ID || "your-agent-id-here";

const SAMPLE_RATE = 24000; // PCM16 mono @ 24kHz is the audio/pcm default
const CHANNELS = 1;

// Raw API key in the Authorization header (a "Bearer " prefix is also accepted).
const ws = new WebSocket(URL, { headers: { Authorization: API_KEY } });

let ready = false; // gate: don't send mic audio until session.ready
let speaker = newSpeaker();

function newSpeaker() {
  return new Speaker({ channels: CHANNELS, bitDepth: 16, sampleRate: SAMPLE_RATE });
}

ws.on("open", () => {
  // Bind this connection to a stored agent by id (no inline prompt/voice).
  ws.send(JSON.stringify({ type: "session.update", session: { agent_id: AGENT_ID } }));
});

// Capture mic as PCM16 mono @ 24kHz.
const micInstance = mic({
  rate: String(SAMPLE_RATE),
  channels: String(CHANNELS),
  bitwidth: "16",
  encoding: "signed-integer",
  endian: "little",
});
const micStream = micInstance.getAudioStream();

micStream.on("data", (chunk) => {
  if (!ready || ws.readyState !== WebSocket.OPEN) return;
  // Each mic chunk is base64-encoded PCM16 bytes.
  ws.send(JSON.stringify({ type: "input.audio", audio: chunk.toString("base64") }));
});
micStream.on("error", (err) => console.error("Mic error:", err.message));

ws.on("message", (raw) => {
  const msg = JSON.parse(raw.toString());
  switch (msg.type) {
    case "session.ready":
      console.log(`Session ready: ${msg.session_id}`);
      ready = true; // open the gate: start streaming mic audio now
      micInstance.start();
      break;
    case "reply.audio":
      speaker.write(Buffer.from(msg.data, "base64")); // decode PCM16 and play
      break;
    case "transcript.user":
      console.log("You:", msg.text || "");
      break;
    case "transcript.agent":
      console.log("Agent:", msg.text || "");
      break;
    case "reply.done":
      if (msg.status === "interrupted") {
        // Barge-in: destroy and recreate the Speaker to flush stale audio.
        speaker.removeAllListeners("error");
        speaker.on("error", () => {});
        speaker.end();
        speaker = newSpeaker();
      }
      break;
    case "session.ended":
      cleanup();
      break;
    case "session.error":
    case "error":
      console.error("Error:", msg.message || JSON.stringify(msg));
      break;
  }
});

ws.on("error", (err) => console.error("WebSocket error:", err.message));
ws.on("close", () => process.exit(0));

let ending = false;
function cleanup() {
  try { micInstance.stop(); } catch {}
  try { speaker.end(); } catch {}
  try { ws.close(); } catch {}
  setTimeout(() => process.exit(0), 200);
}

process.on("SIGINT", () => {
  if (ending) return process.exit(0);
  ending = true;
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ type: "session.end" }));
    setTimeout(cleanup, 1000); // brief wait for session.ended, then close
  } else {
    cleanup();
  }
});

This passes your raw API key over the connection, which is fine for servers and trusted native clients. Never ship your API key in browser or mobile client code. For client-side apps, use the browser integration token flow.

Deploy on another channel

The lifecycle above is identical everywhere — only how you authenticate and move audio changes:

From a browser — mint a short-lived token server-side so no API key is exposed; the browser handles mic capture and echo cancellation.
Over the phone (Twilio) — bridge a Twilio call to your agent over G.711 μ-law with zero transcoding.

Next steps

Stream & play audio

Encodings, sending input, and playing output with clean interruptions.

WebSocket events

Every event and payload, with the session flow diagram.

Best practices

Tune turn-taking, latency, and reliability once it works.

Troubleshooting

Symptom-to-fix table for the common failures.
