Voice Agent API

Overview

By the end of this guide you’ll have a voice agent you can talk to right in your browser. You create it with one command, then drop it into a small web page. Browsers handle echo cancellation, so you don’t need headphones, and there’s nothing to install. Prefer to try it first? Talk to an agent without writing any code in the Voice Agent playground.

Voice agents are billed per session. Always end with session.end. Billing runs for as long as the WebSocket stays open. If the client just calls ws.close(), the server keeps the session alive for a 30-second session.resume grace window, and that window is billable. See Step 5: End the session cleanly below.

New accounts get $50 in free credits. See Billing and pricing.
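As a rough illustration of the billing note above, the sketch below estimates how many seconds a session stays billable. It assumes the grace window adds a flat 30 seconds whenever the socket closes without session.end. The function name and the flat-add model are illustrative assumptions, not AssemblyAI's billing logic.

```javascript
// Hypothetical helper: a back-of-the-envelope estimate, not an official formula.
const GRACE_WINDOW_SECONDS = 30; // the session.resume grace window described above

function estimateBillableSeconds(socketOpenSeconds, sentSessionEnd) {
  // With session.end, the session stops when the socket does.
  // Without it, the server holds the session open for the grace window.
  return sentSessionEnd
    ? socketOpenSeconds
    : socketOpenSeconds + GRACE_WINDOW_SECONDS;
}

console.log(estimateBillableSeconds(120, true));  // 120
console.log(estimateBillableSeconds(120, false)); // 150
```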

Before you begin

You’ll need:

An API key. Grab one from your dashboard. The examples in step 1 read it from an environment variable:
export ASSEMBLYAI_API_KEY=<your-key>
A modern browser (Chrome or Edge recommended). That’s the whole client: echo cancellation for free, nothing else to install.

Build your agent

Create an agent, then talk to it in a tiny web page. Five steps (one of them optional), no dependencies.

Prefer to build with an AI coding tool? Claude Code, Cursor, Windsurf, and prompt-based builders like Lovable and v0 can write this for you against AssemblyAI’s live docs. See Build with AI coding tools.

Step 1: Create an agent

curl -X POST https://agents.assemblyai.com/v1/agents \
  -H "Authorization: $ASSEMBLYAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Quickstart Assistant",
    "system_prompt": "You are a friendly assistant having a casual voice conversation. Keep replies short and natural.",
    "greeting": "Hey there, what can I help with?",
    "voice": { "voice_id": "alba" }
  }'

# pip install requests
import os
import requests

resp = requests.post(
    "https://agents.assemblyai.com/v1/agents",
    headers={"Authorization": os.environ["ASSEMBLYAI_API_KEY"]},
    json={
        "name": "Quickstart Assistant",
        "system_prompt": "You are a friendly assistant having a casual voice conversation. Keep replies short and natural.",
        "greeting": "Hey there, what can I help with?",
        "voice": {"voice_id": "alba"},
    },
)
resp.raise_for_status()
print(resp.json())

// Node 18+ has fetch built in. Save as an .mjs file so top-level await works.
const res = await fetch("https://agents.assemblyai.com/v1/agents", {
  method: "POST",
  headers: {
    Authorization: process.env.ASSEMBLYAI_API_KEY,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "Quickstart Assistant",
    system_prompt: "You are a friendly assistant having a casual voice conversation. Keep replies short and natural.",
    greeting: "Hey there, what can I help with?",
    voice: { voice_id: "alba" },
  }),
});
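// Fail loudly on a non-2xx response, like raise_for_status() in the Python example.
if (!res.ok) throw new Error(`Create agent failed: ${res.status}`);
const data = await res.json();
console.log(data);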

Copy the id from the response. This is your agent ID:

{ "id": "7ad24396-b822-4dca-871a-be9cc4781cf9", "name": "Quickstart Assistant", "...": "..." }
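If you script this step, a small helper can pull the id out of the parsed response and fail early when it is missing. This is a minimal sketch: extractAgentId is a hypothetical name, and the UUID check assumes agent IDs keep the shape shown in the sample response above.

```javascript
// Hypothetical helper; assumes a top-level "id" field shaped like a UUID.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function extractAgentId(response) {
  const id = response && response.id;
  if (typeof id !== "string" || !UUID_RE.test(id)) {
    throw new Error("Create-agent response did not contain a usable id");
  }
  return id;
}

const sampleResponse = {
  id: "7ad24396-b822-4dca-871a-be9cc4781cf9",
  name: "Quickstart Assistant",
};
console.log(extractAgentId(sampleResponse)); // 7ad24396-b822-4dca-871a-be9cc4781cf9
```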

Step 2: Save the web page

Save this as voice-agent.html. It’s the whole client: it captures the mic, streams it to your agent, plays the reply, and handles barge-in by cutting off playback as soon as you start talking:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>Voice Agent</title>
  <style>
    body { font-family: system-ui, sans-serif; background: #0b1020; color: #e8ecf5;
           min-height: 100vh; margin: 0; display: grid; place-items: center; }
    .card { width: 320px; background: #151b30; padding: 28px; border-radius: 16px;
            box-shadow: 0 12px 40px rgba(0,0,0,.45); }
    h1 { font-size: 18px; margin: 0 0 18px; }
    input { width: 100%; box-sizing: border-box; padding: 10px 12px; margin-bottom: 10px;
            border-radius: 8px; border: 1px solid #2a3350; background: #0e1426; color: #e8ecf5; }
    button { width: 100%; padding: 12px; border: 0; border-radius: 8px; font-size: 15px;
             font-weight: 600; cursor: pointer; background: #4f7cff; color: #fff; }
    button.live { background: #e0455e; }
    .status { margin-top: 14px; text-align: center; font-size: 14px; color: #9fb0d0; }
  </style>
</head>
<body>
  <div class="card">
    <h1>🎙️ Voice Agent</h1>
    <input id="key" type="password" placeholder="AssemblyAI API key" />
    <input id="agent" placeholder="Agent ID" />
    <button id="btn">Connect</button>
    <div class="status" id="status">Enter your key and agent ID</div>
  </div>
  <script>
    const RATE = 24000, $ = (id) => document.getElementById(id);
    let ws, ctx, stream, playhead = 0; const sources = new Set();
    const setStatus = (t) => ($("status").textContent = t);

    async function start() {
      const key = $("key").value.trim(), agent = $("agent").value.trim();
      if (!key || !agent) return setStatus("Enter your key and agent ID");
      setStatus("connecting…"); $("btn").textContent = "Stop"; $("btn").classList.add("live");

      ctx = new AudioContext({ sampleRate: RATE });
      stream = await navigator.mediaDevices.getUserMedia({
        audio: { echoCancellation: true, noiseSuppression: false },
      });
      const cap = `class P extends AudioWorkletProcessor{process(i){const c=i[0][0];
        if(c){const b=new Int16Array(c.length);for(let n=0;n<c.length;n++)
        b[n]=Math.max(-1,Math.min(1,c[n]))*32767;this.port.postMessage(b.buffer,[b.buffer]);}
        return true;}}registerProcessor("cap",P);`;
      await ctx.audioWorklet.addModule(URL.createObjectURL(new Blob([cap], { type: "text/javascript" })));
      const node = new AudioWorkletNode(ctx, "cap");
      ctx.createMediaStreamSource(stream).connect(node);

      const url = new URL("wss://agents.assemblyai.com/v1/ws");
      url.searchParams.set("token", key);
      ws = new WebSocket(url);
      let ready = false;

      ws.onopen = () => ws.send(JSON.stringify({ type: "session.update", session: { agent_id: agent } }));
      node.port.onmessage = (e) => {
        if (!ready || ws.readyState !== 1) return;
        const b = new Uint8Array(e.data); let s = "";
        for (let i = 0; i < b.length; i++) s += String.fromCharCode(b[i]);
        ws.send(JSON.stringify({ type: "input.audio", audio: btoa(s) }));
      };
      ws.onmessage = ({ data }) => {
        const m = JSON.parse(data);
        if (m.type === "session.ready") { ready = true; playhead = ctx.currentTime; setStatus("● listening, start talking"); }
        else if (m.type === "input.speech.started") flush();        // barge-in
        else if (m.type === "reply.audio") play(m.data);
        else if (m.type === "transcript.agent") setStatus("🗣 " + m.text);
        else if (m.type === "error" || m.type === "session.error") setStatus("error: " + (m.message || ""));
      };
    }

    function play(b64) {
      const raw = atob(b64), pcm = new Int16Array(raw.length / 2);
      for (let i = 0; i < pcm.length; i++) pcm[i] = raw.charCodeAt(2*i) | (raw.charCodeAt(2*i+1) << 8);
      const buf = ctx.createBuffer(1, pcm.length, RATE), ch = buf.getChannelData(0);
      for (let i = 0; i < pcm.length; i++) ch[i] = pcm[i] / 32768;
      const src = ctx.createBufferSource(); src.buffer = buf; src.connect(ctx.destination);
      const at = Math.max(ctx.currentTime, playhead); src.start(at); playhead = at + buf.duration;
      sources.add(src); src.onended = () => sources.delete(src);
    }
    function flush() { for (const s of sources) { try { s.stop(); } catch (e) {} } sources.clear(); playhead = ctx.currentTime; }
    function stop() {
      ws && ws.close(); stream && stream.getTracks().forEach((t) => t.stop()); ctx && ctx.close();
      $("btn").textContent = "Connect"; $("btn").classList.remove("live"); setStatus("disconnected");
    }
    $("btn").onclick = () => (ws && ws.readyState <= 1 ? stop() : start());
  </script>
</body>
</html>

For simplicity this passes your API key straight to the browser. Never do that in production. Mint a temporary token on your server instead.
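The audio path in voice-agent.html is easier to follow in isolation. The sketch below mirrors what the page does: clamp float samples to [-1, 1], scale them to 16-bit integers, base64-encode the bytes, and reverse the process for playback. The function names are illustrative. Writing the bytes as explicit little-endian through DataView is a choice made for this example; the page's capture side relies on the platform's native byte order, which is little-endian on common hardware.

```javascript
// Illustrative helpers mirroring the page's capture and playback conversions.

// Float samples in [-1, 1] -> base64 string of 16-bit little-endian PCM.
function floatToPcm16Base64(samples) {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, Math.trunc(clamped * 32767), true); // true = little-endian
  }
  const bytes = new Uint8Array(view.buffer);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// base64 string of 16-bit little-endian PCM -> Float32Array in [-1, 1).
function pcm16Base64ToFloat(b64) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  const view = new DataView(bytes.buffer);
  const out = new Float32Array(bytes.length / 2);
  for (let i = 0; i < out.length; i++) out[i] = view.getInt16(i * 2, true) / 32768;
  return out;
}

// Round trip: values come back within 16-bit quantization error.
const roundTrip = pcm16Base64ToFloat(floatToPcm16Base64([0, 0.5, -1]));
console.log(Array.from(roundTrip));
```

btoa and atob are globals in browsers and in recent Node versions, so the same two functions can be exercised outside the page.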

Step 3: Open it and talk

Browsers need a secure context for the microphone, so serve the file locally:

npx serve .

Open http://localhost:3000/voice-agent.html, paste your API key and the agent ID from step 1, click Connect, and start talking. The agent greets you, listens, and replies.
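To see why http://localhost works while a plain http:// address on your network does not, here is a simplified approximation of the browser's secure-context rule. It is a sketch only: in a real page the authoritative check is window.isSecureContext, and isLikelySecureOrigin is a hypothetical helper that covers just the common cases (HTTPS and loopback hosts).

```javascript
// Hypothetical helper: approximates the secure-context rule for common cases.
function isLikelySecureOrigin(urlString) {
  const url = new URL(urlString);
  if (url.protocol === "https:") return true;
  const host = url.hostname;
  // Loopback hosts are treated as trustworthy even over plain http.
  return (
    host === "localhost" ||
    host.endsWith(".localhost") ||
    host === "127.0.0.1" ||
    host === "[::1]"
  );
}

console.log(isLikelySecureOrigin("http://localhost:3000/voice-agent.html")); // true
console.log(isLikelySecureOrigin("http://192.168.1.20:3000/voice-agent.html")); // false
console.log(isLikelySecureOrigin("https://example.com/voice-agent.html")); // true
```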

Step 4: Add a tool (optional)

Give the agent the ability to do something. Here it books a meeting with the real Cal.com bookings API. Update the agent with an HTTP tool: you provide the endpoint and a JSON-Schema parameter spec, and AssemblyAI calls it for you whenever the model decides to. Nothing changes in your web page. Replace <YOUR_CAL_API_KEY> with a Cal.com API key and 123 with your event type ID:

curl -X PUT https://agents.assemblyai.com/v1/agents/7ad24396-b822-4dca-871a-be9cc4781cf9 \
  -H "Authorization: $ASSEMBLYAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "system_prompt": "You are a friendly scheduling assistant. When the caller wants to meet, collect their name, email, time zone, and a start time, then call create_booking with eventTypeId 123 and confirm the booking out loud.",
    "tools": [
      {
        "name": "create_booking",
        "description": "Book a meeting on Cal.com. Call this once you have the caller name, email, time zone, and a start time.",
        "parameters": {
          "type": "object",
          "properties": {
            "start":       { "type": "string", "format": "date-time", "description": "Meeting start, ISO 8601 in UTC.", "examples": ["2026-06-15T15:00:00Z"] },
            "eventTypeId": { "type": "integer", "description": "The Cal.com event type to book. Always use 123.", "examples": [123] },
            "attendee": {
              "type": "object",
              "description": "The caller's details.",
              "properties": {
                "name":     { "type": "string", "description": "The caller full name." },
                "email":    { "type": "string", "format": "email", "description": "The caller email address." },
                "timeZone": { "type": "string", "description": "IANA time zone, e.g. America/New_York." }
              },
              "required": ["name", "email", "timeZone"]
            }
          },
          "required": ["start", "eventTypeId", "attendee"]
        },
        "execution_mode": "interactive",
        "timeout_seconds": 30,
        "http": {
          "url": "https://api.cal.com/v2/bookings",
          "http_method": "POST",
          "headers": [
            { "name": "Authorization", "value": "Bearer <YOUR_CAL_API_KEY>" },
            { "name": "cal-api-version", "value": "2026-02-25" }
          ]
        }
      }
    ]
  }'

# pip install requests
import os
import requests

resp = requests.put(
    "https://agents.assemblyai.com/v1/agents/7ad24396-b822-4dca-871a-be9cc4781cf9",
    headers={"Authorization": os.environ["ASSEMBLYAI_API_KEY"]},
    json={
        "system_prompt": "You are a friendly scheduling assistant. When the caller wants to meet, collect their name, email, time zone, and a start time, then call create_booking with eventTypeId 123 and confirm the booking out loud.",
        "tools": [
            {
                "name": "create_booking",
                "description": "Book a meeting on Cal.com. Call this once you have the caller name, email, time zone, and a start time.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "start": {"type": "string", "format": "date-time", "description": "Meeting start, ISO 8601 in UTC.", "examples": ["2026-06-15T15:00:00Z"]},
                        "eventTypeId": {"type": "integer", "description": "The Cal.com event type to book. Always use 123.", "examples": [123]},
                        "attendee": {
                            "type": "object",
                            "description": "The caller's details.",
                            "properties": {
                                "name": {"type": "string", "description": "The caller full name."},
                                "email": {"type": "string", "format": "email", "description": "The caller email address."},
                                "timeZone": {"type": "string", "description": "IANA time zone, e.g. America/New_York."},
                            },
                            "required": ["name", "email", "timeZone"],
                        },
                    },
                    "required": ["start", "eventTypeId", "attendee"],
                },
                "execution_mode": "interactive",
                "timeout_seconds": 30,
                "http": {
                    "url": "https://api.cal.com/v2/bookings",
                    "http_method": "POST",
                    "headers": [
                        {"name": "Authorization", "value": "Bearer <YOUR_CAL_API_KEY>"},
                        {"name": "cal-api-version", "value": "2026-02-25"},
                    ],
                },
            }
        ],
    },
)
resp.raise_for_status()
print(resp.json())

// Node 18+ has fetch built in. Save as an .mjs file so top-level await works.
const res = await fetch(
  "https://agents.assemblyai.com/v1/agents/7ad24396-b822-4dca-871a-be9cc4781cf9",
  {
    method: "PUT",
    headers: {
      Authorization: process.env.ASSEMBLYAI_API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      system_prompt: "You are a friendly scheduling assistant. When the caller wants to meet, collect their name, email, time zone, and a start time, then call create_booking with eventTypeId 123 and confirm the booking out loud.",
      tools: [
        {
          name: "create_booking",
          description: "Book a meeting on Cal.com. Call this once you have the caller name, email, time zone, and a start time.",
          parameters: {
            type: "object",
            properties: {
              start: { type: "string", format: "date-time", description: "Meeting start, ISO 8601 in UTC.", examples: ["2026-06-15T15:00:00Z"] },
              eventTypeId: { type: "integer", description: "The Cal.com event type to book. Always use 123.", examples: [123] },
              attendee: {
                type: "object",
                description: "The caller's details.",
                properties: {
                  name: { type: "string", description: "The caller full name." },
                  email: { type: "string", format: "email", description: "The caller email address." },
                  timeZone: { type: "string", description: "IANA time zone, e.g. America/New_York." },
                },
                required: ["name", "email", "timeZone"],
              },
            },
            required: ["start", "eventTypeId", "attendee"],
          },
          execution_mode: "interactive",
          timeout_seconds: 30,
          http: {
            url: "https://api.cal.com/v2/bookings",
            http_method: "POST",
            headers: [
              { name: "Authorization", value: "Bearer <YOUR_CAL_API_KEY>" },
              { name: "cal-api-version", value: "2026-02-25" },
            ],
          },
        },
      ],
    }),
  },
);
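// Fail loudly on a non-2xx response, like raise_for_status() in the Python example.
if (!res.ok) throw new Error(`Update agent failed: ${res.status}`);
const data = await res.json();
console.log(data);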

Reload the page, connect again, and ask to book a meeting. The agent collects the details and POSTs them to Cal.com; your page never handles the request. The model’s arguments become the JSON body, and the header values are encrypted at rest (reads return only the header name, never the value). See Add tools for the full tool model.
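To make that mapping concrete, the sketch below shows how a tool definition and the model's arguments could combine into an HTTP request. It illustrates the behavior described above and is not AssemblyAI's implementation: buildToolRequest, the required-field check, and the added Content-Type header are all assumptions made for the example.

```javascript
// Hypothetical illustration of "the model's arguments become the JSON body".
function buildToolRequest(tool, args) {
  // Reject calls that omit a top-level field the schema marks as required.
  const required = tool.parameters.required || [];
  const missing = required.filter((key) => !(key in args));
  if (missing.length > 0) {
    throw new Error("Missing required arguments: " + missing.join(", "));
  }
  // Copy the configured headers from the tool's name/value list.
  const headers = { "Content-Type": "application/json" };
  for (const h of tool.http.headers || []) headers[h.name] = h.value;
  return {
    url: tool.http.url,
    method: tool.http.http_method,
    headers,
    body: JSON.stringify(args),
  };
}

const bookingTool = {
  name: "create_booking",
  parameters: { type: "object", required: ["start", "eventTypeId", "attendee"] },
  http: {
    url: "https://api.cal.com/v2/bookings",
    http_method: "POST",
    headers: [{ name: "cal-api-version", value: "2026-02-25" }],
  },
};

const bookingRequest = buildToolRequest(bookingTool, {
  start: "2026-06-15T15:00:00Z",
  eventTypeId: 123,
  attendee: { name: "Ada", email: "ada@example.com", timeZone: "America/New_York" },
});
console.log(bookingRequest.method, bookingRequest.url); // POST https://api.cal.com/v2/bookings
```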

Step 5: End the session cleanly

The Stop button in step 2 calls ws.close() directly. That leaves the session in a 30-second session.resume grace window on the server, and that window is billable. Send session.end first, wait for session.ended, then close. Update stop() in voice-agent.html:

function stop() {
  if (ws && ws.readyState === 1) {
    ws.send(JSON.stringify({ type: "session.end" }));
    // Server emits session.ended and closes the socket. Handle it in ws.onmessage.
  } else {
    cleanup();
  }
}

function cleanup() {
  ws && ws.close();
  stream && stream.getTracks().forEach((t) => t.stop());
  ctx && ctx.close();
  $("btn").textContent = "Connect"; $("btn").classList.remove("live"); setStatus("disconnected");
}

Then handle session.ended inside your existing ws.onmessage:

ws.onmessage = ({ data }) => {
  const m = JSON.parse(data);
  // ...existing handlers...
  if (m.type === "session.ended") cleanup();
};

Also cover tab close and navigation. pagehide fires on both and is more reliable than beforeunload on mobile Safari:

window.addEventListener("pagehide", () => {
  if (ws && ws.readyState === 1) {
    ws.send(JSON.stringify({ type: "session.end" }));
  }
});

Send session.end synchronously inside the pagehide handler. Anything async (like await fetch) will not finish before the socket is torn down.

Server and native clients follow the same pattern: send session.end on your shutdown signal (SIGINT, SIGTERM, or your own hangup handler), wait briefly for session.ended, then close the socket. There’s a complete example in the Deploy your agent quickstart. See Unexpected billing after the call ended if your sessions look about 30 seconds longer than the call itself.
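The shutdown pattern described above can be sketched as one reusable function. This is a minimal sketch under stated assumptions: endSession is a hypothetical helper, the socket is assumed to expose send, close, addEventListener, and removeEventListener (as the browser WebSocket does), and the 2-second default timeout is an arbitrary stand-in for "wait briefly".

```javascript
// Hypothetical helper: send session.end, wait briefly for session.ended, then close.
function endSession(ws, onDone, timeoutMs = 2000) {
  let finished = false;

  const finish = (reason) => {
    if (finished) return; // run the teardown only once
    finished = true;
    clearTimeout(timer);
    ws.removeEventListener("message", onMessage);
    ws.close();
    onDone(reason); // "ended" or "timeout"
  };

  const onMessage = (event) => {
    let message;
    try {
      message = JSON.parse(event.data);
    } catch (e) {
      return; // ignore frames that are not JSON
    }
    if (message.type === "session.ended") finish("ended");
  };

  // Fallback so a missing session.ended cannot hang the shutdown.
  const timer = setTimeout(() => finish("timeout"), timeoutMs);
  ws.addEventListener("message", onMessage);
  ws.send(JSON.stringify({ type: "session.end" }));
}
```

On a server you might call endSession(ws, () => process.exit(0)) from a SIGINT or SIGTERM handler. The timeout path still closes the socket, so in that case the grace window described above may apply.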

Next steps

You just did all three phases at once. Here’s where to go deeper on each:

1. Create

Create a reusable agent with one REST call, then update, list, and delete it.

2. Configure

Shape how it sounds and behaves: prompt, voice, greeting, audio, turn detection, keyterms, tools.

3. Deploy

Connect by agent_id over the API, from a browser, or to a phone number with Twilio.

Or jump straight to a topic:

Add tools: server-side HTTP tools and client-side function tools
Connect your own LLM: point the agent at an OpenAI-compatible model
Manage agents (REST): every endpoint, field, and validation rule
Events reference: every WebSocket event with full payloads
Troubleshooting: symptom-to-fix table and support logging
