Voice Agent API

Browser integration

Connect browser-based apps to the Voice Agent API using a temporary token.

Connect a browser to the Voice Agent API in two steps:

  1. Your server calls GET /v1/token with your API key to mint a short-lived temporary token.
  2. Your browser opens the WebSocket with ?token=<token> — no API key exposed.

Your API key never leaves your server. Each token is single-use — it starts exactly one session, and all usage is attributed to the key that generated it.

1. Generate a token on your server

Call GET /v1/token with your API key in the Authorization header. Set expires_in_seconds short enough to limit replay risk (60–300 seconds is a good default), and optionally set max_session_duration_seconds to cap the session length.

    curl -G https://agents.assemblyai.com/v1/token \
      -H "Authorization: <apiKey>" \
      -d expires_in_seconds=300

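    // server/routes/voice-token.js
    // Mints a temporary token so the browser never sees the API key.
    import express from "express";

    const router = express.Router();

    // Mount this router under /api so the browser can request /api/voice-token.
    router.get("/voice-token", async (_req, res) => {
      const url = new URL("https://agents.assemblyai.com/v1/token");
      url.searchParams.set("expires_in_seconds", "300");
      url.searchParams.set("max_session_duration_seconds", "8640");

      try {
        const response = await fetch(url, {
          headers: { Authorization: `Bearer ${process.env.ASSEMBLYAI_API_KEY}` },
        });

        // Pass upstream errors through with their original status code.
        if (!response.ok) {
          return res.status(response.status).send(await response.text());
        }

        // Return only the token to the browser, never the API key.
        const { token } = await response.json();
        res.json({ token });
      } catch (err) {
        // The token endpoint could not be reached (network failure, DNS, etc.).
        res.status(502).json({ error: "Failed to fetch voice token" });
      }
    });

    export default router;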

expires_in_seconds must be between 1 and 600. max_session_duration_seconds must be between 60 and 10800 (defaults to 10800, the 3-hour maximum session duration).
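
The limits above can be checked on your server before the request is sent, so an out-of-range value fails fast instead of producing an upstream error. The following is a minimal sketch, not part of the API: `buildTokenQuery` and its option names are hypothetical, and it assumes both values are whole seconds and that `expires_in_seconds` is always supplied.

```javascript
// Ranges taken from the limits stated above.
const LIMITS = {
  expires_in_seconds: { min: 1, max: 600 },
  max_session_duration_seconds: { min: 60, max: 10800 },
};

// Hypothetical helper: validates the values and returns the query parameters
// for GET /v1/token. Throws a RangeError when a value is out of range.
function buildTokenQuery({ expiresInSeconds, maxSessionDurationSeconds } = {}) {
  const params = new URLSearchParams();

  const check = (name, value) => {
    const { min, max } = LIMITS[name];
    // Assumption: values are integers (whole seconds).
    if (!Number.isInteger(value) || value < min || value > max) {
      throw new RangeError(`${name} must be an integer between ${min} and ${max}`);
    }
    params.set(name, String(value));
  };

  // Assumption: this sketch always sends expires_in_seconds.
  check("expires_in_seconds", expiresInSeconds);

  // Optional: when omitted, the documented default (10800) applies.
  if (maxSessionDurationSeconds !== undefined) {
    check("max_session_duration_seconds", maxSessionDurationSeconds);
  }

  return params;
}
```

In a route handler, the result could be assigned with `url.search = buildTokenQuery({ expiresInSeconds: 300 }).toString()` before calling `fetch`.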

2. Connect from the browser with the token

Fetch the token from your server, then open the WebSocket with ?token=<token>. No Authorization header is needed.

    // browser/voice-agent.js
    const { token } = await fetch("/api/voice-token").then((r) => r.json());

    const wsUrl = new URL("wss://agents.assemblyai.com/v1/voice");
    wsUrl.searchParams.set("token", token);
    const ws = new WebSocket(wsUrl);

    ws.addEventListener("open", () => {
      ws.send(
        JSON.stringify({
          type: "session.update",
          session: {
            system_prompt: "You are a helpful voice assistant.",
            greeting: "Hi there! How can I help you today?",
            output: { voice: "claire" },
          },
        }),
      );
    });

    ws.addEventListener("message", (event) => {
      const message = JSON.parse(event.data);
      // Handle session.ready, reply.audio, transcript.*, tool.call, etc.
      console.log(message);
    });
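
The message handler above logs every event. One way to route events to separate handlers is a small dispatcher keyed on the message type. This is only a sketch: `dispatch` is a hypothetical helper, it assumes each incoming message carries a string `type` field like the outgoing `session.update` message does, and the payload shapes are not shown here.

```javascript
// Hypothetical dispatcher: routes a parsed message to a handler by its `type`.
// A handler key ending in ".*" matches any type with that prefix,
// for example "transcript.*". Returns true when a handler ran.
function dispatch(message, handlers) {
  const type = message && message.type;
  if (typeof type !== "string") return false;

  // An exact match takes priority over a wildcard.
  if (typeof handlers[type] === "function") {
    handlers[type](message);
    return true;
  }

  // Otherwise use the longest matching "prefix.*" key, if any.
  const wildcard = Object.keys(handlers)
    .filter((key) => key.endsWith(".*") && type.startsWith(key.slice(0, -1)))
    .sort((a, b) => b.length - a.length)[0];

  if (wildcard) {
    handlers[wildcard](message);
    return true;
  }
  return false;
}
```

It could be wired in with `ws.addEventListener("message", (event) => dispatch(JSON.parse(event.data), handlers))`, where `handlers` is an object such as `{ "session.ready": onReady, "transcript.*": onTranscript }` and the callback names are your own.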

See the Overview quickstart for the full event loop, and Audio format for capturing mic input and playing the agent’s response in a browser.

Fetch a fresh token for every new WebSocket connection. Tokens are single-use — a dropped connection needs a new token to reconnect (including when using session.resume).
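
The reconnect rule above can be sketched as a helper that requests a new token before every connection attempt. The names `openVoiceSocket`, `connectWithReconnect`, `fetchToken`, `createSocket`, and `maxReconnects` are all hypothetical, and the policy shown (reconnect immediately, only after an unclean close, up to a fixed limit) is an assumption rather than a recommendation from the API.

```javascript
const VOICE_URL = "wss://agents.assemblyai.com/v1/voice";

// Opens one connection. A token is fetched on every call and never reused.
// `fetchToken` and `createSocket` are injected so the logic can be exercised
// without a network.
async function openVoiceSocket({ fetchToken, createSocket }) {
  const token = await fetchToken();
  const url = new URL(VOICE_URL);
  url.searchParams.set("token", token);
  return createSocket(url.toString());
}

// Reconnects with a fresh token after a dropped connection.
// Assumption: a close event with wasClean === false means the connection dropped.
function connectWithReconnect({ fetchToken, createSocket, onOpen, maxReconnects = 3 }) {
  let reconnects = 0;

  const connect = async () => {
    const ws = await openVoiceSocket({ fetchToken, createSocket });
    ws.addEventListener("open", () => onOpen(ws));
    ws.addEventListener("close", (event) => {
      if (event.wasClean || reconnects >= maxReconnects) return;
      reconnects += 1;
      // A production app would likely add a delay or backoff here.
      connect().catch((err) => console.error("Reconnect failed:", err));
    });
    return ws;
  };

  return connect();
}
```

In a browser this might be called with `fetchToken: () => fetch("/api/voice-token").then((r) => r.json()).then((d) => d.token)` and `createSocket: (url) => new WebSocket(url)`, sending the `session.update` message from `onOpen`.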