Quickstart

Overview

By the end of this guide, you’ll have a working script that transcribes a live internet radio stream, printing each turn as it plays. No microphone required, so it runs anywhere, including a coding agent’s sandbox. Build it with an AI coding agent, or write it yourself. Both are below. Prefer to try it first? Transcribe audio without writing any code in the AssemblyAI Playground.

Streaming is billed per session. Streaming Speech-to-Text is billed on the total duration that your WebSocket connection stays open, not on the amount of audio you send. Always send a termination message when you’re done with a stream: sessions that aren’t closed auto-close after 3 hours and are billed for the full duration. See Billing and pricing for details.
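
To make the billing rule concrete, here is a rough sketch of the arithmetic. The rate below is a made-up placeholder, not AssemblyAI's price, and the function names are illustrative only. The one thing taken from the note above is that billed time follows how long the connection stays open, up to the 3-hour auto-close.

```python
AUTO_CLOSE_SECONDS = 3 * 60 * 60  # unclosed sessions auto-close after 3 hours

# Placeholder value for illustration only; see Billing and pricing for real rates.
HYPOTHETICAL_RATE_PER_HOUR = 1.00


def billable_seconds(connection_open_seconds: float) -> float:
    """Billed time tracks connection-open time, capped at the auto-close limit."""
    return min(connection_open_seconds, AUTO_CLOSE_SECONDS)


def estimated_cost(connection_open_seconds: float, rate_per_hour: float) -> float:
    """Convert billed seconds to a cost at the given (assumed) hourly rate."""
    return billable_seconds(connection_open_seconds) / 3600 * rate_per_hour


# Same 25 seconds of audio, two outcomes:
terminated = estimated_cost(30, HYPOTHETICAL_RATE_PER_HOUR)  # closed promptly
forgotten = estimated_cost(AUTO_CLOSE_SECONDS, HYPOTHETICAL_RATE_PER_HOUR)  # left open

print(f"terminated after 30s: {terminated:.4f}")
print(f"left open until auto-close: {forgotten:.4f}")
```

Whatever the actual rate, the ratio between the two cases is what matters: a session left open until auto-close is billed for 10,800 seconds regardless of how little audio it carried.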

Before you begin

You’ll need:

An API key — grab one from your dashboard. Every example below reads it from an environment variable, so set it once:
export ASSEMBLYAI_API_KEY=<your-key>
Python 3.8+ or Node.js 18+, depending on which SDK you use.
Nothing else. The examples pull a public AAC radio stream over HTTP, so no microphone or audio hardware is needed.
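
If the environment variable is missing, the examples below fail with a bare KeyError (Python) or an authentication error (JavaScript). A small guard like the following sketch gives a clearer message. The helper name load_api_key is hypothetical and not part of the SDK.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from ASSEMBLYAI_API_KEY, or exit with a hint."""
    key = env.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise SystemExit(
            "ASSEMBLYAI_API_KEY is not set. Run: export ASSEMBLYAI_API_KEY=<your-key>"
        )
    return key


# Usage: pass the result wherever the examples read os.environ directly, e.g.
#   api_key = load_api_key()
```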

Building with an AI coding agent? Wire it up to AssemblyAI’s live docs (MCP server) and the AssemblyAI skill so it writes correct, up-to-date code instead of relying on stale training data:

claude mcp add --transport http --scope user assemblyai-docs https://assemblyai.com/docs/mcp
npx skills add AssemblyAI/assemblyai-skill --global

Then describe what you want to build. To get the same result as the steps below, paste:

Use the AssemblyAI Python SDK to transcribe a live AAC radio stream in real time and print each turn.

Transcribe streaming audio

Prefer to write it yourself? Follow these steps to stream a live radio URL. The AssemblyAI SDK manages the WebSocket connection and session termination for you.

Step 1: Install the SDK

Python SDK:

pip install assemblyai requests

JavaScript SDK:

npm install assemblyai

Step 2: Stream your first session

Save this as transcribe.py (Python) or transcribe.js (JavaScript). It streams about 25 seconds of a live radio URL, prints each turn, then closes the session:

Python SDK:

import os
import time

import requests
from assemblyai.streaming.v3 import (
    BeginEvent,
    Encoding,
    StreamingClient,
    StreamingClientOptions,
    StreamingError,
    StreamingEvents,
    StreamingParameters,
    TerminationEvent,
    TurnEvent,
)

# A live AAC (ADTS) internet radio stream, so no microphone is needed.
STREAM_URL = "https://14123.live.streamtheworld.com/WBBRAMAAC.aac"
RUN_SECONDS = 25


def on_begin(client: StreamingClient, event: BeginEvent):
    print(f"Session started: {event.id}")
    print(f"Connected. Streaming live radio for ~{RUN_SECONDS} seconds.")


def on_turn(client: StreamingClient, event: TurnEvent):
    if event.transcript:
        print(event.transcript)


def on_terminated(client: StreamingClient, event: TerminationEvent):
    print(f"Session terminated: {event.audio_duration_seconds}s of audio processed")


def on_error(client: StreamingClient, error: StreamingError):
    print(f"Error: {error}")


def main():
    client = StreamingClient(
        StreamingClientOptions(
            api_key=os.environ["ASSEMBLYAI_API_KEY"],
            terminate_timeout=30.0,
        )
    )

    client.on(StreamingEvents.Begin, on_begin)
    client.on(StreamingEvents.Turn, on_turn)
    client.on(StreamingEvents.Termination, on_terminated)
    client.on(StreamingEvents.Error, on_error)

    # AAC is self-describing (ADTS headers carry the sample rate)
    client.connect(
        StreamingParameters(speech_model="universal-3-5-pro", encoding=Encoding.aac)
    )

    # Pull the live radio stream and forward each chunk to the transcriber.
    # The request happens inside try so a failed download still terminates
    # the session instead of leaving it open (and billed).
    deadline = time.time() + RUN_SECONDS
    response = None
    try:
        response = requests.get(STREAM_URL, stream=True, timeout=10)
        response.raise_for_status()
        for chunk in response.iter_content(chunk_size=4096):
            client.stream(chunk)
            if time.time() > deadline:
                break
    finally:
        if response is not None:
            response.close()
        # Terminate finalizes the open turn.
        # Keep the connection open long enough to receive the last final.
        client.disconnect(terminate=True)


if __name__ == "__main__":
    main()

JavaScript SDK:

import { AssemblyAI } from "assemblyai";

// A live AAC (ADTS) internet radio stream, so no microphone is needed.
const STREAM_URL = "https://14123.live.streamtheworld.com/WBBRAMAAC.aac";
const RUN_MS = 25_000;

const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });

const transcriber = client.streaming.transcriber({
  speechModel: "universal-3-5-pro",
  encoding: "aac",
});

transcriber.on("open", ({ id }) => console.log(`Session opened with ID: ${id}`));
transcriber.on("error", (error) => console.error("Error:", error));
transcriber.on("close", (code, reason) => console.log("Session closed:", code, reason));
transcriber.on("turn", (turn) => {
  if (turn.transcript) {
    console.log("Turn:", turn.transcript);
  }
});

const run = async () => {
  await transcriber.connect();
  console.log("Connected. Streaming live radio for ~25 seconds.");

  let reader;
  try {
    // Pull the live radio stream and forward each chunk to the transcriber.
    const response = await fetch(STREAM_URL);
    if (!response.ok) throw new Error(`Stream request failed: ${response.status}`);
    reader = response.body.getReader();
    const deadline = Date.now() + RUN_MS;
    while (Date.now() < deadline) {
      const { value, done } = await reader.read();
      if (done) break;
      transcriber.sendAudio(Buffer.from(value));
    }
  } finally {
    if (reader) await reader.cancel().catch(() => {});
    // close() sends terminate, which finalizes the open turn.
    // Keep the connection open long enough to receive the last final.
    // Running it in finally closes the session even if the download fails.
    await transcriber.close();
  }
};

run().catch((error) => {
  console.error("Error:", error);
  process.exitCode = 1;
});

Run it with python transcribe.py or node transcribe.js. Each turn prints as the audio plays. Terminating finalizes the open turn, so its final transcript prints just before the session ends, about 25 seconds in. Internet radio delivers audio in bursts, faster than real time, so the server may still be catching up when you stop sending. As a result, the last final can arrive a few seconds later than it would from a real-time source such as a microphone. Sample output from the Python script:

Session started: 7f3a9c2e-...
Connected. Streaming live radio for ~25 seconds.
markets are higher across the board this morning as investors weigh the latest earnings.
Session terminated: 25.0s of audio processed

That’s a full real-time transcriber. Prefer raw WebSockets? See Using the WebSocket API directly below.

What you get back

The transcriber emits JSON messages. The SDKs surface them as events: open / turn / close in JavaScript, and Begin / Turn / Termination in Python. The one you handle most is Turn, sent repeatedly as someone speaks. end_of_turn: true marks a finalized turn, and transcript is the text so far:

{
  "type": "Turn",
  "turn_order": 0,
  "end_of_turn": true,
  "turn_is_formatted": true,
  "end_of_turn_confidence": 1.0,
  "transcript": "Smoke from hundreds of wildfires in Canada is triggering air quality alerts...",
  "words": [
    { "text": "Smoke", "start": 0, "end": 399, "confidence": 0.99, "word_is_final": true }
  ]
}

Live radio is continuous speech, so finalized turns arrive at natural pauses. Expect several partial updates before each end_of_turn: true. You also receive a Begin message when the session opens ({ "type": "Begin", "id": "...", "expires_at": ... }) and a Termination message when it closes ({ "type": "Termination", "audio_duration_seconds": 10, "session_duration_seconds": 12 }). Word timings are in milliseconds. See the message sequence breakdown for the full event flow.
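
The message shapes above can be handled with a small dispatcher. The following is a minimal sketch using only the fields shown on this page. The function names handle_message and word_spans_seconds are illustrative, not SDK APIs, and real messages may carry fields not handled here.

```python
import json
from typing import List, Optional, Tuple


def handle_message(raw: str, finals: List[str]) -> Optional[str]:
    """Return a printable line for one server message, or None to ignore it."""
    message = json.loads(raw)
    kind = message.get("type")
    if kind == "Begin":
        return f"session {message['id']} opened"
    if kind == "Turn":
        text = message.get("transcript", "")
        if message.get("end_of_turn"):
            finals.append(text)  # keep only finalized turns
            return f"final: {text}"
        return f"partial: {text}" if text else None
    if kind == "Termination":
        return (
            f"closed after {message['audio_duration_seconds']}s of audio "
            f"({message['session_duration_seconds']}s session)"
        )
    return None  # unknown message types are skipped


def word_spans_seconds(turn: dict) -> List[Tuple[str, float, float]]:
    """Convert word timings from milliseconds to seconds."""
    return [(w["text"], w["start"] / 1000, w["end"] / 1000) for w in turn.get("words", [])]
```

Collecting finals separately from partials is a design choice for this sketch: partial updates overwrite each other as the turn grows, so only the end_of_turn: true text is worth storing.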

Using the WebSocket API directly

Not using an SDK? Connect to the streaming WebSocket at wss://streaming.assemblyai.com/v3/ws directly. Authenticate with your key in the Authorization header (no Bearer prefix), and manage the connection, the audio source, the Begin / Turn / Termination messages, and session termination yourself. The SDK above does all of this for you. See the message sequence breakdown for the event flow and endpoints and data zones for regional endpoints. Both examples read your key from the same ASSEMBLYAI_API_KEY environment variable you set in Before you begin.

Streaming from a browser? Don’t ship your API key to client-side code. Authenticate from the browser with a short-lived temporary token instead.

Python:

pip install requests websocket-client

import json
import os
import threading
import time
from urllib.parse import urlencode

import requests
import websocket

API_KEY = os.environ["ASSEMBLYAI_API_KEY"]
# A live AAC (ADTS) internet radio stream, so no microphone is needed.
STREAM_URL = "https://14123.live.streamtheworld.com/WBBRAMAAC.aac"
RUN_SECONDS = 25
# AAC is self-describing (ADTS headers carry the sample rate)
CONNECTION_PARAMS = {"speech_model": "universal-3-5-pro", "encoding": "aac"}
API_ENDPOINT = f"wss://streaming.assemblyai.com/v3/ws?{urlencode(CONNECTION_PARAMS)}"

stop = threading.Event()


def on_open(ws):
    print("Connected. Streaming live radio for ~25 seconds.")

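    def stream_audio():
        # Pull the live radio stream and forward each chunk as a binary frame.
        # The request happens inside try so a failed download still sends
        # Terminate instead of leaving the session open (and billed).
        deadline = time.time() + RUN_SECONDS
        response = None
        try:
            response = requests.get(STREAM_URL, stream=True, timeout=10)
            response.raise_for_status()
            for chunk in response.iter_content(chunk_size=4096):
                if stop.is_set() or time.time() > deadline:
                    break
                ws.send(chunk, websocket.ABNF.OPCODE_BINARY)
        except requests.RequestException as exc:
            print(f"\nStream error: {exc}")
        finally:
            if response is not None:
                response.close()
            if ws.sock and ws.sock.connected:
                # Terminate finalizes the open turn.
                # Keep the connection open long enough to receive the last final.
                ws.send(json.dumps({"type": "Terminate"}))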

    threading.Thread(target=stream_audio, daemon=True).start()


def on_message(ws, message):
    data = json.loads(message)
    if data.get("type") == "Turn":
        print(data.get("transcript", ""), end="\n" if data.get("end_of_turn") else "\r")


def on_error(ws, error):
    # On a normal shutdown, websocket-client hands the server's close frame to
    # on_error; ignore it and let on_close report the disconnect. Real failures
    # arrive as exceptions, not close frames.
    if isinstance(error, websocket.ABNF) and error.opcode == websocket.ABNF.OPCODE_CLOSE:
        return
    print(f"\nError: {error}")
    stop.set()


def on_close(ws, status, msg):
    stop.set()
    print("\nDisconnected.")


def main():
    ws = websocket.WebSocketApp(
        API_ENDPOINT,
        header={"Authorization": API_KEY},
        on_open=on_open,
        on_message=on_message,
        on_error=on_error,
        on_close=on_close,
    )

    ws_thread = threading.Thread(target=ws.run_forever, daemon=True)
    ws_thread.start()

    try:
        while ws_thread.is_alive():
            ws_thread.join(0.1)
    except KeyboardInterrupt:
        stop.set()
        if ws.sock and ws.sock.connected:
            ws.send(json.dumps({"type": "Terminate"}))  # close the session
        ws.close()


if __name__ == "__main__":
    main()

JavaScript:

npm install ws

const WebSocket = require("ws");
const querystring = require("querystring");

const API_KEY = process.env.ASSEMBLYAI_API_KEY;
// A live AAC (ADTS) internet radio stream, so no microphone is needed.
const STREAM_URL = "https://14123.live.streamtheworld.com/WBBRAMAAC.aac";
const RUN_MS = 25_000;
// AAC is self-describing (ADTS headers carry the sample rate)
const params = { speech_model: "universal-3-5-pro", encoding: "aac" };
const endpoint = `wss://streaming.assemblyai.com/v3/ws?${querystring.stringify(params)}`;

const ws = new WebSocket(endpoint, { headers: { Authorization: API_KEY } });

ws.on("open", async () => {
  console.log("Connected. Streaming live radio for ~25 seconds.");

  let reader;
  try {
    // Pull the live radio stream and forward each chunk as a binary frame.
    const response = await fetch(STREAM_URL);
    if (!response.ok) throw new Error(`Stream request failed: ${response.status}`);
    reader = response.body.getReader();
    const deadline = Date.now() + RUN_MS;
    while (Date.now() < deadline && ws.readyState === WebSocket.OPEN) {
      const { value, done } = await reader.read();
      if (done) break;
      ws.send(value);
    }
  } catch (error) {
    console.error("\nStream error:", error);
  } finally {
    if (reader) await reader.cancel().catch(() => {});
    if (ws.readyState === WebSocket.OPEN) {
      // Terminate finalizes the open turn.
      // Keep the connection open long enough to receive the last final.
      // Sending it in finally closes the session even if the download fails.
      ws.send(JSON.stringify({ type: "Terminate" }));
    }
  }
});

ws.on("message", (message) => {
  const data = JSON.parse(message);
  if (data.type === "Turn") {
    process.stdout.write(data.end_of_turn ? `${data.transcript}\n` : `\r${data.transcript}`);
  }
});

ws.on("error", (error) => console.error("\nError:", error));
ws.on("close", () => {
  console.log("\nDisconnected.");
  process.exit();
});

Limits

Session length: a streaming session auto-closes after 3 hours.
Audio: mono 16-bit PCM by default; set sample_rate to match your source. These examples use AAC (encoding=aac, ADTS framing). Opus is also accepted (encoding=ogg_opus for Ogg streams, encoding=opus for raw Opus packets). See Sending audio.
Rate limit: new-session rate limits scale automatically with usage (default 5 for free accounts). Check yours on the rate limits page.
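
As a rough illustration of the audio options above, this sketch builds the WebSocket URL for each source type. It relies only on the parameter names and values mentioned on this page. The helper build_endpoint is hypothetical, and treating sample_rate as required for PCM is an assumption of the sketch, so check Sending audio for the authoritative rules.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "wss://streaming.assemblyai.com/v3/ws"
# Compressed encodings named on this page; PCM is the default when none is set.
COMPRESSED_ENCODINGS = {"aac", "ogg_opus", "opus"}


def build_endpoint(
    speech_model: str,
    encoding: Optional[str] = None,
    sample_rate: Optional[int] = None,
) -> str:
    """Build the streaming URL for default PCM or one of the compressed encodings."""
    params = {"speech_model": speech_model}
    if encoding is None:
        # Default mono 16-bit PCM: the sketch insists on an explicit sample rate.
        if sample_rate is None:
            raise ValueError("PCM audio needs sample_rate to match the source")
    elif encoding in COMPRESSED_ENCODINGS:
        params["encoding"] = encoding
    else:
        raise ValueError(f"unsupported encoding in this sketch: {encoding}")
    if sample_rate is not None:
        params["sample_rate"] = sample_rate
    return f"{BASE_URL}?{urlencode(params)}"


# The quickstart's radio stream (AAC) and a 16 kHz PCM source:
print(build_endpoint("universal-3-5-pro", encoding="aac"))
print(build_endpoint("universal-3-5-pro", sample_rate=16000))
```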

Next steps

To learn more about Streaming Speech-to-Text, see the following resources:

Streaming Speech-to-Text overview
Message sequence breakdown — understand the Begin, Turn, and Termination events
WebSocket API reference

Need some help?

If you get stuck, or have any other questions, we’d love to help you out. Contact our support team at support@assemblyai.com or create a support ticket.
