Quickstart - AssemblyAI

Overview

By the end of this guide, you’ll have a working script that transcribes an audio file in a single SDK call. Build it with an AI coding agent, or write it yourself — both are below. Prefer to try it first? Transcribe audio without writing any code in the AssemblyAI Playground.

Before you begin

You’ll need:

An API key — grab one from your dashboard. Every example below reads it from an environment variable, so set it once:
export ASSEMBLYAI_API_KEY=<your-key>
Python 3.8+ or Node.js 18+, depending on which SDK you use.
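
If the variable is not set, the Python examples below stop with a bare KeyError. As a minimal sketch, a script could check for the key up front and fail with a readable message instead. The helper name load_api_key is hypothetical and not part of the SDK:

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise a readable error.

    load_api_key is a hypothetical helper for illustration, not an SDK function.
    """
    env = os.environ if env is None else env
    key = env.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ASSEMBLYAI_API_KEY is not set. Run: export ASSEMBLYAI_API_KEY=<your-key>"
        )
    return key
```

The returned value can then be assigned to aai.settings.api_key in place of the direct os.environ lookup.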

Building with an AI coding agent? Wire it up to AssemblyAI’s live docs (MCP server) and the AssemblyAI skill so it writes correct, up-to-date code instead of relying on stale training data:

claude mcp add --transport http --scope user assemblyai-docs https://assemblyai.com/docs/mcp
npx skills add AssemblyAI/assemblyai-skill --global

Then describe what you want to build. To get the same result as the steps below, paste:

Use the AssemblyAI Python SDK to transcribe https://assembly.ai/wildfires.mp3 and print the transcript text.

Transcribe your first file

Prefer to write it yourself? Follow these steps to transcribe our hosted sample file. The SDK uploads, submits, and polls for you in a single call.

Step 1: Install the SDK

Python SDK:

pip install assemblyai

JavaScript SDK:

npm install assemblyai

Step 2: Run your first transcription

Save this as transcribe.py (Python) or transcribe.js (JavaScript):

Python SDK:

import os
import assemblyai as aai

aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]

transcript = aai.Transcriber().transcribe("https://assembly.ai/wildfires.mp3")
print(transcript.text)

JavaScript SDK:

import { AssemblyAI } from "assemblyai";

const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });

const transcript = await client.transcripts.transcribe({
  audio: "https://assembly.ai/wildfires.mp3",
});
console.log(transcript.text);

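Then run it with python transcribe.py or node transcribe.js. The JavaScript example uses import and top-level await, so Node.js must treat the file as an ES module: name it transcribe.mjs or set "type": "module" in package.json. You’ll see the transcript printed: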

Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US...

That’s the whole first call. From here you can add options — speaker labels, language detection, or a local file — see the complete example to combine them, or use the HTTP API directly if you’re not using an SDK.

Customize your request

The call above works with no extra configuration. Add capabilities by setting options on the same request — combine as many as you need (the complete example sets several at once).

Transcribe a local file

Pass a file path instead of a URL; the SDK uploads it for you.

Python SDK:

transcript = aai.Transcriber().transcribe("./example.mp3")

JavaScript SDK:

const transcript = await client.transcripts.transcribe({
  audio: "./example.mp3",
});

Identify speakers

Enable Speaker Diarization to split the transcript by speaker. Each labeled segment (an utterance) has a speaker ID and its text.

Python SDK:

config = aai.TranscriptionConfig(speaker_labels=True)
transcript = aai.Transcriber().transcribe("https://assembly.ai/wildfires.mp3", config=config)

for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")

JavaScript SDK:

const transcript = await client.transcripts.transcribe({
  audio: "https://assembly.ai/wildfires.mp3",
  speaker_labels: true,
});

for (const utterance of transcript.utterances) {
  console.log(`Speaker ${utterance.speaker}: ${utterance.text}`);
}

Detect the language automatically

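Use Automatic Language Detection to identify the dominant spoken language in the file. Set language_detection=True in the Python SDK’s TranscriptionConfig, or language_detection: true in the JavaScript request parameters. The complete example below sets it alongside speaker_labels.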
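
As a standalone illustration, here is a minimal Python sketch that turns the option on and reads back the detected language. It assumes the SDK exposes the raw API payload as transcript.json_response and that the payload carries a language_code field. The helper name detect_language is hypothetical.

```python
import os


def detect_language(audio: str) -> str:
    """Transcribe `audio` with language detection on and return the language code.

    Hypothetical helper. The SDK import sits inside the function so that
    defining it does not require the package or a network connection.
    """
    import assemblyai as aai  # pip install assemblyai

    aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]
    config = aai.TranscriptionConfig(language_detection=True)
    transcript = aai.Transcriber().transcribe(audio, config=config)

    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(f"Transcription failed: {transcript.error}")

    # Assumption: json_response holds the raw API payload, including language_code.
    return transcript.json_response["language_code"]
```

Calling detect_language("https://assembly.ai/wildfires.mp3") sends a real transcription request, so it needs a valid key and network access.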

Complete example

Here’s the complete, runnable script — the call above plus options and error handling:

The Python SDK script comes first, followed by the JavaScript SDK script.

import os
import assemblyai as aai

aai.settings.base_url = "https://api.assemblyai.com"
aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]

# Use a publicly-accessible URL
audio_file = "https://assembly.ai/wildfires.mp3"

# Or use a local file:
# audio_file = "./example.mp3"

config = aai.TranscriptionConfig(
    language_detection=True,
    speaker_labels=True,
)

transcript = aai.Transcriber().transcribe(audio_file, config=config)

if transcript.status == aai.TranscriptStatus.error:
    raise RuntimeError(f"Transcription failed: {transcript.error}")

# Log transcript.id for every request (not just errors), with a timestamp and API region.
# It's required to fetch results, retry, or delete the transcript later, and it's the first
# thing support@assemblyai.com asks for. Delete: /pre-recorded-audio/delete-transcripts
# Troubleshooting: /pre-recorded-audio/guides/common_errors_and_solutions

print(f"\nFull Transcript:\n\n{transcript.text}")

# Optionally print speaker diarization results
# for utterance in transcript.utterances:
#     print(f"Speaker {utterance.speaker}: {utterance.text}")

import { AssemblyAI } from "assemblyai";

const baseUrl = "https://api.assemblyai.com";

const client = new AssemblyAI({
  apiKey: process.env.ASSEMBLYAI_API_KEY,
  baseUrl: baseUrl,
});

// Use a publicly-accessible URL
const audioFile = "https://assembly.ai/wildfires.mp3";

// Or use a local file:
// const audioFile = "./example.mp3";

const params = {
  audio: audioFile,
  language_detection: true,
  speaker_labels: true,
};

const run = async () => {
  const transcript = await client.transcripts.transcribe(params);

  if (transcript.status === "error") {
    throw new Error(`Transcription failed: ${transcript.error}`);
  }

  // Log transcript.id for every request (not just errors), with a timestamp and API region.
  // It's required to fetch results, retry, or delete the transcript later, and it's the first
  // thing support@assemblyai.com asks for. Delete: /pre-recorded-audio/delete-transcripts
  // Troubleshooting: /pre-recorded-audio/guides/common_errors_and_solutions

  console.log(`\nFull Transcript:\n\n${transcript.text}`);

  // Optionally print speaker diarization results
  // for (const utterance of transcript.utterances) {
  //   console.log(`Speaker ${utterance.speaker}: ${utterance.text}`);
  // }
};

run();
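
The comments in both scripts recommend logging transcript.id for every request, together with a timestamp and the API region. One way to do that is a single JSON line per request. This is a minimal sketch: the field names are illustrative rather than an AssemblyAI format, the base URL is assumed to be an adequate stand-in for the region, and transcript_log_record is a hypothetical helper.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def transcript_log_record(
    transcript_id: str,
    base_url: str,
    status: str,
    now: Optional[datetime] = None,
) -> str:
    """Build one JSON log line for a transcription request.

    Hypothetical helper: the field names are illustrative only.
    """
    now = now or datetime.now(timezone.utc)
    return json.dumps(
        {
            "timestamp": now.isoformat(),
            "transcript_id": transcript_id,
            "base_url": base_url,  # assumed to identify the API region used
            "status": status,
        }
    )


# Placeholder values; in the script above these would come from the transcript object
print(transcript_log_record("example-transcript-id", "https://api.assemblyai.com", "completed"))
```

Writing the record for successful requests as well as failed ones keeps the ID available if you later need to fetch, retry, or delete the transcript.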

What you get back

A completed transcript includes the full text plus metadata, and per-speaker utterances when you enable speaker_labels. The SDK exposes these as attributes (transcript.text, transcript.utterances[0].speaker); the raw API returns the same fields as JSON:

{
  "id": "106993b6-ac12-45d0-b74a-1bbd923e755d",
  "status": "completed",
  "text": "Smoke from hundreds of wildfires in Canada is triggering air quality alerts...",
  "language_code": "en",
  "audio_duration": 282,
  "confidence": 0.95,
  "utterances": [
    {
      "speaker": "A",
      "text": "Smoke from hundreds of wildfires in Canada is triggering air quality alerts...",
      "confidence": 0.97,
      "start": 100,
      "end": 26560,
      "words": [
        { "text": "Smoke", "start": 100, "end": 640, "confidence": 0.9, "speaker": "A" }
      ]
    }
  ]
}

start and end are in milliseconds. Persist id to fetch, retry, or delete the transcript later. See the transcript API reference for the complete field list.
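
To make the millisecond offsets concrete, the sketch below parses a trimmed copy of the response above and prints one line per utterance with readable timestamps. The helper names format_timestamp and format_utterances are hypothetical, and the code assumes that a missing or null utterances field means speaker_labels was not enabled.

```python
import json

# Trimmed copy of the sample response shown above
sample = json.loads("""
{
  "id": "106993b6-ac12-45d0-b74a-1bbd923e755d",
  "status": "completed",
  "utterances": [
    {
      "speaker": "A",
      "text": "Smoke from hundreds of wildfires in Canada is triggering air quality alerts...",
      "start": 100,
      "end": 26560
    }
  ]
}
""")


def format_timestamp(ms: int) -> str:
    """Convert milliseconds to an MM:SS.mmm string."""
    minutes, remainder = divmod(ms, 60_000)
    seconds, millis = divmod(remainder, 1_000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def format_utterances(response: dict) -> list:
    """Return one display line per utterance (empty list if there are none)."""
    lines = []
    for utt in response.get("utterances") or []:
        span = f"{format_timestamp(utt['start'])}-{format_timestamp(utt['end'])}"
        lines.append(f"[{span}] Speaker {utt['speaker']}: {utt['text']}")
    return lines


for line in format_utterances(sample):
    print(line)
```

For the sample utterance, the start of 100 ms and end of 26560 ms render as 00:00.100 and 00:26.560.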

Using the HTTP API directly

Not using an SDK? The same flow works over plain HTTP: authenticate with your key in the authorization header (no Bearer prefix), submit to POST /v2/transcript, then poll by repeatedly calling GET /v2/transcript/{id} until the status is completed or error. The SDKs above do all of this for you, including uploading local files and polling.

All three examples read your key from the same ASSEMBLYAI_API_KEY environment variable you set in Before you begin. Each has one extra requirement:

cURL: jq (brew install jq).
Python: the requests library (pip install requests).
JavaScript: Node.js 18+ (built-in fetch).

The cURL version comes first, followed by the Python and JavaScript versions.

Submit the file, poll until the status is completed, then print the text. (The variable is named state because zsh reserves status.)

id=$(curl -s -X POST https://api.assemblyai.com/v2/transcript \
  -H "authorization: $ASSEMBLYAI_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "audio_url": "https://assembly.ai/wildfires.mp3",
    "language_detection": true,
    "speaker_labels": true
  }' | jq -r .id)

while true; do
  state=$(curl -s "https://api.assemblyai.com/v2/transcript/$id" \
    -H "authorization: $ASSEMBLYAI_API_KEY" | jq -r .status)
  [ "$state" = "completed" ] && break
  [ "$state" = "error" ] && break
  sleep 3
done

# Print the text on success, or the API's error message on failure
curl -s "https://api.assemblyai.com/v2/transcript/$id" \
  -H "authorization: $ASSEMBLYAI_API_KEY" \
  | jq -r 'if .status == "error" then "Transcription failed: \(.error)" else .text end'

To transcribe a local file, upload it first and use the returned upload_url as the audio_url:

curl -s -X POST https://api.assemblyai.com/v2/upload \
  -H "authorization: $ASSEMBLYAI_API_KEY" \
  --data-binary @./example.mp3 | jq -r .upload_url

The file must be streamed as raw bytes with curl --data-binary @<file> (note the @). Using -d/--data, or passing a JSON body or a file-path string, will return a successful upload_url but then fail downstream at transcription with a Transcoding failed. File type application/json or text/plain error. See Troubleshoot Common Errors for details.

import os
import requests
import time

base_url = "https://api.assemblyai.com"
headers = {"authorization": os.environ["ASSEMBLYAI_API_KEY"]}

# Use a publicly-accessible URL
audio_file = "https://assembly.ai/wildfires.mp3"

# Or upload a local file:
# with open("./example.mp3", "rb") as f:
#     response = requests.post(base_url + "/v2/upload", headers=headers, data=f)
#     if response.status_code != 200:
#         print(f"Error: {response.status_code}, Response: {response.text}")
#         response.raise_for_status()
#     upload_json = response.json()
#     audio_file = upload_json["upload_url"]

data = {
    "audio_url": audio_file,
    "language_detection": True,
    "speaker_labels": True
}

response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

if response.status_code != 200:
    print(f"Error: {response.status_code}, Response: {response.text}")
    response.raise_for_status()

transcript_json = response.json()
transcript_id = transcript_json["id"]
polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"

while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()
    if transcript["status"] == "completed":
        print(f"\nFull Transcript:\n\n{transcript['text']}")

        # Optionally print speaker diarization results
        # for utterance in transcript['utterances']:
        #     print(f"Speaker {utterance['speaker']}: {utterance['text']}")
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

const baseUrl = "https://api.assemblyai.com";

const headers = {
  authorization: process.env.ASSEMBLYAI_API_KEY,
};

async function transcribe() {
  // Use a publicly-accessible URL
  const audioFile = "https://assembly.ai/wildfires.mp3";

  // Or upload a local file (use this in place of the audioFile line above):
  // const { readFile } = await import("node:fs/promises");
  // const audioData = await readFile("./example.mp3");
  // const uploadRes = await fetch(`${baseUrl}/v2/upload`, {
  //   method: "POST",
  //   headers,
  //   body: audioData,
  // });
  // if (!uploadRes.ok) throw new Error(`Error: ${uploadRes.status}`);
  // const uploadResponse = await uploadRes.json();
  // const audioFile = uploadResponse.upload_url;

  const data = {
    audio_url: audioFile,
    language_detection: true,
    speaker_labels: true,
  };

  let res = await fetch(`${baseUrl}/v2/transcript`, {
    method: "POST",
    headers: { ...headers, "Content-Type": "application/json" },
    body: JSON.stringify(data),
  });
  if (!res.ok) throw new Error(`Error: ${res.status}`);
  const transcriptResponse = await res.json();
  const transcriptId = transcriptResponse.id;
  const pollingEndpoint = `${baseUrl}/v2/transcript/${transcriptId}`;

  while (true) {
    res = await fetch(pollingEndpoint, { headers });
    if (!res.ok) throw new Error(`Error: ${res.status}`);
    const transcript = await res.json();

    if (transcript.status === "completed") {
      console.log(`\nFull Transcript:\n\n${transcript.text}`);

      // Optionally print speaker diarization results
      // for (const utterance of transcript.utterances) {
      //   console.log(`Speaker ${utterance.speaker}: ${utterance.text}`);
      // }
      break;
    } else if (transcript.status === "error") {
      throw new Error(`Transcription failed: ${transcript.error}`);
    } else {
      await new Promise((resolve) => setTimeout(resolve, 3000));
    }
  }
}

transcribe();
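
The polling loops above run until the job reaches completed or error, with no upper bound on how long they wait. A possible variant adds a deadline. This is a minimal sketch: poll_until_done is a hypothetical helper, the 600-second default is an arbitrary assumption, and the HTTP call is passed in as a function so the loop can be exercised without a network connection.

```python
import time
from typing import Callable, Dict


def poll_until_done(
    fetch: Callable[[], Dict],
    interval_s: float = 3.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch() until the status is completed or error, or the deadline passes.

    Hypothetical helper; fetch should return the parsed JSON of
    GET /v2/transcript/{id}.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        transcript = fetch()
        status = transcript.get("status")
        if status == "completed":
            return transcript
        if status == "error":
            raise RuntimeError(f"Transcription failed: {transcript.get('error')}")
        # Give up if the next wait would run past the deadline
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError(f"Still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

With the names from the Python example above, fetch could be lambda: requests.get(polling_endpoint, headers=headers).json().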

Limits

File size: up to 5 GB per request (/v2/transcript); local files uploaded via /v2/upload up to 2.2 GB.
Duration: 160 ms to 10 hours per file.
Formats: most common audio and video formats — submit your file as-is, no transcoding needed.
Rate limit: default 5 parallel jobs on free accounts, 200 on paid. Check yours on the rate limits page.
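
The size and duration limits above can be checked locally before a request is sent. This is a minimal sketch, assuming 1 GB means 10^9 bytes (the guide does not say which definition applies) and that the caller already knows the audio duration. The helper name check_limits is hypothetical.

```python
# Limits quoted in this guide. Assumption: 1 GB is taken as 10**9 bytes.
MAX_UPLOAD_BYTES = 2_200_000_000          # local files sent to /v2/upload
MIN_DURATION_MS = 160
MAX_DURATION_MS = 10 * 60 * 60 * 1000     # 10 hours


def check_limits(size_bytes: int, duration_ms: int) -> list:
    """Return a list of human-readable problems; an empty list means no problem found."""
    problems = []
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes; upload limit is {MAX_UPLOAD_BYTES}")
    if not MIN_DURATION_MS <= duration_ms <= MAX_DURATION_MS:
        problems.append(
            f"duration is {duration_ms} ms; allowed range is "
            f"{MIN_DURATION_MS} to {MAX_DURATION_MS} ms"
        )
    return problems


# A 5 MB file lasting 282 seconds is within both limits
print(check_limits(5_000_000, 282_000))
```

For a local file, os.path.getsize("./example.mp3") gives the size in bytes. The duration has to come from your own tooling.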

Next steps

Now that you have transcribed your first audio file:

Explore our Speech Understanding features for more ways to analyze your audio data
Learn more about searching, summarizing, or asking questions on your transcript with our LLM Gateway feature
Find out how to use webhooks to get notified when your transcripts are ready

For more information, check out the full API reference documentation.

Need some help?

If you get stuck, or have any other questions, we’d love to help you out. Contact our support team at support@assemblyai.com or create a support ticket.
