Transcribe a pre-recorded audio file

Learn how to transcribe and analyze an audio file.

Overview

This guide walks you through transcribing your first audio file with AssemblyAI. You will learn how to submit an audio file for transcription and retrieve the results using the AssemblyAI API.

When transcribing an audio file, there are three main things you will want to specify:

  1. The speech models you would like to use (required).
  2. The region you would like to use (optional).
  3. Any other models you would like to use, such as Speaker Diarization or PII Redaction (optional).

speech_models is required

You must include the speech_models parameter in every transcription request. There is no default model for pre-recorded transcription. If you omit speech_models, the request will fail. See Model selection to learn about available models.
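
Because a request without speech_models fails, it can help to validate the request body before sending it. The following is a minimal sketch, not part of the AssemblyAI API: build_transcript_request is a hypothetical helper name, and it only assembles the JSON body described in this guide.

```python
def build_transcript_request(audio_url, speech_models, **options):
    """Build a JSON body for POST /v2/transcript (hypothetical helper).

    Raises ValueError early if no speech model is given, since the API
    has no default model for pre-recorded transcription.
    """
    if not speech_models:
        raise ValueError("speech_models is required: pass at least one model name")

    # Accept a single model name as a convenience and wrap it in a list.
    if isinstance(speech_models, str):
        speech_models = [speech_models]

    payload = {"audio_url": audio_url, "speech_models": list(speech_models)}
    # Optional settings, e.g. language_detection=True or speaker_labels=True.
    payload.update(options)
    return payload
```

For example, build_transcript_request("https://assembly.ai/wildfires.mp3", ["universal-3-pro", "universal-2"], speaker_labels=True) returns a dictionary you can pass as the json argument of requests.post.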

Recommended model

We recommend Universal-3 Pro for pre-recorded audio transcription. It delivers the highest accuracy and fastest transcription out of the box, with optional prompting for when you need more control. For the broadest language coverage (99 languages), use ["universal-3-pro", "universal-2"] to automatically fall back to Universal-2 for unsupported languages.

Prerequisites

Before you begin, make sure you have:

  • An AssemblyAI API key (get one by signing up at assemblyai.com)
  • Python 3.6 or later installed
  • The requests library (pip install requests)

Step 1: Set up your API credentials

First, configure your API endpoint and authentication:

import requests
import time

base_url = "https://api.assemblyai.com"
headers = {"authorization": "YOUR_API_KEY"}

Replace YOUR_API_KEY with your actual AssemblyAI API key.
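
If you would rather not paste the key into your source code, one common approach is to read it from an environment variable. This is a sketch under assumptions: the variable name ASSEMBLYAI_API_KEY and the helper build_headers are illustrative choices, not something this guide prescribes.

```python
import os


def build_headers(env_var="ASSEMBLYAI_API_KEY"):
    """Return the request headers, reading the API key from the environment.

    The environment variable name is an assumption made for this example.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return {"authorization": api_key}
```

You could then write headers = build_headers() in place of the hard-coded dictionary above.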

Need EU data residency?

Use our EU endpoint by changing base_url to "https://api.eu.assemblyai.com".
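
If your application serves both regions, you might select the endpoint from a small lookup table. In this sketch the two URLs come from this guide, while the region labels "us" and "eu" and the helper get_base_url are hypothetical names chosen for illustration.

```python
# Endpoint URLs are the ones given in this guide; the keys are illustrative labels.
BASE_URLS = {
    "us": "https://api.assemblyai.com",
    "eu": "https://api.eu.assemblyai.com",
}


def get_base_url(region="us"):
    """Return the API base URL for a region label (hypothetical helper)."""
    try:
        return BASE_URLS[region.lower()]
    except KeyError:
        known = ", ".join(sorted(BASE_URLS))
        raise ValueError(f"Unknown region {region!r}; expected one of: {known}") from None
```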

Step 2: Specify your audio source

You can transcribe audio files in two ways:

Option A: Use a publicly accessible URL

audio_file = "https://assembly.ai/wildfires.mp3"

Option B: Upload a local file

If your audio file is stored locally, upload it to AssemblyAI first:

with open("./example.mp3", "rb") as f:
    response = requests.post(base_url + "/v2/upload", headers=headers, data=f)

    if response.status_code != 200:
        print(f"Error: {response.status_code}, Response: {response.text}")
        response.raise_for_status()

    upload_json = response.json()
    audio_file = upload_json["upload_url"]
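
If your code accepts either kind of source, you could choose between Option A and Option B automatically. The sketch below is one possible approach: is_remote_url and resolve_audio_source are hypothetical helpers, and upload stands for any function of your own that takes a local path and returns the upload_url, for example a wrapper around the upload request shown above.

```python
from urllib.parse import urlparse


def is_remote_url(source):
    """Return True if source looks like an http(s) URL rather than a local path."""
    return urlparse(source).scheme in ("http", "https")


def resolve_audio_source(source, upload):
    """Return a value suitable for audio_url.

    source: a public URL or a local file path.
    upload: a callable taking a local path and returning the uploaded file's URL.
    """
    if is_remote_url(source):
        return source  # Option A: pass the public URL through unchanged
    return upload(source)  # Option B: upload the local file first
```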

Step 3: Submit the transcription request

Create a request with your audio URL and desired configuration options:

data = {
    "audio_url": audio_file,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True
}

response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

if response.status_code != 200:
    print(f"Error: {response.status_code}, Response: {response.text}")
    response.raise_for_status()

transcript_json = response.json()
transcript_id = transcript_json["id"]

This configuration:

  • Uses both the universal-3-pro and universal-2 models for broad language coverage. See Model selection to learn more about the available speech models.
  • Uses our Automatic Language Detection model to detect the dominant language in the spoken audio.
  • Uses our Speaker Diarization model to create turn-by-turn utterances.

Log the transcript ID for every request

The id field returned from POST /v2/transcript is the transcript ID. Persist it (along with a timestamp and the API region) for every transcription request, not just when you hit an error. The transcript ID is required to fetch results, retry, or delete the transcript later, and it is the first thing support@assemblyai.com will ask for when troubleshooting a specific request. See Troubleshoot common errors for the full debugging flow.
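
One lightweight way to persist this information is to append a JSON record per request to a local file. This is only a sketch: the helper log_transcript, the file name transcripts.jsonl, and the record field names are assumptions made for illustration, and a production system might write to a database or a logging service instead.

```python
import json
from datetime import datetime, timezone


def log_transcript(transcript_id, base_url, log_path="transcripts.jsonl"):
    """Append the transcript ID, a UTC timestamp, and the API base URL to a file.

    The base URL doubles as the region marker, since the US and EU endpoints differ.
    """
    record = {
        "transcript_id": transcript_id,
        "submitted_at": datetime.now(timezone.utc).isoformat(),
        "base_url": base_url,
    }
    # One JSON object per line, so the file can be appended to and searched easily.
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Calling log_transcript(transcript_id, base_url) right after the POST request succeeds keeps a record even for requests that never fail.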

Model Pricing

Pricing can vary based on the speech model used in the request.

If you already have an account with us, you can find your specific pricing on the Billing page of your dashboard. If you are a new customer, you can find general pricing information on our pricing page.

Step 4: Poll for the transcription result

Transcription happens asynchronously. Poll the API until the transcription is complete:

polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"

while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        print(f"\nFull Transcript:\n\n{transcript['text']}")
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

The polling loop checks the transcription status every 3 seconds and prints the full transcript once processing is complete.
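
The loop above runs until the transcript completes or fails. If you want an upper bound on the wait, a variant with a timeout might look like the sketch below. The function wait_for_transcript and its parameters are hypothetical, and the 600-second default is an arbitrary value, not a recommendation from AssemblyAI. Passing the fetch step in as a callable keeps the loop independent of how the request is made.

```python
import time


def wait_for_transcript(fetch_status, interval=3.0, timeout=600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until the transcript is completed, failed, or the timeout elapses.

    fetch_status: callable returning the transcript JSON as a dict.
    interval: seconds to wait between polls.
    timeout: maximum total seconds to wait before giving up.
    """
    deadline = clock() + timeout
    while True:
        transcript = fetch_status()
        status = transcript["status"]
        if status == "completed":
            return transcript
        if status == "error":
            raise RuntimeError(f"Transcription failed: {transcript['error']}")
        if clock() >= deadline:
            raise TimeoutError(f"Transcript not ready after {timeout} seconds")
        sleep(interval)
```

With the variables from the earlier steps, you could call it as wait_for_transcript(lambda: requests.get(polling_endpoint, headers=headers).json()).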

Step 5: Access speaker diarization (optional)

If you enabled speaker labels, you can access the speaker-separated utterances:

for utterance in transcript['utterances']:
    print(f"Speaker {utterance['speaker']}: {utterance['text']}")
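
Once you have the utterances, you can also aggregate them. As a small illustration, the sketch below counts the words spoken by each speaker using only the speaker and text fields shown above. The helper words_per_speaker is hypothetical, and it defensively treats a missing or empty utterances field as no utterances, on the assumption that the field may not be populated when speaker labels are disabled.

```python
from collections import Counter


def words_per_speaker(transcript):
    """Return a dict mapping each speaker label to their total word count."""
    counts = Counter()
    # Fall back to an empty list if the field is absent or null (assumption).
    for utterance in transcript.get("utterances") or []:
        counts[utterance["speaker"]] += len(utterance["text"].split())
    return dict(counts)
```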

Complete example

Here is the full working code:

import requests
import time

base_url = "https://api.assemblyai.com"
headers = {"authorization": "YOUR_API_KEY"}

# Use a publicly-accessible URL
audio_file = "https://assembly.ai/wildfires.mp3"

# Or upload a local file:
# with open("./example.mp3", "rb") as f:
#     response = requests.post(base_url + "/v2/upload", headers=headers, data=f)
#     if response.status_code != 200:
#         print(f"Error: {response.status_code}, Response: {response.text}")
#         response.raise_for_status()
#     upload_json = response.json()
#     audio_file = upload_json["upload_url"]

data = {
    "audio_url": audio_file,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True
}

response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

if response.status_code != 200:
    print(f"Error: {response.status_code}, Response: {response.text}")
    response.raise_for_status()

transcript_json = response.json()
transcript_id = transcript_json["id"]
polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"

while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()
    if transcript["status"] == "completed":
        print(f"\nFull Transcript:\n\n{transcript['text']}")

        # Optionally print speaker diarization results
        # for utterance in transcript['utterances']:
        #     print(f"Speaker {utterance['speaker']}: {utterance['text']}")
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

Next steps

Now that you have transcribed your first audio file:

  • Learn how to get more out of Universal-3 Pro with prompting
  • Explore our Speech Understanding features for more ways to analyze your audio data
  • Learn more about searching, summarizing, or asking questions on your transcript with our LLM Gateway feature
  • Find out how to use webhooks to get notified when your transcripts are ready

For more information, check out the full API reference documentation.