Transcribe a pre-recorded audio file

Learn how to transcribe and analyze an audio file.

Overview

This guide walks you through transcribing your first audio file with AssemblyAI. You will learn how to submit an audio file for transcription and retrieve the results using the AssemblyAI API.

When transcribing an audio file, there are three main things you will want to specify:

  1. The speech models you would like to use (required).
  2. The region you would like to use (optional).
  3. Any additional models you would like to apply, such as Speaker Diarization or PII Redaction (optional).

Prerequisites

Before you begin, make sure you have:

  • An AssemblyAI API key (get one by signing up at assemblyai.com)
  • Python 3.6 or later installed
  • The requests library (pip install requests)

Step 1: Set up your API credentials

First, configure your API endpoint and authentication:

import requests
import time

base_url = "https://api.assemblyai.com"
headers = {"authorization": "YOUR_API_KEY"}

Replace YOUR_API_KEY with your actual AssemblyAI API key.

Need EU data residency?

Use our EU endpoint by changing base_url to "https://api.eu.assemblyai.com".
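If you want to avoid hardcoding the key and make the region switchable, one option is to derive both values from environment variables. This is a minimal sketch; the variable names ASSEMBLYAI_API_KEY and ASSEMBLYAI_REGION are conventions chosen for this example, not official settings:

```python
import os

def build_config(env=os.environ):
    # Read the API key and region from the environment so neither
    # ends up hardcoded in source control.
    api_key = env.get("ASSEMBLYAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set the ASSEMBLYAI_API_KEY environment variable")
    if env.get("ASSEMBLYAI_REGION") == "eu":
        base_url = "https://api.eu.assemblyai.com"
    else:
        base_url = "https://api.assemblyai.com"
    return base_url, {"authorization": api_key}
```

You would then call `build_config()` once at startup and reuse the returned `base_url` and `headers` for every request.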

Step 2: Specify your audio source

You can transcribe audio files in two ways:

Option A: Use a publicly accessible URL

audio_file = "https://assembly.ai/wildfires.mp3"

Option B: Upload a local file

If your audio file is stored locally, upload it to AssemblyAI first:

with open("./example.mp3", "rb") as f:
    response = requests.post(base_url + "/v2/upload", headers=headers, data=f)

if response.status_code != 200:
    print(f"Error: {response.status_code}, Response: {response.text}")
    response.raise_for_status()

upload_json = response.json()
audio_file = upload_json["upload_url"]
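For very large recordings, you may prefer to stream the file in chunks rather than hand the open file object to requests directly. A sketch using a generator (the 5 MB chunk size is an arbitrary choice for this example):

```python
def read_file(filename, chunk_size=5 * 1024 * 1024):
    # Yield the file in fixed-size chunks so a large recording is
    # streamed to the upload endpoint instead of read into memory at once.
    with open(filename, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

# Usage (requests accepts a generator as the request body):
# response = requests.post(base_url + "/v2/upload", headers=headers,
#                          data=read_file("./example.mp3"))
```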

Step 3: Submit the transcription request

Create a request with your audio URL and desired configuration options:

data = {
    "audio_url": audio_file,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True
}

response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

if response.status_code != 200:
    print(f"Error: {response.status_code}, Response: {response.text}")
    response.raise_for_status()

transcript_json = response.json()
transcript_id = transcript_json["id"]

This configuration:

  • audio_url: the audio file to transcribe.
  • speech_models: the speech models to use for the request.
  • language_detection: automatically detects the spoken language.
  • speaker_labels: enables Speaker Diarization, which labels who said what.

Model Pricing

Pricing can vary based on the speech model used in the request.

If you already have an account with us, you can find your specific pricing on the Billing page of your dashboard. If you are a new customer, you can find general pricing information here.

Step 4: Poll for the transcription result

Transcription happens asynchronously. Poll the API until the transcription is complete:

polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"

while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        print(f"\nFull Transcript:\n\n{transcript['text']}")
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

The polling loop checks the transcription status every 3 seconds and prints the full transcript once processing is complete.
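In a production script you may also want an overall timeout and a gentler polling cadence. Here is one sketch, assuming a `fetch_status` callable that wraps the GET request above (e.g. `lambda: requests.get(polling_endpoint, headers=headers).json()`); the timeout and backoff values are illustrative, not official recommendations:

```python
import time

def wait_for_transcript(fetch_status, timeout=600, initial_delay=3, max_delay=30):
    # fetch_status() is assumed to return the transcript JSON as a dict.
    delay = initial_delay
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        transcript = fetch_status()
        if transcript["status"] == "completed":
            return transcript
        if transcript["status"] == "error":
            raise RuntimeError(f"Transcription failed: {transcript['error']}")
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
    raise TimeoutError("Transcription did not finish within the timeout")
```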

Step 5: Access speaker diarization (optional)

If you enabled speaker labels, you can access the speaker-separated utterances:

for utterance in transcript["utterances"]:
    print(f"Speaker {utterance['speaker']}: {utterance['text']}")
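Beyond printing, the utterances lend themselves to simple analysis. As one sketch, this helper totals each speaker's talk time, assuming each utterance carries `start` and `end` timestamps in milliseconds (adjust the divisor if your payload differs):

```python
def speaker_talk_time(utterances):
    # Sum the duration of each speaker's utterances, in seconds,
    # assuming start/end timestamps are in milliseconds.
    totals = {}
    for u in utterances:
        seconds = (u["end"] - u["start"]) / 1000
        totals[u["speaker"]] = totals.get(u["speaker"], 0.0) + seconds
    return totals
```

For example, calling `speaker_talk_time(transcript["utterances"])` on a two-person interview would return a dict like `{"A": ..., "B": ...}` mapping each speaker label to seconds spoken.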

Complete example

Here is the full working code:

import requests
import time

base_url = "https://api.assemblyai.com"
headers = {"authorization": "YOUR_API_KEY"}

# Use a publicly accessible URL
audio_file = "https://assembly.ai/wildfires.mp3"

# Or upload a local file:
# with open("./example.mp3", "rb") as f:
#     response = requests.post(base_url + "/v2/upload", headers=headers, data=f)
# if response.status_code != 200:
#     print(f"Error: {response.status_code}, Response: {response.text}")
#     response.raise_for_status()
# upload_json = response.json()
# audio_file = upload_json["upload_url"]

data = {
    "audio_url": audio_file,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True
}

response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

if response.status_code != 200:
    print(f"Error: {response.status_code}, Response: {response.text}")
    response.raise_for_status()

transcript_json = response.json()
transcript_id = transcript_json["id"]
polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"

while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()
    if transcript["status"] == "completed":
        print(f"\nFull Transcript:\n\n{transcript['text']}")

        # Optionally print speaker diarization results
        # for utterance in transcript["utterances"]:
        #     print(f"Speaker {utterance['speaker']}: {utterance['text']}")
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

Next steps

Now that you have transcribed your first audio file, you can explore additional features such as Speaker Diarization and PII Redaction, or try the same workflow on your own recordings.

For more information, check out the full API reference documentation.