Transcribe a pre-recorded audio file
This guide walks you through transcribing your first audio file with AssemblyAI. You will learn how to submit an audio file for transcription and retrieve the results using the AssemblyAI API.
When transcribing an audio file, there are three main things you will want to specify:
You must include the speech_models parameter in every transcription request. There is no default model for pre-recorded transcription. If you omit speech_models, the request will fail. See Model selection to learn about available models.
We recommend Universal-3 Pro for pre-recorded audio transcription. It delivers the highest accuracy and fastest transcription out of the box, with optional prompting for when you need more control. For the broadest language coverage (99 languages), use ["universal-3-pro", "universal-2"] to automatically fall back to Universal-2 for unsupported languages.
Before you begin, make sure you have:
requests library (pip install requests)
First, configure your API endpoint and authentication:
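A minimal sketch of that configuration. It assumes the API key is sent in an authorization header on every request. base_url is the name the EU note refers to, and headers is an assumed variable name.

```python
# Base URL for the AssemblyAI API.
# For the EU endpoint, use "https://api.eu.assemblyai.com" instead.
base_url = "https://api.assemblyai.com"

# Authentication header sent with every request (assumed header name: authorization).
headers = {
    "authorization": "YOUR_API_KEY",
}
```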
Replace YOUR_API_KEY with your actual AssemblyAI API key.
To use our EU endpoint, change base_url to "https://api.eu.assemblyai.com".
You can transcribe audio files in two ways:
Option A: Use a publicly accessible URL
Option B: Upload a local file
If your audio file is stored locally, upload it to AssemblyAI first:
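One way to sketch the upload step. This sketch assumes two things: the upload endpoint is POST /v2/upload, and it answers with JSON containing an upload_url field. upload_file is a hypothetical helper name. It takes a requests.Session (or any object with the same post() method) as an argument instead of creating its own client, so the connection can be reused across calls.

```python
def upload_file(session, base_url, headers, path):
    """Upload a local audio file and return the URL AssemblyAI assigns to it.

    `session` is assumed to be a requests.Session or an object with the
    same post() interface.
    """
    with open(path, "rb") as audio_file:
        # Stream the raw file bytes as the request body.
        response = session.post(f"{base_url}/v2/upload", headers=headers, data=audio_file)
    response.raise_for_status()
    # Assumed response shape: {"upload_url": "..."}
    return response.json()["upload_url"]
```

For example, upload_file(requests.Session(), base_url, headers, "./my-audio.mp3") returns a URL you can use wherever a public audio URL is expected. The file path is only a placeholder.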
Create a request with your audio URL and desired configuration options:
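A sketch of the submission request. It takes base_url, headers, and a requests.Session-style session as arguments. The body sets speech_models as described in this guide. speaker_labels is assumed to be the flag that turns on speaker labels. submit_transcription is a hypothetical helper name, and it returns the id field of the response.

```python
def submit_transcription(session, base_url, headers, audio_url):
    """Submit an audio URL for transcription and return the transcript ID."""
    data = {
        "audio_url": audio_url,
        # Required: there is no default model for pre-recorded audio.
        # universal-2 acts as a fallback for languages Universal-3 Pro lacks.
        "speech_models": ["universal-3-pro", "universal-2"],
        # Optional (assumed flag name): label which speaker said each utterance.
        "speaker_labels": True,
    }
    response = session.post(f"{base_url}/v2/transcript", headers=headers, json=data)
    response.raise_for_status()
    return response.json()["id"]
```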
This configuration:
Uses the universal-3-pro and universal-2 models for broad language coverage. Learn more about our different speech recognition models here.
The id field returned from POST /v2/transcript is the transcript ID. Persist it (along with a timestamp and the API region) for every transcription request, not just when you hit an error. You need the transcript ID to fetch results, retry, or delete the transcript later, and it's the first thing support@assemblyai.com will ask for when troubleshooting a specific request. See Troubleshoot common errors for the full debugging flow.
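As an illustration of persisting the ID, the hypothetical record_transcript helper below appends the transcript ID, a UTC timestamp, and the region to a local JSON Lines file. The file name and the region labels (for example "us" or "eu") are assumptions. A database or your existing logging would work just as well.

```python
import json
from datetime import datetime, timezone


def record_transcript(transcript_id, region, log_path="transcripts.jsonl"):
    """Append one JSON line with the transcript ID, a UTC timestamp and the region."""
    entry = {
        "transcript_id": transcript_id,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "region": region,  # hypothetical label, e.g. "us" or "eu"
    }
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(entry) + "\n")
    return entry
```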
Pricing can vary based on the speech model used in the request.
If you already have an account with us, you can find your specific pricing on the Billing page of your dashboard. If you are a new customer, you can find general pricing information here.
Transcription happens asynchronously. Poll the API until the transcription is complete:
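A sketch of the polling loop. It assumes GET /v2/transcript/{id} returns a status field that ends as either completed or error, with a message in the error field when the job fails. wait_for_transcript is a hypothetical helper name. It returns the completed transcript JSON instead of printing it, so you can print result["text"] or read other fields.

```python
import time


def wait_for_transcript(session, base_url, headers, transcript_id, poll_interval=3.0):
    """Poll the transcript endpoint until the job completes or fails."""
    polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"
    while True:
        response = session.get(polling_endpoint, headers=headers)
        response.raise_for_status()
        result = response.json()
        if result["status"] == "completed":
            return result
        if result["status"] == "error":
            raise RuntimeError(f"Transcription failed: {result['error']}")
        # Still queued or processing: wait before checking again.
        time.sleep(poll_interval)
```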
The polling loop checks the transcription status every 3 seconds and prints the full transcript once processing is complete.
If you enabled speaker labels, you can access the speaker-separated utterances:
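A small sketch for reading those utterances. It assumes each entry in the completed transcript's utterances list carries speaker and text fields. format_utterances is a hypothetical helper name.

```python
def format_utterances(result):
    """Return one "Speaker X: text" line per utterance in a completed transcript."""
    # Assumed to be missing or null when speaker labels were not enabled.
    utterances = result.get("utterances") or []
    return [f"Speaker {u['speaker']}: {u['text']}" for u in utterances]
```

To print them, loop over the returned lines: for line in format_utterances(result): print(line).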
Here is the full working code:
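A condensed end-to-end sketch follows. It submits a public audio URL, polls until the job finishes, and prints the text and any speaker-separated utterances. It assumes the requests library, the POST /v2/transcript and GET /v2/transcript/{id} endpoints, and the field names used in this guide. transcribe and print_transcript are hypothetical names, and uploading a local file is left out for brevity.

```python
import time

API_KEY = "YOUR_API_KEY"  # replace with your AssemblyAI API key
BASE_URL = "https://api.assemblyai.com"  # EU: "https://api.eu.assemblyai.com"


def transcribe(session, audio_url, poll_interval=3.0):
    """Submit audio_url and return the completed transcript JSON.

    `session` is assumed to be a requests.Session or an object with the
    same get()/post() methods.
    """
    headers = {"authorization": API_KEY}
    data = {
        "audio_url": audio_url,
        "speech_models": ["universal-3-pro", "universal-2"],  # required
        "speaker_labels": True,  # optional (assumed flag name)
    }
    response = session.post(f"{BASE_URL}/v2/transcript", headers=headers, json=data)
    response.raise_for_status()
    transcript_id = response.json()["id"]
    # Keep the ID: it is needed to fetch, retry, or delete the transcript later.
    print(f"Transcript ID: {transcript_id}")

    polling_endpoint = f"{BASE_URL}/v2/transcript/{transcript_id}"
    while True:
        response = session.get(polling_endpoint, headers=headers)
        response.raise_for_status()
        result = response.json()
        if result["status"] == "completed":
            return result
        if result["status"] == "error":
            raise RuntimeError(f"Transcription failed: {result['error']}")
        time.sleep(poll_interval)  # still queued or processing


def print_transcript(result):
    """Print the full text, then one line per speaker-labelled utterance."""
    print(result["text"])
    for utterance in result.get("utterances") or []:
        print(f"Speaker {utterance['speaker']}: {utterance['text']}")
```

For example, call result = transcribe(requests.Session(), "https://example.com/audio.mp3") and then print_transcript(result). The URL is a placeholder. Passing the session in lets one connection serve both the submit and polling requests.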
Now that you have transcribed your first audio file:
For more information, check out the full API reference documentation.