Universal-3 Pro (Async) | AssemblyAI

Universal-3 Pro is our most powerful Voice AI model, designed to capture the “hard stuff” that traditional ASR models struggle with. It delivers state-of-the-art accuracy for entities, rare words, and domain-specific terminology out of the box, with code switching and optional prompting for more control. It’s also our fastest model, so you get the best accuracy without sacrificing speed.

Quickstart

Get started with Universal-3 Pro using the code below. This example transcribes a pre-recorded audio file and prints the transcript text to your terminal.

Python

Install the required library

$ pip install requests

Create a new file main.py and paste the code below. Replace <YOUR_API_KEY> with your API key.

Run with python main.py.

import requests
import time

base_url = "https://api.assemblyai.com"
headers = {"authorization": "<YOUR_API_KEY>"}

data = {
    "audio_url": "https://assembly.ai/sports_injuries.mp3",
    "language_detection": True,
    "speech_models": ["universal-3-pro", "universal-2"]
}

# Submit the transcription request.
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

if response.status_code != 200:
    print(f"Error: {response.status_code}, Response: {response.text}")
    response.raise_for_status()

transcript_response = response.json()
transcript_id = transcript_response["id"]
polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"

# Poll every 3 seconds until the transcript is completed or fails.
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()
    if transcript["status"] == "completed":
        print(transcript["text"])
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)
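
The quickstart loop polls until the transcript completes or fails, with no upper bound on the wait. One possible variation, sketched below, adds a deadline. The helper name wait_for_transcript and its interval and timeout defaults are assumptions for illustration, not part of the AssemblyAI API. The "queued" and "processing" statuses in the stand-in responses are placeholders for any status other than "completed" or "error".

```python
import time
from typing import Callable


def wait_for_transcript(
    fetch: Callable[[], dict],
    interval: float = 3.0,
    timeout: float = 600.0,
) -> dict:
    """Call fetch() until the transcript is completed, fails, or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        transcript = fetch()
        if transcript["status"] == "completed":
            return transcript
        if transcript["status"] == "error":
            raise RuntimeError(f"Transcription failed: {transcript['error']}")
        # Give up if the next poll would happen after the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"Transcript not ready after {timeout} seconds")
        time.sleep(interval)


# Stand-in for the real HTTP call so the sketch runs without an API key.
# With requests, fetch would be:
#   lambda: requests.get(polling_endpoint, headers=headers).json()
fake_responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "text": "Hello world."},
])
result = wait_for_transcript(lambda: next(fake_responses), interval=0.01)
print(result["text"])
```

Passing the fetch step in as a callable keeps the waiting logic separate from the HTTP client, so it can be exercised without network access.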

Language support

Universal-3 Pro supports English, Spanish, Portuguese, French, German, and Italian. To access all 99 languages, use "speech_models": ["universal-3-pro", "universal-2"] as shown in the code example. For details, see Support for 99 languages.

Key capabilities

Out of the box, the model outperforms all ASR models on the market on accuracy, especially for entities and rare words. With prompting, you can fully customize the transcription output and approach human-level transcription quality.

Keyterms prompting: Improve recognition of domain-specific terminology, rare words, and proper nouns
Prompting: Guide transcription style, formatting, and output characteristics

What prompts can do	Description
Verbatim transcription and disfluencies	Include um, uh, false starts, repetitions, stutters
Output style and formatting	Control punctuation, capitalization, number formatting
Context-aware clues	Help with jargon, names, and domain expectations
Entity accuracy and spelling	Improve accuracy for proper nouns, brands, technical terms
Native code switching	Handle multilingual audio in the same transcript
Regional dialect recognition	Accurately transcribe regional dialects like Quebecois French, Brazilian Portuguese, Spanglish, and more. See supported dialects
Numbers and measurements	Control how numbers, percentages, and measurements are formatted

To fine-tune to your use case, see the Prompting section. Not sure where to start? Use one of the recommended prompts and tweak from there.

Start with no prompt

We strongly recommend testing with no prompt first. When you omit the prompt parameter, Universal-3 Pro automatically applies a built-in default prompt that is already optimized for accuracy across a wide range of audio types — including verbatim transcription, multilingual code-switching, and challenging audio conditions. For most use cases, the default prompt delivers excellent results out of the box.

If you do build a prompt, don’t start from scratch: start with one of the recommended prompts and tweak it for your use case. Please read the Prompting Guide (Async) if you’d like to build your prompt yourself.

Remember, prompts are primarily instructional, so adding a large amount of context may not make a significant impact on accuracy and could reduce instruction-following coherence. Feel free to layer in additional instructions that you see in the Prompting Guide (Async).

Benchmarking

A note on evaluating modern speech-to-text

Across the industry, we’re seeing that as models improve, they sometimes capture words or phrases that human transcribers originally missed. In WER evaluations, this shows up as insertions, even when the model is technically correct. We’ve also seen substitutions impact scores in cases where formatting differs (e.g., “alright” vs. “all right”), despite no meaningful accuracy difference.

To help address this, we’re actively developing documentation, blog content, and benchmarking tools focused on best practices for evaluating modern speech-to-text systems. We’ll continue sharing these resources as they’re released.

This is increasingly becoming an industry-wide benchmarking challenge as models begin to match, or exceed, human transcription quality in certain scenarios.

For more details on evaluating transcription accuracy, including tips on using semantic WER and handling substitution artifacts, see Evaluating your prompts in the Prompting Guide.
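
To make the insertion and substitution effects described above concrete, here is a minimal sketch of word error rate computed with a word-level edit distance. The function names (normalize, wer), the sample sentences, and the small equivalence table are illustrative assumptions. They are not part of the AssemblyAI API or an official benchmarking tool.

```python
import re

# Hypothetical normalization table: spelling variants that should not count as errors.
EQUIVALENTS = {"alright": "all right"}


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation, and expand known spelling variants."""
    words = re.sub(r"[^\w\s']", " ", text.lower()).split()
    expanded = []
    for word in words:
        expanded.extend(EQUIVALENTS.get(word, word).split())
    return expanded


def wer(reference: list[str], hypothesis: list[str]) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    # Standard dynamic-programming edit distance over words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


# Made-up example: the human reference omits a filler word the speaker said.
reference = "All right, let's get started."
hypothesis = "Alright, um, let's get started."

raw = wer(reference.lower().split(), hypothesis.lower().split())
normalized = wer(normalize(reference), normalize(hypothesis))
print(f"raw WER: {raw:.2f}, normalized WER: {normalized:.2f}")
# prints: raw WER: 0.40, normalized WER: 0.20
```

Normalization removes the "alright" vs. "all right" penalty. The remaining 0.20 comes from the inserted "um", which counts as an error even though, in this example, the hypothesis is the more faithful transcript.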

Keyterms prompting

Keyterms prompting allows you to provide up to 1,000 words or phrases (maximum 6 words per phrase) using the keyterms_prompt parameter to improve transcription accuracy for those terms and related variations or contextually similar phrases.
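
The limits above can be checked on the client before submitting a request. The following is a minimal sketch. The helper name validate_keyterms is hypothetical, and counting words by splitting on whitespace is an assumption about how the 6-word limit is measured.

```python
MAX_KEYTERMS = 1000        # limit stated above
MAX_WORDS_PER_PHRASE = 6   # limit stated above


def validate_keyterms(keyterms: list[str]) -> list[str]:
    """Return the keyterms unchanged if they fit the stated limits, else raise ValueError."""
    if len(keyterms) > MAX_KEYTERMS:
        raise ValueError(f"{len(keyterms)} keyterms given; the limit is {MAX_KEYTERMS}")
    for term in keyterms:
        # Assumption: a "word" is a whitespace-separated token.
        word_count = len(term.split())
        if word_count > MAX_WORDS_PER_PHRASE:
            raise ValueError(
                f"{term!r} has {word_count} words; the limit is {MAX_WORDS_PER_PHRASE}"
            )
    return keyterms


data = {
    "audio_url": "https://assemblyaiassets.com/audios/keyterms_prompting.wav",
    "speech_models": ["universal-3-pro", "universal-2"],
    "keyterms_prompt": validate_keyterms(["Kelly Byrne-Donoghue"]),
}
print(data["keyterms_prompt"])
```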

When to use keyterms vs. prompt

If you already know the specific names, brands, drug names, acronyms, or jargon that will appear in your audio, reach for keyterms_prompt — it is optimized for term boosting and produces more reliable results than describing the same terms in a free-form prompt. Use the prompt parameter when you want to control transcription style or behavior (disfluencies, formatting, code switching). The two parameters cannot be used in the same request.

Here is an example showing how you can use keyterms prompting to improve transcription accuracy for a name with distinctive spelling and formatting.

Without keyterms prompting:

Hi, this is Kelly Byrne Donahue

With keyterms prompting:

Hi, this is Kelly Byrne-Donoghue

Python

import requests
import time

base_url = "https://api.assemblyai.com"
headers = {"authorization": "<YOUR_API_KEY>"}

data = {
    "audio_url": "https://assemblyaiassets.com/audios/keyterms_prompting.wav",
    "language_detection": True,
    "speech_models": ["universal-3-pro", "universal-2"],
    # Terms to boost; here, a name with distinctive spelling and hyphenation.
    "keyterms_prompt": ["Kelly Byrne-Donoghue"]
}

# Submit the transcription request.
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

if response.status_code != 200:
    print(f"Error: {response.status_code}, Response: {response.text}")
    response.raise_for_status()

transcript_response = response.json()
transcript_id = transcript_response["id"]
polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"

# Poll every 3 seconds until the transcript is completed or fails.
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()
    if transcript["status"] == "completed":
        print(transcript["text"])
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

Remove audio tags

Universal-3 Pro generates rich transcripts that can include inline annotations such as audio event markers (e.g., [laughter], [music]) and speaker cues. If your workflow requires clean, undecorated text, set remove_audio_tags to "all" to strip all inline annotations from the transcript output.

This is especially useful for:

Pipelines that parse transcript text downstream (NLP, search indexing, LLM input)
Display contexts where annotations would confuse end users
Any workflow that expects plain text output

Python

import requests
import time

base_url = "https://api.assemblyai.com"
headers = {"authorization": "<YOUR_API_KEY>"}

data = {
    "audio_url": "https://assembly.ai/sports_injuries.mp3",
    "language_detection": True,
    "speech_models": ["universal-3-pro", "universal-2"],
    # Strip all inline annotations from the transcript text.
    "remove_audio_tags": "all"
}

# Submit the transcription request.
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

if response.status_code != 200:
    print(f"Error: {response.status_code}, Response: {response.text}")
    response.raise_for_status()

transcript_response = response.json()
transcript_id = transcript_response["id"]
polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"

# Poll every 3 seconds until the transcript is completed or fails.
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()
    if transcript["status"] == "completed":
        print(transcript["text"])
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

This parameter is only supported for Universal-3 Pro.
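
For transcripts that were already generated with annotations, a similar cleanup can be approximated on the client. This is a rough sketch only. It assumes annotations always appear as short square-bracketed tags such as [laughter] or [music], as in the examples above, and the helper name strip_bracketed_tags is hypothetical. It is not guaranteed to match the output of remove_audio_tags, and it would also remove any other bracketed text in the transcript.

```python
import re

# Assumption: annotations are square-bracketed tags like [laughter] or [music].
TAG_PATTERN = re.compile(r"\[[^\[\]]+\]")


def strip_bracketed_tags(text: str) -> str:
    """Remove bracketed tags, then collapse the extra whitespace they leave behind."""
    without_tags = TAG_PATTERN.sub(" ", text)
    return re.sub(r"\s+", " ", without_tags).strip()


print(strip_bracketed_tags("That was great [laughter] and then [music] we left."))
# prints: That was great and then we left.
```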

Prompting

For a comprehensive guide on crafting effective prompts, including best practices, prompt capabilities, and example prompts, see the Prompting guide.

Universal-3 Pro delivers great accuracy out of the box. To fine-tune transcription results to your use case, provide a prompt with up to 1,500 words of context in plain language. This helps the model consistently recognize domain-specific terminology, apply your preferred formatting conventions, handle code switching between languages, and better interpret ambiguous speech.

When to use prompt vs. keyterms

Use prompt to control transcription style or behavior (disfluencies, formatting, code switching). If you already know the specific names, brands, drug names, acronyms, or jargon that will appear in your audio, use keyterms_prompt instead — it is optimized for term boosting and produces more reliable results than describing the same terms in plain language. The two parameters cannot be used in the same request.
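
Because prompt and keyterms_prompt cannot be combined, it can help to catch the conflict before sending a request. Below is a minimal sketch. build_transcript_request is a hypothetical helper, and treating the 1,500-word limit as a count of whitespace-separated tokens is an assumption.

```python
from typing import Optional

MAX_PROMPT_WORDS = 1500  # limit stated above


def build_transcript_request(
    audio_url: str,
    prompt: Optional[str] = None,
    keyterms_prompt: Optional[list[str]] = None,
) -> dict:
    """Build a request body, rejecting the combination this page says is not allowed."""
    if prompt is not None and keyterms_prompt is not None:
        raise ValueError("prompt and keyterms_prompt cannot be used in the same request")
    data = {
        "audio_url": audio_url,
        "language_detection": True,
        "speech_models": ["universal-3-pro", "universal-2"],
    }
    if prompt is not None:
        # Assumption: words are counted as whitespace-separated tokens.
        if len(prompt.split()) > MAX_PROMPT_WORDS:
            raise ValueError(f"prompt exceeds {MAX_PROMPT_WORDS} words")
        data["prompt"] = prompt
    if keyterms_prompt is not None:
        data["keyterms_prompt"] = keyterms_prompt
    return data


body = build_transcript_request(
    "https://assembly.ai/sports_injuries.mp3",
    prompt=(
        "Always: Transcribe speech with your best guess based on context in "
        "all possible scenarios where speech is present in the audio."
    ),
)
print(sorted(body))
# prints: ['audio_url', 'language_detection', 'prompt', 'speech_models']
```

The returned dictionary can be passed as the json argument of requests.post, in the same way as the data dictionaries in the other examples on this page.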

The following is our recommended prompt for verbatim multilingual transcription:

Python

import requests
import time

base_url = "https://api.assemblyai.com"
headers = {"authorization": "<YOUR_API_KEY>"}

data = {
    "audio_url": "https://assembly.ai/sports_injuries.mp3",
    "language_detection": True,
    "speech_models": ["universal-3-pro", "universal-2"],
    # Adjacent string literals are concatenated into a single prompt.
    "prompt": "Required: Preserve the original language(s) and script as spoken, "
              "including code-switching and mixed-language phrases.\n\n"
              "Mandatory: Preserve linguistic speech patterns including disfluencies, "
              "filler words, hesitations, repetitions, stutters, false starts, "
              "and colloquialisms in the spoken language.\n\n"
              "Always: Transcribe speech with your best guess based on context in "
              "all possible scenarios where speech is present in the audio."
}

# Submit the transcription request.
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

if response.status_code != 200:
    print(f"Error: {response.status_code}, Response: {response.text}")
    response.raise_for_status()

transcript_response = response.json()
transcript_id = transcript_response["id"]
polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"

# Poll every 3 seconds until the transcript is completed or fails.
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()
    if transcript["status"] == "completed":
        print(transcript["text"])
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

Default prompt

When no prompt is provided, Universal-3 Pro automatically applies the following default prompt:

Transcribe with context and proper nouns preserved, where speech is
present in the audio. Each language as spoken. English as English.
Non-native speakers.

You can override the default prompt by providing your own prompt value. See the Prompting guide for detailed examples covering verbatim transcription, output formatting, entity accuracy, code switching, and more.

Prior system prompt (April 15, 2026 – April 21, 2026)

The previous built-in system prompt was:

Always: Transcribe code-switching speech with your best guess based on
context in all possible scenarios where speech is present in the audio.
Languages: English, Spanish, German, French, Portuguese, Italian.
Language codes: en, es, de, fr, pt, it.

Prior system prompt (February 25, 2026 – April 15, 2026)

The previous built-in system prompt was:

Always: Transcribe speech with your best guess based on context in all
possible scenarios where speech is present in the audio.

Prior system prompt (February 20, 2026 – February 25, 2026)

The previous built-in system prompt was:

Required: Preserve the original language(s) and script as spoken,
including code-switching and mixed-language phrases.
Mandatory: Preserve linguistic speech patterns including disfluencies,
filler words, hesitations, repetitions, stutters, false starts, and
colloquialisms in the spoken language.
Always: Transcribe speech with your best guess based on context in all
possible scenarios where speech is present in the audio.

Prior system prompt (before February 20, 2026)

The previous built-in system prompt was:

Transcribe this audio

Best practices for prompt engineering

See the Prompting guide for recommended prompts, capability “knobs” you can turn, and guidance on evaluating prompts against your own audio.

Support for 99 languages

With the speech_models parameter, you can list multiple speech models in priority order, allowing our system to automatically route your audio based on language support.

Model routing behavior: The system attempts to use the models in priority order, falling back to the next model when needed. For example, with ["universal-3-pro", "universal-2"], the system will try to use universal-3-pro for languages it supports (English, Spanish, Portuguese, French, German, and Italian), and automatically fall back to Universal-2 for all other languages. This ensures you get the best-performing transcription where available while maintaining the widest language coverage.
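
The routing rule can be pictured with a short sketch. This illustrates the behavior described above and is not AssemblyAI's implementation: pick_model and the SUPPORTED_LANGUAGES table are hypothetical, and Universal-2 is modeled simply as accepting any language code rather than enumerating its 99 languages.

```python
from typing import Optional

# Language codes for Universal-3 Pro as listed on this page.
# None is used here (as an assumption) to mean "no restriction modeled".
SUPPORTED_LANGUAGES: dict[str, Optional[set[str]]] = {
    "universal-3-pro": {"en", "es", "pt", "fr", "de", "it"},
    "universal-2": None,
}


def pick_model(language_code: str, speech_models: list[str]) -> str:
    """Return the first model in priority order that supports the language."""
    for model in speech_models:
        supported = SUPPORTED_LANGUAGES[model]
        if supported is None or language_code in supported:
            return model
    raise ValueError(f"no listed model supports {language_code!r}")


models = ["universal-3-pro", "universal-2"]
print(pick_model("fr", models))  # universal-3-pro
print(pick_model("ja", models))  # universal-2
```

In this sketch, listing only "universal-3-pro" leaves no fallback, so a language outside its six supported ones raises an error instead of being routed.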