Models

AssemblyAI offers several state-of-the-art speech recognition models, each optimized for different use cases. Choose the model that best fits your needs based on accuracy, latency, cost, and language requirements.

Pre-recorded models

Universal-3.5 Pro

Highest accuracy, fastest model
Supports 18 languages
Native code switching
Contextual prompting capabilities
Keyterms prompting up to 1,000 words

Universal-2

High accuracy, low latency
Support across 99 languages
Keyterms prompting up to 200 words
Code switching

We recommend Universal-3.5 Pro for pre-recorded audio transcription. It delivers the highest accuracy and fastest transcription out of the box, with optional contextual prompting support. Universal-3.5 Pro supports 18 languages. For anything outside that set, the system automatically falls back to Universal-2, giving you coverage across 99 languages in total without any extra configuration.
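The fallback rule can be pictured as a simple lookup. The sketch below is illustrative only: the routing happens on the service side, the function name `resolve_prerecorded_model` is hypothetical, and the language set is an assumption borrowed from the 18 languages this page lists for Universal-3.5 Pro Streaming, written as ISO 639-1 codes. The returned strings are display names, not API identifiers.

```python
# Assumption: the pre-recorded model covers the same 18 languages that this
# page lists for Universal-3.5 Pro Streaming, written as ISO 639-1 codes.
ASSUMED_PRO_LANGUAGES = frozenset({
    "en", "es", "de", "fr", "pt", "it", "tr", "nl", "sv",
    "no", "da", "fi", "hi", "vi", "ar", "he", "ja", "zh",
})


def resolve_prerecorded_model(language_code: str) -> str:
    """Return the display name of the model expected to handle a language.

    Mirrors the documented behaviour: Universal-3.5 Pro when the language is
    in its supported set, otherwise the automatic fallback to Universal-2.
    """
    code = language_code.strip().lower()
    if code in ASSUMED_PRO_LANGUAGES:
        return "Universal-3.5 Pro"
    return "Universal-2"


print(resolve_prerecorded_model("ja"))  # inside the assumed set
print(resolve_prerecorded_model("pl"))  # outside it, so the fallback applies
```

Because the fallback is automatic, client code should not need logic like this. It is mainly useful for predicting which model, and therefore which price, a given language will land on.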

Streaming models

Universal-3.5 Pro Streaming

Highest accuracy for voice agents
Fastest word emissions
Advanced prompting capabilities
Keyterms prompting up to 100 words
18 languages with native code switching

Universal-Streaming Multilingual

Good balance of speed and cost-effectiveness
Multilingual real-time transcription
Keyterms prompting up to 100 words
6 languages: en, es, pt, de, fr, it

Universal-Streaming English

Good balance of speed and cost-effectiveness
English transcription
Keyterms prompting up to 100 words
Intelligent endpointing

We recommend Universal-3.5 Pro Streaming for streaming transcription. It provides the highest accuracy with sub-300ms latency, native multilingual code switching, and advanced prompting support.
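The keyterms limits listed above differ by model, so it can help to check a list before sending it. The following is a minimal sketch, not part of any SDK: the helper names are hypothetical, and it assumes the limit counts whitespace-separated words across all keyterms, which this page does not spell out.

```python
# Keyterms prompting limits as listed on this page (display names as keys).
KEYTERM_LIMITS = {
    "Universal-3.5 Pro": 1000,
    "Universal-2": 200,
    "Universal-3.5 Pro Streaming": 100,
    "Universal-Streaming Multilingual": 100,
    "Universal-Streaming English": 100,
}


def count_keyterm_words(keyterms: list[str]) -> int:
    """Count whitespace-separated words across all keyterms (an assumption)."""
    return sum(len(term.split()) for term in keyterms)


def fits_keyterm_limit(model: str, keyterms: list[str]) -> bool:
    """Check a keyterm list against the documented limit for a model."""
    return count_keyterm_words(keyterms) <= KEYTERM_LIMITS[model]


terms = ["AssemblyAI", "speaker diarization", "code switching"]
print(count_keyterm_words(terms))
print(fits_keyterm_limit("Universal-Streaming English", terms))
```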

Add-on models

Add-on models enhance transcription accuracy for specialized domains. They work alongside your chosen speech model and are billed separately.

Medical Mode

Improved accuracy for medical terminology
Medications, procedures, conditions, and dosages
Works with pre-recorded and streaming models
4 languages: en, es, de, fr

Medical Mode

Medical Mode (domain: "medical-v1") is an add-on that enhances transcription accuracy for medical terminology — including medication names, procedures, conditions, and dosages. It is optimized for medical entity recognition to correct terms that other models frequently get wrong. Supported models:

Pre-recorded: Universal-3.5 Pro, Universal-2
Streaming: Universal-3.5 Pro Streaming, Universal-Streaming English, Universal-Streaming Multilingual

Supported languages: English, Spanish, German, French

Medical Mode is billed as a separate add-on. See the pricing page for details.

Learn more: Medical Mode for pre-recorded audio | Medical Mode for streaming
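The support matrix above can be expressed as a small guard that only emits the add-on setting for a supported combination. This is a sketch under assumptions: the `domain` key and the `"medical-v1"` value come from the text above, while the function name, the use of display names as model keys, and the ISO 639-1 language codes are illustrative and not confirmed request fields.

```python
# Models and languages listed as supporting Medical Mode on this page.
MEDICAL_MODE_MODELS = frozenset({
    "Universal-3.5 Pro",
    "Universal-2",
    "Universal-3.5 Pro Streaming",
    "Universal-Streaming English",
    "Universal-Streaming Multilingual",
})
MEDICAL_MODE_LANGUAGES = frozenset({"en", "es", "de", "fr"})


def medical_mode_options(model: str, language_code: str) -> dict[str, str]:
    """Return the add-on setting for a supported combination, else {}."""
    supported = (
        model in MEDICAL_MODE_MODELS
        and language_code.lower() in MEDICAL_MODE_LANGUAGES
    )
    # "domain" and "medical-v1" are taken from the description above.
    return {"domain": "medical-v1"} if supported else {}


print(medical_mode_options("Universal-2", "de"))
print(medical_mode_options("Universal-2", "it"))  # Italian is not listed
```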

Choosing the right model

Pre-recorded

Universal-3.5 Pro

Universal-3.5 Pro is our most powerful Voice AI model, designed to capture the “hard stuff” that traditional ASR models struggle with. It delivers state-of-the-art accuracy for entities, rare words, and domain-specific terminology out of the box, with code switching and optional prompting for more control. It’s also our fastest model, so you get the best accuracy without sacrificing speed. Best for:

Applications requiring highest-accuracy transcription
Medical scribes needing clinical-grade transcription accuracy
Sales intelligence and call centers needing native code-switching
Meeting notetakers / recruiting notetakers needing high-quality diarization

Supported languages

Regional dialects: Universal-3.5 Pro also supports regional dialects and local speech variants out of the box, with no special configuration needed. See the full list of supported dialects.

Try Universal-3.5 Pro here

Universal-2

Universal-2 offers accurate, cost-effective transcription across 99 languages with low latency. It supports code switching and optional keyterms prompting for domain-specific vocabulary (up to 200 words). Universal-2 is the go-to choice when you need reliable transcription across diverse languages. Best for:

High accuracy at lower cost with broad language support
High-volume, price-sensitive batch transcription
Support for 99 languages
Recommended fallback when a requested language isn’t supported by Universal-3.5 Pro

Supported languages

Try Universal-2 here

Streaming

Universal-3.5 Pro Streaming

The most accurate model with the fastest word emissions for voice agents that demand the highest quality. Best-in-class accuracy with advanced prompting capabilities, including both keyterms prompting and native prompting. Supports English, Spanish, German, French, Portuguese, Italian, Turkish, Dutch, Swedish, Norwegian, Danish, Finnish, Hindi, Vietnamese, Arabic, Hebrew, Japanese, and Mandarin. Best for:

Real-time voice agents
Applications requiring premium accuracy
Customer service voice agents needing elite entity accuracy
IVR replacement / binary response detection in short utterances
Agent assist and sales intelligence needing real-time speaker diarization and mid-session dynamic prompting
Multilingual voice agents with native code-switching across 18 languages
Compliance and verbatim recording — disfluency control via prompting

Supported languages

Regional dialects: Universal-3.5 Pro Streaming also supports regional dialects and local speech variants out of the box, with no special configuration needed. See the full list of supported dialects.

Learn more about Universal-3.5 Pro Streaming

Universal-Streaming Multilingual

A multilingual transcription model offering a good balance of speed and cost-effectiveness. Supports English, Spanish, German, French, Portuguese, and Italian. Features intelligent endpointing and keyterms prompting support for up to 100 words. Best for:

Cost-effective real-time transcription across languages
Cost-sensitive multilingual streaming across EN/ES/DE/FR/PT/IT

Supported languages

Learn more about Universal-Streaming Multilingual

Universal-Streaming English

An English transcription model offering a good balance of speed and cost-effectiveness. Features ~300ms word-by-word immutable transcripts, intelligent endpointing, and keyterms prompting support for up to 100 words. Best for:

Cost-effective real-time transcription for English
English-only real-time apps — fastest and cheapest streaming option for English

Supported languages

Learn more about Universal-Streaming English

To learn how to specify a model, see selecting a model for pre-recorded audio or selecting a model for streaming audio.
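The streaming guidance above can be condensed into a small decision helper. The sketch below is one reading of the "Best for" lists, not an official selection rule: the function name is hypothetical, languages are written as ISO 639-1 codes by assumption, and the returned strings are display names rather than API identifiers.

```python
from typing import Optional

# Languages listed for Universal-Streaming Multilingual.
BUDGET_MULTILINGUAL = frozenset({"en", "es", "de", "fr", "pt", "it"})
# Languages listed for Universal-3.5 Pro Streaming (assumed ISO 639-1 codes).
PRO_STREAMING = frozenset({
    "en", "es", "de", "fr", "pt", "it", "tr", "nl", "sv",
    "no", "da", "fi", "hi", "vi", "ar", "he", "ja", "zh",
})


def suggest_streaming_model(languages: set[str], cost_sensitive: bool) -> Optional[str]:
    """Suggest a streaming model from required languages and a cost preference."""
    langs = {code.lower() for code in languages}
    if not langs or not langs <= PRO_STREAMING:
        return None  # no streaming model on this page covers every language
    if cost_sensitive and langs == {"en"}:
        return "Universal-Streaming English"
    if cost_sensitive and langs <= BUDGET_MULTILINGUAL:
        return "Universal-Streaming Multilingual"
    # Default recommendation: highest accuracy and widest language coverage.
    return "Universal-3.5 Pro Streaming"


print(suggest_streaming_model({"en"}, cost_sensitive=True))
print(suggest_streaming_model({"en", "ja"}, cost_sensitive=True))
```

The cost preference is checked first only for the narrower language sets, so a request that needs any language outside the six budget languages always lands on Universal-3.5 Pro Streaming.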

Pricing

For detailed pricing information, visit our pricing page.

Pre-recorded

Model	Price per Hour	Volume discounts
Universal-3.5 Pro	$0.21/hr	Available
Universal-2	$0.15/hr	Available
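As a worked example of the list prices above, the sketch below compares the two pre-recorded models for a batch of audio. It assumes cost scales linearly with audio hours and ignores volume discounts and add-ons. The function name is hypothetical.

```python
from decimal import Decimal

# List prices per hour of audio, taken from the table above.
PRERECORDED_RATES = {
    "Universal-3.5 Pro": Decimal("0.21"),
    "Universal-2": Decimal("0.15"),
}


def prerecorded_cost(model: str, audio_hours: Decimal) -> Decimal:
    """Estimate list-price cost, assuming simple linear per-hour billing."""
    return PRERECORDED_RATES[model] * audio_hours


batch_hours = Decimal("500")
pro = prerecorded_cost("Universal-3.5 Pro", batch_hours)
u2 = prerecorded_cost("Universal-2", batch_hours)
print(f"Universal-3.5 Pro: ${pro}")
print(f"Universal-2:       ${u2}")
print(f"Difference:        ${pro - u2}")
```

`Decimal` is used instead of floats so that currency arithmetic stays exact.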

Streaming

Streaming is billed per hour of session duration — the total time your WebSocket connection stays open — not per hour of audio sent. See Streaming Speech-to-Text billing for details.

Model	Price per Hour (session duration)	Volume discounts
Universal-3.5 Pro Streaming	$0.45/hr	Available
Universal-Streaming Multilingual	$0.15/hr	Available
Universal-Streaming English	$0.15/hr	Available
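Because streaming is billed on session duration, the time the connection stays open drives the cost, not the amount of audio sent. The sketch below illustrates that with the list prices above. It assumes linear proration by the second with no minimum or rounding increment, which this page does not confirm, and the function name is hypothetical.

```python
from decimal import Decimal

# List prices per hour of session duration, taken from the table above.
STREAMING_RATES = {
    "Universal-3.5 Pro Streaming": Decimal("0.45"),
    "Universal-Streaming Multilingual": Decimal("0.15"),
    "Universal-Streaming English": Decimal("0.15"),
}


def streaming_cost(model: str, session_seconds: int) -> Decimal:
    """Estimate cost from how long the WebSocket stayed open.

    Audio duration is deliberately not a parameter: only the session length
    matters. Linear per-second proration is an assumption.
    """
    hours = Decimal(session_seconds) / Decimal(3600)
    return (STREAMING_RATES[model] * hours).quantize(Decimal("0.0001"))


# A session left open for 30 minutes is billed as 30 minutes, even if only
# 10 minutes of audio were actually streamed during that time.
print(streaming_cost("Universal-3.5 Pro Streaming", session_seconds=30 * 60))
```

One practical consequence is that closing idle connections promptly reduces cost, since silence on an open session is still billed.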

For volume discounts, please reach out to sales@assemblyai.com.

Next steps

Explore Speech Understanding features like summarization, sentiment analysis, and more
Learn about prompting: Universal-3.5 Pro prompting guide | Universal-3.5 Pro Streaming prompting guide
