Pricing built for innovation

Start free, pay-as-you-go after that – no commitments required.

Pre-recorded Speech-to-Text API

Build Voice AI on the most accurate Speech-to-Text with language detection, formatting, filler words, keyterms prompting, custom spelling, word-level timestamps, and more.

Models	Description	Pay as you go	Custom
Universal-3.5 Pro	The most accurate async speech-to-text model that transcribes every conversation exactly as it's heard – works across 18 languages, with native code switching and our most accurate speaker diarization yet.	$0.21 /hr	Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us
Universal-2	Our highly accurate speech-to-text model trained on over 12.5 million hours of audio data. Supports 99 languages. Exceptional accuracy at a lower price.	$0.15 /hr

Add-on features	Description	Universal-3.5 Pro	Universal-2
Keyterms Prompting	Provide up to 1000 words or phrases (maximum 6 words per phrase) to improve transcription accuracy.	$0.05 /hr	Included
Prompting	Describe your audio in plain language to improve transcription accuracy: cover the domain, scenario, or full conversation details.	$0.05 /hr	Not supported
Speaker Diarization	Detect multiple speakers in audio files and segment the transcript into utterances, showing what each speaker said.	$0.02 /hr	$0.02 /hr
Medical Mode (New)	Optimize transcription for medical terminology and healthcare conversations with significantly improved accuracy.	$0.15 /hr	$0.15 /hr
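
As a rough illustration of how the base rate and add-on rates above might combine, here is a minimal Python sketch. It assumes that add-on rates are simply added to the model's hourly rate, and that add-ons on multichannel audio are billed per channel like the base transcription; the page states per-channel billing only for transcription itself. The function and key names (`estimate_cost`, `keyterms_prompting`, and so on) are hypothetical and are not part of any AssemblyAI SDK.

```python
# Pay-as-you-go rates in USD per audio hour, copied from the tables above.
BASE_RATES = {"universal-3.5-pro": 0.21, "universal-2": 0.15}

# Add-on rates per model: 0.0 means "Included", None means "Not supported".
ADDON_RATES = {
    "universal-3.5-pro": {
        "keyterms_prompting": 0.05,
        "prompting": 0.05,
        "speaker_diarization": 0.02,
        "medical_mode": 0.15,
    },
    "universal-2": {
        "keyterms_prompting": 0.0,
        "prompting": None,
        "speaker_diarization": 0.02,
        "medical_mode": 0.15,
    },
}


def estimate_cost(model, audio_hours, addons=(), channels=1):
    """Estimate the USD cost of a pre-recorded transcription job.

    Assumption: add-on rates are additive on top of the base hourly rate,
    and the combined rate is multiplied by the number of channels.
    """
    rate = BASE_RATES[model]
    for addon in addons:
        addon_rate = ADDON_RATES[model][addon]
        if addon_rate is None:
            raise ValueError(f"{addon} is not supported on {model}")
        rate += addon_rate
    return round(rate * audio_hours * channels, 2)


# 100 hours of mono audio on Universal-3.5 Pro with speaker diarization.
print(estimate_cost("universal-3.5-pro", 100, ["speaker_diarization"]))
```

Requesting an unsupported combination, such as `prompting` on `universal-2`, raises a `ValueError` in this sketch rather than silently pricing it at zero.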

Realtime Speech-to-Text API

Transcribe live audio and video in real time with ultra-low latency and high accuracy. Leverage automatic punctuation and casing, next-gen end-of-turn detection, and ITN/formatting.

Models	Description	Pay as you go	Custom
Universal-3.5 Pro Realtime (New)	The most accurate realtime model for high-quality voice agents with built-in context carryover and conversation memory. Supports 18 languages with self-correcting speaker labels, voice isolation, and keyterm prompting.	$0.45 /hr	Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us
Universal-Streaming	High-quality realtime model for English-only transcription. Purpose-built for production voice applications with reliable accuracy, low latency, and cost-effectiveness.	$0.15 /hr
Universal-Streaming Multilingual	Multilingual transcription at the speed and cost of Universal-Streaming. Supports English, Spanish, German, French, Portuguese, and Italian.	$0.15 /hr

Add-on features	Description	Universal-3.5 Pro Realtime	Universal-Streaming
Keyterms Prompting	Provide up to 100 words or phrases (maximum 6 words per phrase) to improve transcription accuracy.	Included	$0.04 /hr
Speaker Diarization	Detect multiple speakers in audio files and segment the transcript into utterances, showing what each speaker said.	$0.12 /hr	$0.12 /hr
Prompting	Describe your audio in plain language to improve transcription accuracy: cover the domain, scenario, or full conversation details.	$0.05 /hr	Not supported
Medical Mode	Optimize transcription for medical terminology and healthcare conversations with significantly improved accuracy.	$0.15 /hr	$0.15 /hr
Voice Focus (New)	Hear the speaker, not the room. Isolate the primary speaker and suppress everything else.	$0.10 /hr	Not supported

Sync Speech-to-Text API

Finished transcripts in a single API call. POST a short clip and read the transcript off the response — no polling, no WebSocket, no job to manage.

Models	Description	Pay as you go	Custom
Sync API (New)	Universal-3.5 Pro accuracy delivered in a single call. Process up to 2 minutes per request, get results back in ~134 ms (p50). Includes per-word timing and confidence across 18 languages.	$0.45 /hr	Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us

Add-on features	Description	Sync API
Keyterms Prompting	Provide words or phrases to improve transcription accuracy.	Included
Conversation Context	Supply prior dialogue to give the model conversational continuity and boost accuracy.	Included
Prompting	Describe your audio in plain language to improve transcription accuracy: cover the domain, scenario, or full conversation details.	$0.05 /hr

Voice Agent API

A proprietary Voice AI stack, built end-to-end for production voice agents. Every layer tuned for how people actually talk—on top of the most accurate STT models in the industry.

Models	Description	Pay as you go	Custom
Voice Agent API	The fastest path to a working voice agent, built on our industry-leading Realtime Speech-to-Text API.	$4.50/hr ($0.075/min)	Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us

Speech Understanding

AI models that extract meaning from your transcripts. Identify speakers by name, detect sentiment, surface topics, generate summaries, and more.

Models	Description	Pay as you go	Custom
Speaker Identification	Identify speakers by their actual names or roles	$0.02 /hr	Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us
Translation	Convert your content from one language to another	$0.06 /hr
Custom Formatting	Standardize and format specific types of information	$0.03 /hr
Entity Detection	Identify entities that are spoken, such as names or email addresses	$0.08 /hr
Sentiment Analysis	Detect the sentiment of each sentence spoken	$0.02 /hr
Auto Chapters	Generate a summary over time for audio and video files	$0.08 /hr
Key Phrases	Identify significant words and phrases	$0.01 /hr
Topic Detection	Label the topics spoken in standardized IAB taxonomy	$0.15 /hr
Summarization	Generate a summary of audio files at scale	$0.03 /hr

Guardrails

Guardrails ensures only high-quality, safe, and compliant content flows through your applications.

Models	Description	Pay as you go	Custom
Profanity Filtering	Filter out profanity from your transcripts	$0.01 /hr	Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us
PII Audio Redaction	Identify and remove PII from the audio file before it is returned to you	$0.05 /hr
PII Text Redaction	Identify and remove PII from the transcription text before it is returned to you	$0.08 /hr
Content Moderation	Detect sensitive content in your audio and video files	$0.15 /hr

LLM Gateway

Apply powerful language models directly to your audio data through a single API. Ask questions, generate insights, and build custom workflows all without managing LLM infrastructure.

Models	Input	Output	Custom
GPT-5.5*	$5.00 / 1M	$30.00 / 1M	Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us
GPT-5.2*	$1.75 / 1M	$14.00 / 1M
GPT-5.1*	$1.25 / 1M	$10.00 / 1M
Claude 4.8 Opus*	$5.00 / 1M	$25.00 / 1M
Claude 4.6 Sonnet*	$3.00 / 1M	$15.00 / 1M

* Prices shown are for global routing. In-region (US/EU) pricing is 10% higher due to provider cost increases. Add "model_region": "global" to your API requests to get the rates shown.
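
To make the footnote concrete, the following Python sketch computes an LLM Gateway charge from token counts and applies the 10% in-region uplift. It assumes the uplift applies uniformly to both input and output prices, which the footnote implies but does not spell out. The two models are an arbitrary subset of the table, and `llm_cost` and `GLOBAL_PRICES` are hypothetical names used only for illustration.

```python
from decimal import Decimal

# Global-routing prices in USD per 1M tokens (input, output), from the table above.
GLOBAL_PRICES = {
    "GPT-5.5": (Decimal("5.00"), Decimal("30.00")),
    "Claude 4.6 Sonnet": (Decimal("3.00"), Decimal("15.00")),
}

# In-region (US/EU) pricing is 10% higher than global routing.
IN_REGION_MULTIPLIER = Decimal("1.10")


def llm_cost(model, input_tokens, output_tokens, in_region=False):
    """Return the estimated USD charge for one request as a Decimal."""
    input_price, output_price = GLOBAL_PRICES[model]
    cost = (input_price * input_tokens + output_price * output_tokens) / Decimal(1_000_000)
    if in_region:
        # Assumption: the uplift applies equally to input and output tokens.
        cost *= IN_REGION_MULTIPLIER
    return cost


print(llm_cost("GPT-5.5", 200_000, 50_000))
print(llm_cost("GPT-5.5", 200_000, 50_000, in_region=True))
```

`Decimal` is used instead of floats so that small per-request amounts do not pick up binary rounding noise when summed over many requests.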

Models	Input	Output	Custom
GPT-5.5*	$5.00 / 1M	$30.00 / 1M	Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us
GPT-5.2*	$1.75 / 1M	$14.00 / 1M
GPT-5.1*	$1.25 / 1M	$10.00 / 1M
GPT-5*	$1.25 / 1M	$10.00 / 1M
GPT-5-Mini*	$0.25 / 1M	$2.00 / 1M
GPT-5 Nano*	$0.05 / 1M	$0.40 / 1M
GPT 4.1*	$2.00 / 1M	$8.00 / 1M
gpt-oss-20b*	$0.07 / 1M	$0.30 / 1M
gpt-oss-120b*	$0.15 / 1M	$0.60 / 1M

Models	Input	Output	Custom
Claude 4.6 Sonnet*	$3.00 / 1M	$15.00 / 1M	Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us
Claude 4.5 Sonnet*	$3.00 / 1M	$15.00 / 1M
Claude 4.5 Haiku*	$1.00 / 1M	$5.00 / 1M
Claude 4.8 Opus*	$5.00 / 1M	$25.00 / 1M
Claude 4.7 Opus*	$5.00 / 1M	$25.00 / 1M
Claude 4.6 Opus*	$5.00 / 1M	$25.00 / 1M
Claude 4.5 Opus*	$5.00 / 1M	$25.00 / 1M

Models	Input	Output	Custom
Gemini 3.5 Flash*	$1.50 / 1M	$9.00 / 1M	Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us
Gemini 3 Flash*	$0.50 / 1M	$3.00 / 1M
Gemini 3.1 Flash Lite*	$0.25 / 1M	$1.50 / 1M
Gemini 2.5 Flash*	$0.30 / 1M	$2.50 / 1M
Gemini 2.5 Flash Lite*	$0.10 / 1M	$0.40 / 1M
Gemini 2.5 Pro*	$1.25 / 1M	$10.00 / 1M

Models	Input	Output	Custom
Qwen3 Next 80B A3B	$0.15 / 1M	$1.20 / 1M	Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us
Qwen3 32B	$0.15 / 1M	$0.60 / 1M
Kimi K2.5	$0.60 / 1M	$3.00 / 1M

Looking for volume-based pricing?

We’ll build a solution tailored to your needs.

Talk to our team

Security

Security and privacy

AssemblyAI uses enterprise-grade security practices to keep your data safe. We approach security by design and default, and continuously ensure AssemblyAI is secure for you and your team.

Learn more

Playground

We’re not playing around, but you can

Put our Voice AI models to the test in our no-code playground.

Try it out

Frequently asked questions

We offer speech-to-text models for both pre-recorded audio and live transcription. For pre-recorded audio, Universal-3.5 Pro is our most accurate model, delivering best-in-class transcription across a wide range of audio types and languages. Universal-2 offers excellent accuracy at a lower price point. Our streaming models are optimized for real-time use cases, with Universal-3.5 Pro Realtime delivering the highest accuracy and Universal-Streaming providing a cost-effective option.
You can start for free: create an account and begin transcribing immediately, with no credit card required. The free tier includes up to 185 hours of pre-recorded transcription and up to 333 hours of streaming transcription.
We offer custom pricing for customers with high-volume usage. Contact our sales team to discuss tiered pricing, volume discounts, and enterprise agreements tailored to your needs.
AssemblyAI’s Streaming API features free, unlimited, automatically scaling concurrency with no additional fees. On the free plan, you can open up to 5 new streaming connections per minute. On the pay-as-you-go plan, your starting limit is 100 sessions per minute. Whenever you are using 70% or more of your current limit, the number of new streams you can open over the next minute automatically increases by 10%. This scaling has no ceiling and can continue indefinitely to whatever your application requires. Custom starting limits above 100 are also available at no additional cost.
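
The scaling rule described above can be sketched as a small simulation. This is only an illustration under stated assumptions: that the 10% increase compounds once per minute, that limits are not rounded to whole sessions, and that usage is measured as new sessions opened in the current minute. The function names `next_limit` and `minutes_to_reach` are hypothetical.

```python
def next_limit(current_limit, sessions_opened, threshold=0.70, growth=0.10):
    """Return the new-session limit for the next minute.

    The limit grows by `growth` when usage reaches `threshold` of the
    current limit; otherwise it stays unchanged.
    """
    if sessions_opened >= threshold * current_limit:
        return current_limit * (1 + growth)
    return current_limit


def minutes_to_reach(target, start_limit=100):
    """Count the minutes of sustained high usage needed to reach `target`."""
    limit = start_limit
    minutes = 0
    while limit < target:
        # Assume usage always sits at the limit, so every minute triggers growth.
        limit = next_limit(limit, sessions_opened=limit)
        minutes += 1
    return minutes


# Minutes of sustained load needed to grow from 100 to 1,000 sessions per minute.
print(minutes_to_reach(1000))
```

Under these assumptions the limit grows geometrically, so a tenfold increase takes a few dozen minutes of sustained load, not hours.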
We bill monthly based on your actual usage. There are no minimum commitments, upfront fees, or contracts on the pay-as-you-go plan. Invoices are generated at the start of each month for the previous month’s usage.
Multichannel audio is billed per channel. For example, a 1-hour stereo (2-channel) file is billed as 2 hours of transcription. Each channel is transcribed independently, which provides more accurate results for multi-speaker recordings.
AssemblyAI is available on the AWS Marketplace, allowing you to consolidate billing through your existing AWS account. Contact our sales team for details on setting this up.
AssemblyAI supports transcription in over 99 languages across our models. Please visit our documentation for the complete list.
With Speaker Diarization, AssemblyAI can detect and label different speakers. It is available for both pre-recorded and Realtime Speech-to-Text API models. AssemblyAI builds and trains its speaker diarization models in-house. With Speaker Identification, you can also replace generic “Speaker A” and “Speaker B” labels with real names or roles.
A token is a unit of text used by large language models (LLMs) in the LLM Gateway. Tokens roughly correspond to word fragments: on average, one English word equals about 1.3 tokens. LLM Gateway pricing is based on the number of input and output tokens processed by the selected model.
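
For budgeting, the 1.3 tokens-per-word average can be turned into a rough estimate, as in the Python sketch below. It approximates tokens from a whitespace word count, so real tokenizer counts will differ by model and language. The helper names are hypothetical, and the $3.00 per 1M input price is taken from the Claude 4.6 Sonnet row purely as an example.

```python
TOKENS_PER_WORD = 1.3  # average for English text, per the answer above


def estimate_tokens(text):
    """Approximate the token count of `text` from its whitespace-separated words."""
    return round(len(text.split()) * TOKENS_PER_WORD)


def input_cost_usd(tokens, price_per_million):
    """Convert a token count into USD at a given price per 1M input tokens."""
    return tokens * price_per_million / 1_000_000


transcript = "word " * 10_000  # stand-in for a 10,000-word transcript
tokens = estimate_tokens(transcript)
print(tokens, input_cost_usd(tokens, 3.00))
```

This is an estimate of input tokens only. Output tokens depend on the length of the model's response and are priced separately.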