Models

AssemblyAI offers several state-of-the-art speech recognition models, each optimized for different use cases. Choose the model that best fits your needs based on accuracy, latency, cost, and language requirements.

Choosing the right model

Universal-3-Pro

Universal 3 Pro is our most advanced transcription model, delivering state-of-the-art accuracy across 6 languages with powerful prompting capabilities. It supports prompting in plain language for tasks like context-specific transcription, verbatim output, audio tagging, and speaker diarization, giving you fine-grained control to guide transcription results. With keyterms prompting supporting up to 1,000 words, built-in code switching, and multichannel support, Universal 3 Pro is ideal for complex audio scenarios requiring the highest accuracy.

Supported languages: en, es, de, fr, pt, it

Try universal-3-pro here

Universal-2

Universal 2 offers the broadest language coverage of any of our models, supporting high-accuracy transcription across 99 languages with low latency. It supports customization through keyterms prompting (up to 200 words) and includes features such as multichannel support, automatic language detection, code switching, and speaker diarization. Universal 2 is the go-to choice when you need reliable transcription across diverse languages.

Supported languages: en, en_au, en_uk, en_us, es, fr, de, it, pt, nl, hi, ja, zh, fi, ko, pl, ru, tr, uk, vi, af, sq, am, ar, hy, as, az, ba, eu, be, bn, bs, br, bg, my, ca, hr, cs, da, et, fo, gl, ka, el, gu, ht, ha, haw, he, hu, is, id, jw, kn, kk, km, lo, la, lv, ln, lt, lb, mk, mg, ms, ml, mt, mi, mr, mn, ne, no, nn, oc, pa, ps, fa, ro, sa, sr, sn, sd, si, sk, sl, so, su, sw, sv, de_ch, tl, tg, ta, tt, te, th, bo, tk, ur, uz, cy, yi, yo

Try universal-2 here
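The keyterms limits quoted above (up to 1,000 words for Universal-3-Pro, up to 200 words for Universal-2) can be checked on the client before a request is sent. The sketch below is illustrative only: the helper names are hypothetical, the model labels are simply the ones used on this page, and it assumes the limit applies to the total number of whitespace-separated words across all keyterms, which this page does not spell out.

```python
# Word limits for keyterms prompting, as stated in the model descriptions.
KEYTERM_WORD_LIMITS = {
    "universal-3-pro": 1000,
    "universal-2": 200,
}


def count_words(keyterms):
    """Count whitespace-separated words across all keyterms.

    Assumption: the documented limit counts words, not terms.
    """
    return sum(len(term.split()) for term in keyterms)


def check_keyterms(model, keyterms):
    """Return (within_limit, word_count, limit) for the given model label."""
    limit = KEYTERM_WORD_LIMITS[model]
    words = count_words(keyterms)
    return words <= limit, words, limit


if __name__ == "__main__":
    terms = ["AssemblyAI", "speaker diarization", "code switching"]
    ok, words, limit = check_keyterms("universal-2", terms)
    print(f"{words} of {limit} words used; within limit: {ok}")
```

If the limit turns out to be counted per term rather than per word, only `count_words` needs to change.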

Universal-Streaming

  • Best for: Voice agents, interactive applications, and other real-time voice use cases
  • Key benefits:
    • ~300 ms immutable transcripts
    • Continuous speech recognition
    • Intelligent endpointing

Supported languages: en, es, fr, de, it, pt

Try universal-streaming here

Universal-Streaming-Multilingual

Universal-Streaming-Multilingual extends real-time streaming with per-utterance language detection across 6 languages. It automatically identifies the language of each utterance, returning a language_code and language_confidence score, making it ideal for multilingual conversations and environments where speakers switch between languages.

Supported languages: en, es, fr, de, it, pt
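One way to use the per-utterance language_code and language_confidence values is to accept the detected language only when the confidence is high enough, and fall back to a default otherwise. The sketch below assumes each utterance arrives as a dictionary with those two fields plus a hypothetical "text" field. The real message shape, the 0.7 threshold, and the "en" fallback are assumptions for illustration, not documented behavior.

```python
from collections import defaultdict


def pick_language(utterance, min_confidence=0.7, fallback="en"):
    """Return the detected language, or the fallback when detection is weak.

    `language_code` and `language_confidence` are the fields named on this
    page; the threshold and fallback values are illustrative assumptions.
    """
    code = utterance.get("language_code")
    confidence = utterance.get("language_confidence")
    if code is None or confidence is None or confidence < min_confidence:
        return fallback
    return code


def group_by_language(utterances, min_confidence=0.7, fallback="en"):
    """Group utterance text by the language chosen for each utterance."""
    grouped = defaultdict(list)
    for utterance in utterances:
        language = pick_language(utterance, min_confidence, fallback)
        # "text" is a hypothetical field name for the transcribed utterance.
        grouped[language].append(utterance["text"])
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        {"text": "Hello there", "language_code": "en", "language_confidence": 0.98},
        {"text": "Hola, buenos dias", "language_code": "es", "language_confidence": 0.93},
        {"text": "Ok", "language_code": "it", "language_confidence": 0.41},
    ]
    print(group_by_language(sample))
```

Short utterances tend to be the hardest to classify, so a fallback keeps downstream logic such as routing or translation from reacting to a low-confidence guess.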

Try universal-streaming-multilingual here
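Taken together, the four model descriptions above suggest a simple selection rule based on language and latency needs. The following sketch encodes that rule using the supported-language lists from this page. The function name and the returned labels are hypothetical and may not match the identifiers the API expects, and preferring Universal-3-Pro whenever the language allows assumes accuracy matters more than cost.

```python
# Supported-language lists copied from the model descriptions on this page.
UNIVERSAL_3_PRO_LANGS = {"en", "es", "de", "fr", "pt", "it"}
STREAMING_LANGS = {"en", "es", "fr", "de", "it", "pt"}
UNIVERSAL_2_LANGS = set("""
en en_au en_uk en_us es fr de it pt nl hi ja zh fi ko pl ru tr uk vi af sq am
ar hy as az ba eu be bn bs br bg my ca hr cs da et fo gl ka el gu ht ha haw he
hu is id jw kn kk km lo la lv ln lt lb mk mg ms ml mt mi mr mn ne no nn oc pa
ps fa ro sa sr sn sd si sk sl so su sw sv de_ch tl tg ta tt te th bo tk ur uz
cy yi yo
""".split())


def choose_model(language, realtime, mixed_languages=False):
    """Suggest a model label, or None if no listed model covers the request.

    Assumption: for pre-recorded audio, accuracy is preferred over cost, so
    Universal-3-Pro wins whenever it supports the language.
    """
    if realtime:
        if language not in STREAMING_LANGS:
            return None
        if mixed_languages:
            return "universal-streaming-multilingual"
        return "universal-streaming"
    if language in UNIVERSAL_3_PRO_LANGS:
        return "universal-3-pro"
    if language in UNIVERSAL_2_LANGS:
        return "universal-2"
    return None


if __name__ == "__main__":
    print(choose_model("fr", realtime=False))
    print(choose_model("ja", realtime=False))
    print(choose_model("es", realtime=True, mixed_languages=True))
```

A cost-sensitive variant could return "universal-2" first for pre-recorded audio, since its listed hourly rate is lower.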

Pricing

For detailed pricing information, visit our pricing page.

Model                               Price per hour   Volume discounts
Universal-2                         $0.15/hr         Available
Universal-3-Pro                     $0.21/hr         Available
Universal-Streaming                 $0.15/hr         Available
Universal-Streaming-Multilingual    $0.15/hr         Available

The rates shown above are offered subject to participation in our model improvement program to help us continue to provide best-in-class speech-to-text. Rates may be different for accounts that opt out of this program.

For volume discounts, please reach out to sales@assemblyai.com.
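As a rough planning aid, the listed hourly rates can be turned into a cost estimate for a given amount of audio. The sketch below uses only the rates shown in this section. It assumes cost scales linearly with audio duration and rounds to the nearest cent, and it ignores volume discounts and any different rates that apply to accounts that opt out of the model improvement program. The function name is hypothetical, and actual billing granularity may differ.

```python
from decimal import Decimal, ROUND_HALF_UP

# Listed rates from the pricing table (USD per hour of audio).
RATES_PER_HOUR = {
    "Universal-2": Decimal("0.15"),
    "Universal-3-Pro": Decimal("0.21"),
    "Universal-Streaming": Decimal("0.15"),
    "Universal-Streaming-Multilingual": Decimal("0.15"),
}

SECONDS_PER_HOUR = Decimal(3600)


def estimate_cost(model, audio_seconds):
    """Estimate the cost in USD for `audio_seconds` of audio, rounded to cents.

    Assumption: cost is linear in duration, with no discounts or minimums.
    """
    hours = Decimal(str(audio_seconds)) / SECONDS_PER_HOUR
    cost = RATES_PER_HOUR[model] * hours
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    hundred_hours = 100 * 3600
    for model in RATES_PER_HOUR:
        print(f"{model}: ${estimate_cost(model, hundred_hours)} per 100 hours")
```

Decimal is used instead of float so that the cent rounding is exact.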

Next steps