Next-gen Speech AI for next-level product experiences
Our most advanced speech-to-text model captures the complexity of human speech for impeccable audio data that powers sharper insights, faster workflows, and best-in-class product experiences.
Leading the industry—again and again
Universal-2 builds on the strengths of Universal-1 with even greater accuracy and precision for audio data that doesn't need double-checking.
Metric (higher is better) | AssemblyAI Universal-2 | AssemblyAI Universal-1 | OpenAI Whisper Large-v3 | Microsoft Azure Batch v3.1 | Deepgram Nova 2 | Amazon Transcribe | Google Latest-long |
---|---|---|---|---|---|---|---|
Word Accuracy Rate | 93.3% | 93.1% | 91.7% | 91.2% | 90.8% | 89.7% | 85.2% |
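The table reports Word Accuracy Rate without defining it. A common definition is WAR = 1 - WER, where word error rate is WER = (S + D + I) / N: substitutions, deletions, and insertions from a word-level edit distance, divided by the number of reference words. Under that assumed definition, Universal-2's 93.3% WAR corresponds to a 6.7% WER. The sketch below compares raw whitespace-separated tokens; how AssemblyAI normalizes text (casing, punctuation) before scoring is not stated on this page, and the helper names are illustrative.

```python
def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Minimum word substitutions + deletions + insertions turning reference into hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or exact match (cost 0)
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, where N is the number of reference words (assumed > 0)."""
    return word_edit_distance(reference, hypothesis) / len(reference.split())


def word_accuracy_rate(reference: str, hypothesis: str) -> float:
    """Assumed definition: WAR = 1 - WER."""
    return 1.0 - word_error_rate(reference, hypothesis)


reference = "runner's knee is a condition"
hypothesis = "runners knee is condition"  # one substitution, one deletion
print(word_error_rate(reference, hypothesis))     # 0.4
print(word_accuracy_rate(reference, hypothesis))  # 0.6
```

Note that with many insertions WER can exceed 1, so WAR defined this way can go negative; benchmark reports usually aggregate errors over a whole test set rather than averaging per-utterance scores.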
Built on top of the best—then made even better
Accuracy is more than just the right words—it’s trust in your data. Universal-2 lets users spend less time filling in the gaps and more time putting insight into action.
Proper nouns
A 24% relative reduction in error rate versus Universal-1 on rare words such as names, brands, and locations, supporting more personalized customer-facing communications, more intuitive automated systems, and cleaner integration processes.
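The comparison table on this page scores proper nouns with a Jaro-Winkler error rate, a character-level string measure that gives partial credit for near-miss spellings of a name instead of counting the whole word wrong. Below is a sketch of the standard Jaro-Winkler similarity (prefix scale p = 0.1, prefix capped at 4 characters). Treating the per-name error as 1 - similarity is an assumption; how AssemblyAI aligns proper nouns and averages the scores across its dataset is not described here, and jaro_winkler_error is a hypothetical helper.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they sit within this distance of each other
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the number of matched characters that appear in a different order
    out_of_order, j = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for strings that share a common prefix (up to max_prefix chars)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


def jaro_winkler_error(reference: str, hypothesis: str) -> float:
    """Hypothetical per-name error: 1 - Jaro-Winkler similarity."""
    return 1.0 - jaro_winkler(reference, hypothesis)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```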
Text formatting
A 15% relative reduction in error rate versus Universal-1 on transcript formatting, with proper punctuation and casing for items such as emails, dates, and dollar amounts, so transcripts are easier to navigate and read more naturally in customer-facing products.
Alphanumerics
A 21% relative reduction in word error rate versus Universal-1 on alphanumerics such as phone numbers, zip codes, and other numerical identifiers, for smoother customer experiences, better management of critical data, and clearer escalation and reporting.
Universal-2 captures real-world complexity
Universal-2 lowers error rates in three key areas compared with Universal-1.
Chart: error rate by model for proper nouns, text formatting, and alphanumerics (*axis truncated at 25% for visualization)
Metric (lower is better) | AssemblyAI Universal-2 | AssemblyAI Universal-1 | OpenAI Whisper Large-v3 | Microsoft Azure Batch v3.1 | Deepgram Nova 2 | Amazon Transcribe | Google Latest-long |
---|---|---|---|---|---|---|---|
Proper nouns (Jaro-Winkler Error Rate) | 13.87% | 18.17% | 15.41% | 26.84% | 21.14% | 37.57% | 47.64% |
Text formatting (Word Error Rate) | 10.06% | 11.77% | 12.01% | 12.14% | 12.39% | 14.47% | 25.45% |
Alphanumerics (Word Error Rate) | 4.00% | 5.06% | 3.84% | 5.19% | 4.97% | 6.24% | 8.43% |
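The 24%, 15%, and 21% figures in the Proper nouns, Text formatting, and Alphanumerics descriptions match, after rounding, the relative drop in error rate from Universal-1 to Universal-2 in this table, computed as (old - new) / old. A quick arithmetic check using the table values:

```python
# Error rates from the comparison table (percent, lower is better)
universal_1 = {"Proper nouns": 18.17, "Text formatting": 11.77, "Alphanumerics": 5.06}
universal_2 = {"Proper nouns": 13.87, "Text formatting": 10.06, "Alphanumerics": 4.00}


def relative_reduction(old: float, new: float) -> float:
    """Percent by which the error rate fell, relative to the old error rate."""
    return 100.0 * (old - new) / old


for category, old in universal_1.items():
    new = universal_2[category]
    print(f"{category}: {relative_reduction(old, new):.1f}% fewer errors")
# Proper nouns: 23.7% fewer errors
# Text formatting: 14.5% fewer errors
# Alphanumerics: 20.9% fewer errors
```

A relative reduction is not the same as an absolute one: for alphanumerics the error rate fell by 1.06 percentage points, which is about 21% of Universal-1's 5.06%.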
It’s more than accurate—it’s the industry preference
Universal-2 is our most preferred model to date: in a head-to-head comparison with Universal-1, its transcripts were preferred 72.9% of the time. Before that? Universal-1 took the cake. We've made a habit of building models people love.
Preference | Share |
---|---|
Universal-2 | 72.9% |
Universal-1 | 25.9% |
Neutral | 1.2% |
*Qualitative benchmarks from an unbiased, third-party evaluation.
The cleanest outputs in the industry
Universal-2 is closing the gap between transcription and true understanding, with best-in-breed audio data you can reliably stand on—and behind.
More on Universal-2
Research
Universal-2 is the latest milestone in AssemblyAI's mission to push the boundaries of Speech AI technology and unlock the full potential of voice data for all.
Playground
Access our production-ready Speech AI models for speech recognition, speaker detection, audio summarization, and more—all in our no-code playground.
Pricing
Universal-2 is available as an API for developers to build applications and services. We offer pricing that scales with tiered payment options and custom volume discounts.
Universal-2 is all-in-one
Our comprehensive system lets you build quickly and reliably on our developer-preferred API, with leading Speech AI capabilities, built-in model updates, and technology that keeps you on the cutting edge.
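import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # replace with your AssemblyAI API key
URL = "https://example.com/audio.mp3"  # placeholder: a public audio URL or a local file path
config = aai.TranscriptionConfig(speaker_labels=True)  # tag each word with a speaker label
transcriber = aai.Transcriber()
transcript = transcriber.transcribe(URL, config)  # blocks until the transcript completes or fails
print(transcript.error if transcript.status == aai.TranscriptStatus.error else transcript.text)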
{
"id": "6rlr37h8f4-e310-4e23-bbf3-ea5f347dc684",
"language_code": "en_us",
"status": "completed",
"text": "Runner's knee is a condition characterized by pain behind or around the kneecap...",
"confidence": 0.98122,
"audio_duration": 3200,
"words": [
{ "text": "Runner's", "start": 0, "end": 550, "speaker": "A", "confidence": 0.98113 },
{ "text": "knee", "start": 580, "end": 1130, "speaker": "A", "confidence": 0.95417 }
]
}
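A minimal sketch of consuming a transcript response like the one above with Python's standard json module. It assumes word start and end values are in milliseconds (as documented for AssemblyAI's transcript API); the helper names and the 0.96 confidence threshold are hypothetical, chosen only for illustration.

```python
import json
from collections import defaultdict

# The sample response shown above, trimmed to the fields this sketch reads
response_text = """
{
  "status": "completed",
  "words": [
    {"text": "Runner's", "start": 0, "end": 550, "speaker": "A", "confidence": 0.98113},
    {"text": "knee", "start": 580, "end": 1130, "speaker": "A", "confidence": 0.95417}
  ]
}
"""


def words_by_speaker(response: dict) -> dict:
    """Group word texts by speaker label, keeping their spoken order."""
    grouped = defaultdict(list)
    for word in response["words"]:
        grouped[word["speaker"]].append(word["text"])
    return dict(grouped)


def speaking_time_ms(response: dict) -> dict:
    """Sum of word durations per speaker, assuming start/end are in milliseconds."""
    totals = defaultdict(int)
    for word in response["words"]:
        totals[word["speaker"]] += word["end"] - word["start"]
    return dict(totals)


def low_confidence_words(response: dict, threshold: float = 0.96) -> list:
    """Words below a hypothetical confidence threshold, e.g. to flag for human review."""
    return [w["text"] for w in response["words"] if w["confidence"] < threshold]


response = json.loads(response_text)
if response["status"] == "completed":
    print(words_by_speaker(response))      # {'A': ["Runner's", 'knee']}
    print(speaking_time_ms(response))      # {'A': 1100}
    print(low_confidence_words(response))  # ['knee']
```

Checking status before reading words matters because a failed or still-processing transcript will not carry a complete word list.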