Speech-to-Text
Experience industry-leading speech-to-text accuracy with Speech AI models at the cutting edge of AI research, accessible through a simple API.
Universal-2
State-of-the-art speech-to-text model
>93%
30.4s
12.5M
Industry’s lowest Word Error Rate (WER)
See how Universal-2 performs against other Automatic Speech Recognition providers.
Read our research
[Chart: Word Error Rate by provider (AssemblyAI, OpenAI, Azure, Deepgram, AWS); axis from 0% to 12%]
See it in action
Explore Universal-2
*Benchmark performed across 10 datasets, including 6 public datasets and 4 internally curated datasets representing real-world English audio.
Harness best-in-class accuracy and powerful Speech AI capabilities
International Language Support
Transcribe audio in 99+ languages and counting, including Global English (English and all of its accents).
See how in docs
Speaker Diarization
Detect the number of speakers in your audio file, with each word in the text associated with its speaker.
See how in docs
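As a rough sketch of how per-word speaker labels might be consumed, the snippet below merges consecutive words from the same speaker into turns. It assumes a `words` list shaped like the sample response on this page (entries with `text` and `speaker`); the helper name `group_by_speaker` and the third sample word are hypothetical.

```python
from itertools import groupby

# Hypothetical sample, shaped like the "words" array in the example response.
words = [
    {"text": "Runner's", "start": 0, "end": 550, "speaker": "A"},
    {"text": "knee", "start": 580, "end": 1130, "speaker": "A"},
    {"text": "Really?", "start": 1300, "end": 1700, "speaker": "B"},
]


def group_by_speaker(words):
    """Merge consecutive words with the same speaker label into (speaker, text) turns."""
    turns = []
    for speaker, chunk in groupby(words, key=lambda w: w["speaker"]):
        turns.append((speaker, " ".join(w["text"] for w in chunk)))
    return turns


for speaker, text in group_by_speaker(words):
    print(f"Speaker {speaker}: {text}")
```

Because `groupby` only merges adjacent items, a speaker who talks again later starts a new turn rather than being folded into an earlier one.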
Automatic Language Detection
Automatically detect if the dominant language of the spoken audio is supported by our API and route it to the appropriate model for transcription.
See how in docs
Async Speech-to-Text
The AssemblyAI API can transcribe pre-recorded audio and/or video files in seconds, with human-level accuracy. Highly scalable to tens of thousands of files in parallel.
See how in docs
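Asynchronous transcription generally means submitting a file and then waiting for a terminal status. The sketch below shows one way that wait loop could look. It is not the official client: `fetch` is a hypothetical callable standing in for whatever request returns the current transcript, and the `"error"` status and `error` field are assumptions (only `"completed"` appears in the sample response on this page).

```python
import time


def wait_for_transcript(fetch, poll_interval=3.0, timeout=600.0, sleep=time.sleep):
    """Call fetch() until the returned dict reports a terminal status.

    fetch is a hypothetical zero-argument callable returning the current
    transcript as a dict with a "status" key.
    """
    waited = 0.0
    while True:
        transcript = fetch()
        status = transcript.get("status")
        if status == "completed":
            return transcript
        if status == "error":  # assumed failure status; confirm against the docs
            raise RuntimeError(transcript.get("error", "transcription failed"))
        if waited >= timeout:
            raise TimeoutError(f"transcript still {status!r} after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval


# Stand-in for a real HTTP call: reports "processing" twice, then "completed".
responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed", "text": "hello"},
])
result = wait_for_transcript(lambda: next(responses), sleep=lambda s: None)
print(result["text"])
```

Injecting `fetch` and `sleep` keeps the loop testable without network access; in real use `fetch` would wrap an authenticated request.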
Word Timings
View word-by-word timestamps across the entire transcript text.
See how in docs
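To illustrate what word-level timestamps make possible, here is a minimal sketch that renders each word's `start` and `end` as a clock-style offset. It assumes the values are milliseconds, which matches the magnitudes in the sample response on this page but should be confirmed in the docs; `format_ms` is a hypothetical helper.

```python
def format_ms(ms):
    """Format a millisecond offset as HH:MM:SS.mmm."""
    seconds, millis = divmod(int(ms), 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


# Values taken from the sample response; "start"/"end" assumed to be milliseconds.
words = [
    {"text": "Runner's", "start": 0, "end": 550},
    {"text": "knee", "start": 580, "end": 1130},
]

for w in words:
    print(f"{format_ms(w['start'])} -> {format_ms(w['end'])}  {w['text']}")
```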
Profanity Filtering
Detect and replace profanity in the transcription text with ease.
See how in docs
Auto Punctuation and Casing
Automatically add punctuation and proper-noun casing to the transcription text.
See how in docs
Custom Vocabulary
Boost accuracy for vocabulary that is unique or custom to your specific use case or product.
See how in docs
Confidence Scores
Get a confidence score for each word in the transcript.
See how in docs
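One possible use of per-word confidence is flagging words for human review. The sketch below filters a `words` list shaped like the sample response on this page; the helper name `low_confidence_words` and the 0.96 threshold are arbitrary choices for illustration, not recommended values.

```python
def low_confidence_words(words, threshold=0.96):
    """Return the text of every word whose confidence falls below threshold."""
    return [w["text"] for w in words if w["confidence"] < threshold]


# Values taken from the sample response on this page.
words = [
    {"text": "Runner's", "confidence": 0.98113},
    {"text": "knee", "confidence": 0.95417},
]

print(low_confidence_words(words))
```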
Filler Words
Optionally include disfluencies in the transcripts of your audio files.
See how in docs
Custom Spelling
Specify how you would like certain words to be spelled or formatted in the transcription text.
See how in docs
See everything in docs
Explore more
Streaming Speech-to-Text
Transcribe audio streams synchronously with high accuracy and low latency.
Speech Understanding
Extract maximum value from voice data with Audio Intelligence, and leverage Large Language Models with LeMUR.
Get started in seconds
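import assemblyai as aai
aai.settings.api_key = "YOUR_API_KEY"  # placeholder: replace with your own key
URL = "https://example.com/audio.mp3"  # placeholder: any reachable audio file URL
config = aai.TranscriptionConfig(speaker_labels=True)  # per-word speaker labels, as in the sample below
transcriber = aai.Transcriber()
transcript = transcriber.transcribe(URL, config)
print(transcript.text)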
{
"id": "6rlr37h8f4-e310-4e23-bbf3-ea5f347dc684",
"language_code": "en_us",
"status": "completed",
"text": "Runner's knee is a condition characterized by pain behind or around the kneecap...",
"confidence": 0.98122,
"audio_duration": 3200,
"words": [
{ "text": "Runner's", "start": 0, "end": 550, "speaker": "A", "confidence": 0.98113 },
{ "text": "knee", "start": 580, "end": 1130, "speaker": "A", "confidence": 0.95417 }
]
}
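For code that works with the raw JSON rather than the SDK object, a response like the one above could be loaded into small typed records. This is only a sketch: the `Word` and `Transcript` dataclasses and `parse_transcript` are hypothetical names, and only fields visible in the sample are modelled.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    start: int
    end: int
    confidence: float
    speaker: Optional[str] = None  # assumed to be absent when speaker labels are off


@dataclass
class Transcript:
    id: str
    status: str
    text: str
    confidence: float
    words: List[Word]


def parse_transcript(raw: str) -> Transcript:
    """Build a Transcript from a JSON string shaped like the sample response."""
    data = json.loads(raw)
    words = [
        Word(w["text"], w["start"], w["end"], w["confidence"], w.get("speaker"))
        for w in data.get("words", [])
    ]
    return Transcript(data["id"], data["status"], data["text"], data["confidence"], words)


# The sample response from this page, with the truncated "text" left as-is.
raw = """
{
  "id": "6rlr37h8f4-e310-4e23-bbf3-ea5f347dc684",
  "language_code": "en_us",
  "status": "completed",
  "text": "Runner's knee is a condition characterized by pain behind or around the kneecap...",
  "confidence": 0.98122,
  "audio_duration": 3200,
  "words": [
    { "text": "Runner's", "start": 0, "end": 550, "speaker": "A", "confidence": 0.98113 },
    { "text": "knee", "start": 580, "end": 1130, "speaker": "A", "confidence": 0.95417 }
  ]
}
"""

transcript = parse_transcript(raw)
print(transcript.status, len(transcript.words))
```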