Streaming Speech-to-Text
Convert live audio streams into text in real time, with nearly 90% accuracy and latency under 600 ms.
Hey, adventurers! Welcome to today's exciting livestream event, where we're embarking on an expedition to uncover the secrets of the Lost Temple hidden deep within this mysterious jungle! I'm your host, Emily, and I'm thrilled to have you all joining me on this epic adventure. Just look at this incredible jungle landscape, teeming with life and brimming with secrets waiting to be discovered! Who knows what ancient mysteries lie within this dense foliage? [Camera zooms in on a vine-covered ruin peeking through the trees] Emily: And there it is, folks! Our destination, the Lost Temple, a relic of a long-forgotten civilization lost to time. Legend has it that this temple holds untold riches and powerful artifacts beyond imagination!
Automatically turn live audio into text
Transcribe conversations, meetings, and live events in real time to support live interactions as they happen.
Try in the Playground
Industry-leading quality at low latency
Low latency
Industry-leading quality
High concurrency
Advanced punctuation & casing
Feature-rich Streaming Speech-to-Text
Streaming Transcription
Transcribe live audio with high accuracy and low latency.
Auto Punctuation and Casing
Automatically add punctuation and casing, including capitalization of proper nouns, to the transcription text.
Custom Vocabulary
Boost accuracy for vocabulary that is unique or custom to your specific use case or product.
ITN/Formatting
Automatically convert spoken form text into its proper written format to increase transcript readability.
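To make the idea of inverse text normalization (ITN) concrete, here is a minimal sketch that rewrites spoken-form numbers from zero to ninety-nine as digits. It is a toy illustration only, not the service's implementation: the function name `inverse_normalize` and the lookup tables are hypothetical, and real ITN also covers dates, currencies, times, and much more.

```python
# Toy lookup tables (hypothetical); a production ITN system covers far more cases.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def inverse_normalize(text: str) -> str:
    """Replace spoken-form numbers (0-99) with digits, leaving other words as-is."""
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        key = tokens[i].lower()
        if key in TENS:
            value = TENS[key]
            # A tens word may be followed by a unit from one to nine ("twenty five").
            nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else None
            if nxt in UNITS and 1 <= UNITS[nxt] <= 9:
                value += UNITS[nxt]
                i += 1
            out.append(str(value))
        elif key in UNITS:
            out.append(str(UNITS[key]))
        else:
            out.append(tokens[i])
        i += 1
    return " ".join(out)


print(inverse_normalize("the meeting starts in twenty five minutes"))
```

The sketch splits on whitespace, so number words with attached punctuation (for example "five,") are left untouched; handling those is one of many details a real formatter has to address.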
End of Utterance Detection
Customize End of Utterance Detection to more accurately detect when one speaker finishes an utterance in Streaming Speech-to-Text.
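One common way to think about end-of-utterance detection is as a silence threshold: if the gap between two consecutive words exceeds some limit, the earlier word closes an utterance. The sketch below applies that idea offline to word timestamps. The function name `split_utterances`, the 700 ms default, and the sample words are all assumptions made for illustration; they do not describe how the streaming service itself decides, or which parameter it exposes.

```python
def split_utterances(words, silence_threshold_ms=700):
    """Group timestamped words into utterances, splitting on long silences.

    `words` is a list of dicts with "text", "start", and "end" keys, where
    the timestamps are assumed to be in milliseconds.
    """
    utterances = []
    current = []
    for word in words:
        # A gap at least as long as the threshold ends the current utterance.
        if current and word["start"] - current[-1]["end"] >= silence_threshold_ms:
            utterances.append(" ".join(w["text"] for w in current))
            current = []
        current.append(word)
    if current:
        utterances.append(" ".join(w["text"] for w in current))
    return utterances


# Hypothetical word timings with a 900 ms pause after "there".
sample_words = [
    {"text": "Hello", "start": 0, "end": 400},
    {"text": "there", "start": 450, "end": 800},
    {"text": "How", "start": 1700, "end": 1900},
    {"text": "are", "start": 1950, "end": 2100},
]
print(split_utterances(sample_words))
```

Raising the threshold makes the detector more patient with slow speakers at the cost of later end-of-utterance signals; lowering it does the opposite.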
See everything in docs
Explore more
Speech-to-Text
Build on top of the most accurate Speech-to-Text model on the market with >92.5% accuracy.
Speech Understanding
Extract maximum value from voice data with Audio Intelligence, and leverage Large Language Models with LeMUR.
Get started in seconds
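import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # placeholder: substitute your own key
URL = "https://example.com/audio.mp3"  # placeholder: any publicly reachable audio file
config = aai.TranscriptionConfig(speaker_labels=True)  # assumed from the speaker fields in the response

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(URL, config)
print(transcript)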
{
  "id": "6rlr37h8f4-e310-4e23-bbf3-ea5f347dc684",
  "language_code": "en_us",
  "status": "completed",
  "text": "Runner's knee is a condition characterized by pain behind or around the kneecap...",
  "confidence": 0.98122,
  "audio_duration": 3200,
  "words": [
    { "text": "Runner's", "start": 0, "end": 550, "speaker": "A", "confidence": 0.98113 },
    { "text": "knee", "start": 580, "end": 1130, "speaker": "A", "confidence": 0.95417 }
  ]
}
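As a rough sketch of how a response shaped like the one above might be consumed, the snippet below parses a trimmed copy of it with the standard `json` module, computes per-word durations, and flags low-confidence words. The helper names `word_durations` and `low_confidence_words` are hypothetical, as is the 0.96 cutoff, and the durations are only meaningful under the assumption that `start` and `end` share a unit such as milliseconds.

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE_RESPONSE = """
{
  "status": "completed",
  "words": [
    { "text": "Runner's", "start": 0, "end": 550, "speaker": "A", "confidence": 0.98113 },
    { "text": "knee", "start": 580, "end": 1130, "speaker": "A", "confidence": 0.95417 }
  ]
}
"""


def word_durations(response):
    """Return (text, end - start) pairs for every word in the response."""
    return [(w["text"], w["end"] - w["start"]) for w in response["words"]]


def low_confidence_words(response, cutoff):
    """Return the text of words whose confidence falls below `cutoff`."""
    return [w["text"] for w in response["words"] if w["confidence"] < cutoff]


response = json.loads(SAMPLE_RESPONSE)
print(word_durations(response))
print(low_confidence_words(response, cutoff=0.96))
```

Flagging low-confidence words this way can be a starting point for human review or for deciding which terms to add to a custom vocabulary.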