Pre-recorded Speech-to-Text API

Get clean, customizable transcripts in 99 languages with industry-leading accuracy and natural language prompting.

Universal-3.5 Pro

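import assemblyai as aai

# Replace with the API key from your AssemblyAI dashboard.
aai.settings.api_key = "YOUR_API_KEY"

audio_file = "https://assembly.ai/wildfires.mp3"

# Enable automatic language detection and per-speaker labels.
config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    language_detection=True,
    speaker_labels=True,
)

# Submit the audio and wait for the transcription to finish.
transcript = aai.Transcriber().transcribe(audio_file, config=config)

# Fail loudly instead of printing an empty transcript.
if transcript.status == aai.TranscriptStatus.error:
    raise RuntimeError(f"Transcription failed: {transcript.error}")

# Full transcript text.
print(transcript.text)

# Speaker-attributed turns, available because speaker_labels=True.
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")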
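The sample above prints the full text and the speaker turns. Transcripts from a speech-to-text API often also carry word timestamps and confidence scores, and one common use is flagging uncertain words for human review. The following is a minimal sketch of that idea, not the SDK's own API: the `low_confidence_words` helper, the dictionary shape (`text`, `start`, `end`, `confidence`), the millisecond units, and the 0.6 threshold are all assumptions made for illustration, and the sample words are made up.

```python
def low_confidence_words(words, threshold=0.6):
    """Return (text, start_ms, end_ms) for words scored below the threshold.

    Hypothetical helper: each word is assumed to be a dict with
    "text", "start", "end" (milliseconds), and "confidence" (0.0 to 1.0).
    """
    return [
        (w["text"], w["start"], w["end"])
        for w in words
        if w["confidence"] < threshold
    ]


# Made-up sample data in the assumed shape, not real API output.
sample_words = [
    {"text": "Take", "start": 0, "end": 180, "confidence": 0.98},
    {"text": "metoprolol", "start": 200, "end": 910, "confidence": 0.41},
    {"text": "daily", "start": 930, "end": 1300, "confidence": 0.95},
]

for text, start, end in low_confidence_words(sample_words):
    print(f"Review '{text}' at {start}-{end} ms")
```

A lower threshold flags fewer words; the right value depends on the audio and on how costly a missed error is.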

Models

Pick the model that fits your workload

Accuracy that holds up on real-world audio, tunable with a single parameter.

Universal-3.5 Pro

The most accurate, controllable model on the market.

Complex, domain-specific audio
Natural language prompting
Precise entity handling
18 languages with code-switching

Universal-2

High-accuracy transcription at scale across 99 languages.

Proven accuracy at scale
Keyterms prompting
Strong entity handling
99 languages with code-switching

Add-on

Medical Mode

Clinical-grade transcription accuracy

Clinical-grade accuracy
Medical terminology recognition
Noise-resilient transcription
BAA-eligible infrastructure

Compare features

Model

Universal-3.5 Pro: Domain-specific, multilingual, complex audio

Universal-2: High-volume, cost-efficient, global languages

Medical Mode: Clinical settings, medical term recognition

Price

$0.21 /hr

$0.15 /hr

+$0.07 /hr on any model

Languages

EN, ES, FR, DE, IT, PT, AR, DA, NL, HE, HI, JA, ZH, VI, FI, NO, SV, TR

99 Languages

Inherits base model languages

Natural language prompting

Up to ~1,500 words

—

Keyterms prompting

Medical vocabulary

Code-switching

—

Speaker diarization

10+ speakers

Clinician / patient labels

Medical terminology

—

Drugs, dosages, ICD codes

HIPAA BAA

On request

Unlimited concurrency
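One way to act on the language coverage in the comparison above is to choose models per language. This is a rough sketch under stated assumptions: `PRO_LANGUAGES` and `pick_speech_models` are hypothetical names, the model identifiers are copied from the sample request on this page, lowercase two-letter codes are assumed to match the listed languages, and treating the returned list as "preferred model first, fallback second" is an assumption rather than documented behavior.

```python
# Language codes listed for Universal-3.5 Pro in the comparison above.
PRO_LANGUAGES = frozenset({
    "en", "es", "fr", "de", "it", "pt", "ar", "da", "nl",
    "he", "hi", "ja", "zh", "vi", "fi", "no", "sv", "tr",
})


def pick_speech_models(language_code: str) -> list[str]:
    """Return model identifiers in an assumed order of preference.

    Hypothetical helper: identifiers come from the sample request on this
    page; the fallback policy itself is an assumption.
    """
    code = language_code.strip().lower()
    if code in PRO_LANGUAGES:
        return ["universal-3-pro", "universal-2"]
    return ["universal-2"]


print(pick_speech_models("DE"))  # a language in the 18-language list
print(pick_speech_models("ko"))  # a language outside that list
```

The result could be passed as the `speech_models` argument shown in the sample request.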

Use cases

Built for every voice workflow

Async transcription powers every application where you work with recorded audio.

Conversation intelligence

Transcribe sales calls, support tickets, and customer interviews. Feed clean transcripts into sentiment analysis and topic detection.

AI Scribes

Capture patient-provider conversations with clinical-grade accuracy. Generate SOAP notes, intake summaries, and EHR-ready documentation.

Podcast and media

Transcribe long-form audio for search indexing, automated chapters, and subtitle generation. 99 languages, no configuration needed.

Call analytics

Process thousands of call recordings per day with speaker labels, sentiment scores, and key phrase extraction. Automate QA at scale.

AI notetakers

Turn recorded meetings into structured summaries with speaker attribution, action items, and searchable timestamps.
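Several of the use cases above (subtitle generation, speaker attribution, searchable timestamps) come down to turning timestamped speaker turns into a display format. As a hedged illustration, the sketch below renders turns as SRT cues. The `Utterance` dataclass is a hypothetical stand-in rather than the SDK's type, start and end times are assumed to be in milliseconds, and the sample turns are made up.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical stand-in for one speaker turn (times assumed in ms)."""
    speaker: str
    text: str
    start: int
    end: int


def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(utterances: list[Utterance]) -> str:
    """Render speaker turns as numbered SRT cues with a speaker prefix."""
    cues = []
    for index, u in enumerate(utterances, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(u.start)} --> {srt_timestamp(u.end)}\n"
            f"Speaker {u.speaker}: {u.text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


# Made-up sample turns for illustration only.
turns = [
    Utterance("A", "Welcome back to the show.", 250, 4_120),
    Utterance("B", "Thanks for having me.", 4_500, 7_985),
]
print(to_srt(turns))
```

Long turns would normally be split into shorter cues before display; that step is left out here to keep the sketch small.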

Playground

We’re not playing around, but you can

Put our Voice AI models to the test in our no-code playground.

AI Speech-to-Text transcription in 99 languages

From Spanish to Korean, deliver accurate Voice AI in the languages your users speak.

🇪🇸 Spanish 🇵🇹 Portuguese 🇫🇷 French 🇩🇪 German 🇮🇳 Hindi 🇷🇺 Russian 🇳🇱 Dutch 🇯🇵 Japanese 🇮🇹 Italian 🇵🇱 Polish 🇺🇦 Ukrainian 🇮🇩 Indonesian 🇹🇷 Turkish 🇨🇳 Chinese 🇰🇷 Korean

Frequently asked questions

A speech-to-text API is a developer interface that turns audio into text. Your app sends an audio file or live stream to an endpoint and receives a transcript, often with word timestamps, speaker labels, and confidence scores.
AssemblyAI's Universal-3.5 Pro model leads our published benchmarks for accuracy on real-world audio, including noisy environments, accents, and technical vocabulary.
AssemblyAI supports code-switching for pre-recorded audio: it detects and transcribes speech that switches between languages, with best results for English + Spanish or English + German.
AssemblyAI also offers real-time transcription: the Realtime Speech-to-Text API runs over a secure WebSocket connection and returns partial and final transcripts within a few hundred milliseconds.
To get started, sign up and get your API key in the Dashboard, install an SDK (e.g., JavaScript or Python), initialize the client with your key, then call transcribe with your audio.
Pricing is pay-as-you-go: $0.15/hr for Universal-2, $0.21/hr for Universal-3.5 Pro, and +$0.07/hr for Medical Mode on any base model. Usage is billed per second. Optional Speech Understanding features are priced separately.
AssemblyAI integrates with LLMs through LLM Gateway, a single API to OpenAI (GPT), Anthropic (Claude), Google (Gemini), and more.
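The pay-as-you-go rates quoted above, combined with per-second billing, lend themselves to a quick estimate. The sketch below is illustrative only: `HOURLY_RATES`, `MEDICAL_MODE_RATE`, and `estimate_cost` are hypothetical names, the dictionary keys are plain labels rather than API identifiers, rounding to four decimal places is an assumption about presentation and not a statement about how invoices are rounded, and separately priced Speech Understanding features are not included.

```python
from decimal import Decimal

# Hourly rates in USD as quoted on this page; keys are illustrative labels.
HOURLY_RATES = {
    "universal-2": Decimal("0.15"),
    "universal-3.5-pro": Decimal("0.21"),
}
MEDICAL_MODE_RATE = Decimal("0.07")  # add-on, applies on top of any base model
SECONDS_PER_HOUR = Decimal(3600)


def estimate_cost(duration_seconds: int, model: str, medical_mode: bool = False) -> Decimal:
    """Estimate the USD cost of transcribing audio of the given length.

    Hypothetical helper: prorates the hourly rate by the second and rounds
    to four decimal places (the rounding is an assumption).
    """
    rate = HOURLY_RATES[model]
    if medical_mode:
        rate += MEDICAL_MODE_RATE
    cost = rate * duration_seconds / SECONDS_PER_HOUR
    return cost.quantize(Decimal("0.0001"))


# One hour on Universal-2, then 90 minutes on Universal-3.5 Pro with Medical Mode.
print(estimate_cost(3600, "universal-2"))
print(estimate_cost(5400, "universal-3.5-pro", medical_mode=True))
```

`Decimal` is used instead of floats so that the quoted rates stay exact during the calculation.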