Roadmap | AssemblyAI

Pre-recorded Speech-to-Text API #

Speech-to-text for pre-recorded audio.

Upcoming

Universal-3.6 Pro Async #Q3 2026

The next Universal-3.x Pro release. Native-language coverage will grow from 18 to 31 languages, with accuracy gains on already supported languages.

A new model option priced the same as Universal-3.5 Pro.

New languages: Korean, Russian, Greek, Bulgarian, Polish, Romanian, Hungarian, Czech, Thai, Tagalog, Indonesian, Afrikaans.
Community Models #Q3–Q4 2026

Open-source speech-to-text models served directly through our API, for the languages and domains they specialize in.
Voice Fingerprinting #Q3 2026

Recognize the same speaker across recordings, not just within a single file. For meetings, call centers, and cross-session analytics.
Sync STT API #Q3 2026

Synchronous HTTP transcription on u3-pro, ~134ms p50 latency, for voice-agent calls that need a transcript in a single request.
Self-Hosted Async #Q4 2026

On-premise universal-3-pro for regulated environments with strict data-residency requirements.
Universal-4 Pro Async #Q1 2027

The next major accuracy and capability release after Universal-3.x Pro.

Recently shipped

2026-07-07
Universal-3.5 Pro Async # — Native-language coverage grew from 6 to 18 languages, with improved code-switching (WER dropped from 9.07% to 7.69%, the best accuracy among all models tested) and stronger speaker diarization performance (the lowest average cpWER across six diarization datasets among the seven models tested). New languages: Japanese, Vietnamese, Arabic, Dutch, Swedish, Hindi, Norwegian, Finnish, Danish, Hebrew, Mandarin, Turkish.
2026-06-30
Faster Turnaround Time # — Universal-3 Pro transcription is significantly faster end-to-end. Between early April and the end of June, median turnaround time dropped 11-43% across audio-duration cohorts and P99 turnaround time dropped 26-59%.
2026-04-14
Universal-3 Pro Async Timestamp Improvements # — Better Universal-3 Pro timestamps: median precision up 15.3% for English and 8.6% for non-English, with P99 gains of 15.0% and 58.4%.
2026-03-30
Hebrew & Swedish # — Accuracy gains in Hebrew and Swedish via community models. Word error rates down 37% and 47%.
2026-03-25
Medical Mode # — LLM-powered correction for medical terminology: 4.97% error rate versus 7.32% for the next-best vendor. Add-on to Universal-3 Pro in English, Spanish, German, French, Portuguese, and Italian.
2026-03-16
PII Audio Redaction using Silence # — Redact PII with silence instead of a beep, reducing listener fatigue when redacted audio is replayed at scale.
2026-02-03
Universal-3 Pro Async # — Promptable speech-to-text with natural-language and custom-vocabulary prompts, mid-sentence language switching across six core languages, and audio tagging.
2026-01-28
Improved Short-Audio Diarization # — 19% better speaker-count accuracy and 6% lower speaker-attributed word error rate on audio under two minutes.
2026-01-02
Multichannel Diarization # — Per-channel speaker labels for multi-microphone recordings, eliminating crosstalk ambiguity in call-center and meeting audio.

Realtime Speech-to-Text API #

Low-latency streaming speech-to-text for live audio.

Upcoming

Universal-3.5 Pro Realtime #Q2 2026

Our new flagship realtime model gives every turn the context a conversation actually has. The model takes direction from your agent, keeps a rolling memory of the call on its own, and hears the speaker instead of the room, so spelled-out emails, account IDs, and one-word confirmations come out right. It runs in 19 languages at flagship accuracy with mid-sentence code-switching, plus a new language_code parameter to commit to one language when you already know it.

A new model option priced the same as u3-pro, not an automatic drop-in. The current model stays available as a pinned snapshot.

New languages: Japanese, Vietnamese, Arabic, Dutch, Swedish, Hindi, Norwegian, Finnish, Danish, Urdu, Hebrew, Mandarin, Turkish.
Universal 4 Realtime #Q3 2026

A fast, cost-efficient realtime model for notetaking and meeting intelligence. Tuned for long-form audio where throughput and stable accuracy over multi-hour sessions matter more than latency.
Self-Hosted Realtime #Q3 2026

On-premise universal-realtime-3-pro for regulated environments with strict data-residency requirements.
Universal-4 Pro Realtime #Q4 2026

The next-generation realtime model for voice agents. Targets the lowest turn latency and the strongest handling of voice-agent audio (noise, interruptions, hesitation, accented speech), with instruction-following strong enough to replace today’s STT + LLM + TTS stack. Multilingual across 15+ native languages and the foundation for our speech-to-speech architecture.
Universal-1 Duplex (Preview) #Q1 2027

A single native model that replaces today’s Voice Agent pipeline (STT, LLM, TTS) with a unified Realtime Speech LLM. Tighter latency, better prosody, and more natural interruption handling than orchestrated stacks.

Recently shipped

2026-06-04
Voice Focus # — Realtime noise suppression for voice agents and telephony, so accuracy holds up in real call-center conditions with no separate preprocessor.
2026-06-04
Streaming Modes # — min_latency, balanced, and max_accuracy presets to tune the latency/accuracy trade-off per workload.
2026-06-04
Context Carryover # — Universal-3 Pro Streaming carries prior finalized turns forward as context to improve accuracy, on by default. Optionally pass your voice agent’s spoken reply via agent_context so the model knows the question the user is answering.
2026-06-02
Streaming Speaker Revision # — An end-of-stream SpeakerRevision message returns corrected speaker labels at async-parity cpWER, for roughly 400ms of added latency.
2026-05-12
Streaming PII Redaction # — PII detection and redaction in the realtime pipeline for HIPAA, PCI, and similar workloads. Configurable entity types and substitution modes.
2026-03-24
Medical Mode # — LLM-powered correction for medical terminology: 4.97% error rate versus 7.32% for the next-best vendor. Async and streaming on universal-realtime-3-pro.
2026-03-17
Streaming Diarization v1.5 # — Speaker-aware sentence splitting: 4-5% lower word error rate, 56% fewer phantom speakers, and gains on the CallHome and AMI benchmarks.
2026-03-03
Universal-3 Pro Realtime # — Realtime speech-to-text with inline speaker labeling, custom vocabulary up to 1,000 words, audio tagging, filler-word control, mid-sentence language switching, and 99+ languages via Whisper routing. EU region support.
2026-01-20
Edge Routing and Data Zone Endpoints # — Global low-latency routing with US/EU data-residency endpoints. No additional charge.

Voice Agent API #

End-to-end Voice Agent API.

Upcoming

Twilio Integration #Q2 2026

Direct Twilio SIP and voice connectivity, without customer-side LiveKit plumbing.
Edge Voice Agent Platform #Q3 2026

Full programmatic control of voice agents at the edge. Create, version, and manage agents as code, persist and retrieve sessions (events, transcripts, tool calls), and run webhooks and tool calls at the edge instead of round-tripping to origin.
SDK #Q3 2026

Official client libraries, starting with Python and TypeScript.
Turn Detection Model #Q3 2026

A custom turn-detection model trained on Universal-3 Pro streaming output. Reduces false endpointing and handles pauses, hesitations, and overlapping speech better.

Recently shipped

2026-04-29
Voice Agent API # — Production release of the Voice Agent API (formerly Speech-to-Speech API), built on universal-realtime-3-pro, LLM Gateway, and TTS on self-hosted LiveKit. PCI-certified.
2025-12-17
Voice Agent Preview # — First public release of end-to-end voice AI, combining universal-realtime-3-pro, LLM Gateway, and TTS on LiveKit.

TTS #

Text-to-speech built for voice agents.

Upcoming

Universal TTS 1 #Q3 2026

A standalone text-to-speech model for production voice workloads. Low time-to-first-byte, voice prompting, and accurate delivery of phone numbers, emails, and named entities that today’s TTS struggles with.
Community Models #Q4 2026

Open-source text-to-speech models served through our API alongside Universal TTS, for the languages, styles, and domains they specialize in.

Speech Understanding API #

Extract meaning, sentiment, and events from audio.

Upcoming

Improved Summarization #Q2 2026

Summarization via the LLM Gateway, replacing legacy LeMUR summaries, using frontier models with automatic fallbacks.
Improved Auto-Chaptering #Q2 2026

Sharper chapter boundaries and titles via the LLM Gateway, with better topic segmentation on long-form content.
Speaker ID, Translation, and Custom Formatting v2 #Q2 2026

Accuracy improvements to Speaker ID, Translation, and Custom Formatting. Translation covers both streaming and pre-recorded audio, for workflows where the spoken language differs from the output.
Custom Entity Redaction #Q2–Q3 2026

Static find-and-replace redaction (redact_pii_static_entities) for known terms, with LLM-powered category redaction following in Q3/Q4.
Context-Based Transcription Correction #Q3 2026

Automatic LLM-based transcript correction with no user prompts, generalizing the Medical Mode pattern to any domain.
Key-Term Improvements #Q3 2026

Better keyterm prompting in every supported language via LLM Gateway post-processing, closing the gap with English for Spanish, German, French, Portuguese, and Italian.
Emotion Detection #Q4 2026

Detect speaker emotions and shifts in input audio, distinct from TTS-side Emotion and Style Tagging. For therapy, CX scoring, and compliance monitoring.

LLM Gateway #

One API for every major LLM. Built-in fallbacks and audio-first integration.

Upcoming

Community Models #Q1–Q4 2026

Ongoing catalog expansion. The Gateway supports 24 models across Anthropic, OpenAI, Google, Qwen, and Kimi, with DeepSeek, Mistral, Llama, and Cohere next.
Service Lanes #Q3 2026

Priority, standard, and flex request tiers for per-request cost and latency control.

Recently shipped

2026-06-02
Global Routing # — An opt-in model_region: global setting that routes to lower-cost capacity for Claude calls. Gemini 3 series coming soon.
2026-05-28
Claude Opus 4.8, Gemini 3.5 Flash, Gemini 3.1 Flash Lite (GA) # — Three new models added to the catalog, available through the Gateway on day one.
2026-05-21
Reasoning Mode # — Reasoning and effort controls exposed through the Gateway for OpenAI-compatible, Gemini 3+, and Anthropic models.
2026-04-28
Prompt Caching # — Prompt-cache pass-through, so customers keep the cache discount while routing through the Gateway.
2026-04-17
Claude Opus 4.7 # — Anthropic’s most capable model, available through the Gateway on day one.
2026-03-25
Automatic Model Fallbacks # — The Gateway retries failed requests against a configurable fallback model, so single-provider outages don’t surface as customer-facing failures.
2026-03-24
Qwen3, Qwen3 Next, Kimi K2.5 # — Three new high-capability models added to the catalog.
2026-02-19
Claude Sonnet 4.6 # — Anthropic’s best price-performance frontier model at release.
2026-02-09
Claude Opus 4.5 and 4.6 # — Anthropic’s most capable models, available through the Gateway on day one.

Open Benchmarks #

Transparent, reproducible benchmarks across Universal and community models.

Upcoming

Public Benchmark Leaderboard #Q2 2026

A public version of our internal evaluation dashboard, covering 30+ competitors and the community models we serve across dozens of metrics. Verify accuracy and pick the right model per language and domain.
Open-Source Benchmark Stack #Q3 2026

Open-source release of the dashboard and tooling, so anyone can reproduce our benchmarks on their own data.
Open-Source Benchmark Dataset #Q4 2026

A realistic Voice-AI evaluation set of telephony, meeting, and voice-agent audio with ground-truth transcripts, used to grade every model on the leaderboard. Not a thirty-second YouTube clip or LibriSpeech.

Developer Experience #

The dashboard, accounts, and tooling that make AssemblyAI easy to adopt.

Upcoming

Single Sign-On (SSO) #Q2 2026

SAML and OIDC, as a fast-follow to multi-user accounts.
Self-Service Onboarding Wizard #Q2 2026

Guided setup for new accounts: model selection, API key creation, a first-request walkthrough, and best-practice defaults.
Enhanced Billing Controls #Q3 2026

Hard spend caps and team budgets, beyond today’s soft alerts.
Enhanced API Metrics #Q3 2026

Deeper dashboard observability: P50 and P95 turnaround time, webhook delivery stats, uptime, and latency histograms.
Usage and Spend API #Q4 2026

Programmatic access to billing and usage data.
Enhanced Usage Alerts #Q4 2026

Configurable alert thresholds and cadences, including daily alerts and custom triggers.
3D Secure Payments #Q4 2026

Card support for EU PSD2 Strong Customer Authentication.

Recently shipped

2026-05-12
Multi-User Accounts # — Invite teammates with role-based access. GA, with RBAC, MFA enforcement, member management, account switching, and ownership transfer.
2026-03-17
AssemblyAI Skill for AI Coding Agents # — Claude Code, Cursor, and Codex now ship with a native AssemblyAI skill, giving them accurate API knowledge and cutting hallucinated API usage in generated code.
2026-02-26
Shareable Playground Transcripts # — One-click shareable links to Playground output, easy to show off or hand off for review.