Pre-recorded Speech-to-Text API

Upcoming

Recently shipped

  • Universal-3 Pro Async Timestamp Improvements
    More precise Universal-3 Pro timestamps: median precision up 15.3% for English and 8.6% for non-English languages, with P99 gains of 15.0% and 58.4%, respectively.
  • Hebrew & Swedish
    Accuracy gains in Hebrew and Swedish via community models, with word error rates down 37% and 47%, respectively.
  • Medical Mode
    LLM-powered correction of medical terminology: a 4.97% error rate versus 7.32% for the next-best vendor. Available as an add-on to Universal-3 Pro in English, Spanish, German, French, Portuguese, and Italian.
  • PII Audio Redaction using Silence
    Redact PII with silence instead of a beep, reducing listener fatigue when redacted audio is replayed at scale.
  • Universal-3 Pro Async
    Promptable speech-to-text with natural-language and custom-vocabulary prompts, mid-sentence language switching across six core languages, and audio tagging.
  • Improved Short-Audio Diarization
    19% better speaker-count accuracy and 6% lower speaker-attributed word error rate on audio under two minutes.
  • Multichannel Diarization
    Per-channel speaker labels for multi-microphone recordings, eliminating crosstalk ambiguity in call-center and meeting audio.

Realtime Speech-to-Text API

Upcoming

Recently shipped

  • Voice Focus
    Realtime noise suppression for voice agents and telephony, keeping accuracy high in real call-center conditions without a separate preprocessor.
  • Streaming Modes
    Three presets (min_latency, balanced, and max_accuracy) for tuning the latency/accuracy trade-off per workload.
  • Context Carryover
    Universal-3 Pro Streaming carries prior finalized turns forward as context to improve accuracy; this is on by default. Optionally, pass your voice agent’s spoken reply via agent_context so the model knows which question the user is answering.
  • Streaming Speaker Revision
    An end-of-stream SpeakerRevision message returns corrected speaker labels with cpWER on par with async, at the cost of roughly 400 ms of added latency.
  • Streaming PII Redaction
    PII detection and redaction in the realtime pipeline for HIPAA, PCI, and similar workloads, with configurable entity types and substitution modes.
  • Medical Mode
    LLM-powered correction of medical terminology: a 4.97% error rate versus 7.32% for the next-best vendor. Available in async and in streaming on universal-realtime-3-pro.
  • Streaming Diarization v1.5
    Speaker-aware sentence splitting: 4-5% lower word error rate, 56% fewer phantom speakers, and gains on the CallHome and AMI benchmarks.
  • Universal-3 Pro Realtime
    Realtime speech-to-text with inline speaker labeling, custom vocabulary of up to 1,000 words, audio tagging, filler-word control, mid-sentence language switching, and 99+ languages via Whisper routing. EU region support.
  • Edge Routing and Data Zone Endpoints
    Global low-latency routing with US/EU data-residency endpoints, at no additional charge.

Voice Agent API

Upcoming

Recently shipped

  • Voice Agent API
    Production release of the Voice Agent API (formerly Speech-to-Speech API), built on universal-realtime-3-pro, LLM Gateway, and TTS on self-hosted LiveKit. PCI-certified.
  • Voice Agent Preview
    First public release of end-to-end voice AI, combining universal-realtime-3-pro, LLM Gateway, and TTS on LiveKit.

TTS

Upcoming

Speech Understanding API

Upcoming

LLM Gateway

Upcoming

Recently shipped

  • Global Routing
    An opt-in model_region: global setting that routes Claude calls to lower-cost capacity. Global routing for the Gemini 3 series is coming soon.
  • Claude Opus 4.8, Gemini 3.5 Flash, Gemini 3.1 Flash Lite (GA)
    Three new models added to the catalog, available through the Gateway on day one.
  • Reasoning Mode
    Reasoning and effort controls exposed through the Gateway for OpenAI-compatible, Gemini 3+, and Anthropic models.
  • Prompt Caching
    Prompt-cache pass-through, so customers keep the cache discount while routing through the Gateway.
  • Claude Opus 4.7
    Anthropic’s most capable model at release, available through the Gateway on day one.
  • Automatic Model Fallbacks
    The Gateway retries failed requests against a configurable fallback model, so single-provider outages don’t surface as customer-facing failures.
  • Qwen3, Qwen3 Next, Kimi K2.5
    Three new high-capability models added to the catalog.
  • Claude Sonnet 4.6
    Anthropic’s best price-performance frontier model at release.
  • Claude Opus 4.5 and 4.6
    Anthropic’s most capable models at release, available through the Gateway on day one.

Open Benchmarks

Upcoming

Developer Experience

Upcoming

Recently shipped

  • Multi-User Accounts
    Invite teammates with role-based access. Now generally available, with RBAC, MFA enforcement, member management, account switching, and ownership transfer.
  • AssemblyAI Skill for AI Coding Agents
    Claude Code, Cursor, and Codex now ship with a native AssemblyAI skill that gives them accurate API knowledge and reduces hallucinated API usage in generated code.
  • Shareable Playground Transcripts
    One-click shareable links to Playground output, making it easy to show off results or hand them off for review.