October 22, 2025

Introducing new products and model updates to help you build, deploy, and scale Voice AI applications

Today, we’re introducing a powerful new suite of products and model updates to the AssemblyAI platform that help you build, deploy, and scale your Voice AI applications.

Madison Bernstein

Product Marketing

Speech-to-Text

Reviewed by

Table of contents

[Visible on live site]

Today, we’re introducing a powerful new suite of products and model updates to the AssemblyAI platform that help you build, deploy, and scale your Voice AI applications:

New products:

Speech Understanding: Advanced capabilities from speaker identification to translation
LLM Gateway: One API for your entire voice-to-intelligence pipeline

Product Improvements:

Universal enhancements: 99 languages, automatic code-switching, and improved speaker diarization with 64% fewer speaker counting errors
Slam updates: Industry-leading accuracy improvements for critical terms and formatting
Guardrails: Comprehensive protection to ensure only high-quality, safe, and compliant content flows through your applications.

These updates eliminate the complexity between raw audio and production-ready Voice AI applications, allowing you to build powerful products and get them to market faster.

The industry's best products are built on the industry's best models

Voice AI has evolved over the last year from experimental features to critical infrastructure powering today’s top products. Voice agents are handling customer calls. Meeting intelligence is driving business decisions. Medical scribes are transforming doctors’ workflows. But most teams still spend their time stitching together basic functionality like language detection, speaker identification, and LLM integration, instead of shipping differentiated products.

AssemblyAI’s platform delivers the highest-quality foundation so you can build exceptional Voice AI applications. Best-in-class speech-to-text accuracy meets powerful capabilities that transform raw speech into structured, intelligent data.

Three new products for production Voice AI

1. Speech Understanding: Beyond basic speech-to-text

Convert speech into structured, ready-to-use data. These purpose-built, LLM-powered features turn transcripts into intelligence instantly, giving you intelligent outputs out of the box.

Advanced Capabilities

Advanced speaker identification: Label speakers by role or by name
Custom formatting: Define domain-specific formatting for your industry
Translation: Receive transcripts in your preferred language across 89 supported languages

Built for teams who want smarter outputs rather than more data. No extra integrations, just immediate understanding.

2. Guardrails: Production-ready safety

Guardrails give you comprehensive protection at every stage of your voice AI pipeline including validating inputs and filtering outputs. Guardrails ensure that only high-quality, safe, compliant content flows through your applications.

Safety Guardrails

Profanity filtering: Automatically detect and remove inappropriate language
Content moderation: Block sensitive or harmful content before it reaches downstream systems
PII redaction: Remove personal information from text and audio outputs, available in 50+ languages

Operational Guardrails

Speech Threshold: Transcribe only files with a minimum percentage of spoken audio.
Set start and end of transcript: Limit transcription to sections containing speech.

With Guardrails, you can ensure that you’re stopping unsafe content immediately, and keep it from impacting your application or end users.

3. LLM Gateway: From audio to insights in one platform

With LLM Gateway, you can consolidate your Voice AI application stack across speech-to-text, speech understanding, and LLM insights into one platform. LLM gateway lets you test and validate workflows, including summarization, sentiment analysis, redaction, and more, without juggling vendors or managing separate integrations.

LLM Gateway gives you a unified Voice AI Platform

LLM-compatible API for transcript-to-intelligence tasks like summarization, insight extraction, and more
Route requests to leading LLMs including GPT, OpenAI, Gemini, and others directly from the AssemblyAI platform
Integrated prompting on transcripts, allowing you to go straight from audio to insights without copying data between tools
Unified billing and usage management across providers

Many teams spend valuable engineering time maintaining complex post-processing pipelines to summarize calls, extract insights, or classify sentiment. LLM Gateway removes that complexity and lets developers focus on building great products for their customers.

Enhanced foundation: Speech-to-text model improvements

Universal-2: The proven foundation

Universal-2 continues to set the standard for global, high-quality speech-to-text with significant improvements:

Highlights

99 languages with automatic language detection
Automatic code-switching between English and other languages
64% reduction in speaker counting errors for mid- to long-duration audio files (longer than 2 minutes)
200-word key term support
Works across applications as a general-purpose speech-to-text model
Starting at $0.15/hour

The quality speaks for itself:

Slam: When precision matters

Our Slam (Speech Language AI Model), currently in beta, continues to evolve with industry-leading accuracy improvements for the most demanding use cases. Since the beta launch, we've enhanced speaker diarization, added multi-channel processing with improved timestamp prediction, and significantly reduced hallucinations.

Highlights

Up to 57% accuracy gains across alphanumerics, emails, addresses, financial statements, and organization names
1,000-word context-aware key term prompting to improve domain expertise up to 45% for precision-critical applications
Built for human conversation analysis matching Universal's WER and speaker count accuracy while delivering superior formatting
Starting at $0.27/hour

API update: Intelligent model fallback

To simplify model selection, we’re updating our API interface so customers can specify multiple models in priority order. Set Slam as your default for maximum precision, and the API will automatically fallback to Universal when you need broader language coverage.

Real-world results

AssemblyAI powers Voice AI for platforms that demand global reach, production accuracy, and zero-friction deployment. Our latest improvements represent our continued commitment to delivering on these critical needs.

When Calabrio faced growing dissatisfaction with transcription quality, switching to AssemblyAI delivered immediate results:

80% boost in customer satisfaction
22% increase in revenue
63% improvement in developer productivity

At Siro, the team heard "wow, that insight was crisp" on 10 out of 10 onboarding calls and saw a 90% reduction in support tickets after switching to Assembly. At Delphi, 50% time savings in clone training freed the team to ship impactful features instead of getting stuck in the weeds. No matter the use case, the quality and experience improvements are felt immediately as soon as customers start building on AssemblyAI.

What makes this possible? Industry-leading accuracy

We benchmark our models on hundreds of hours of real-world audio, including customer calls, medical dictations, live meetings, and accented speech, to make sure we can confidently say that we lead the industry in accuracy.

Build, deploy, and scale Voice AI apps faster than ever

AI-native companies are redefining how products are built and shipped, and speed to value has never mattered more. AssemblyAI customers go from first test to production in days, not months. Integrations are described as “plug-and-play” and “frictionless,” with teams reporting 63% less error-handling code after switching.

That means faster proofs of concept, quicker launches, and measurable business impact within weeks, not quarters. But speed alone isn’t enough. As Mike Adams from Grain said, “Everything comes down to garbage in, garbage out. The quality of your speech-to-text directly impacts the quality of your analysis.”

With our new products, you don’t have to choose between accuracy and agility. Together, they form the foundation for quality Voice AI. Enabling faster deployment, higher precision, and greater confidence that what your product hears is exactly what was said.

Three ways to get started

Our new products and model improvements are available now through our API. Current Universal customers already have access to all features.

Three ways to get started:

Try the new products: LLM Gateway, Voice AI Guardrails, and Speech Understanding are ready to use. Universal-2 continues as our default model with all the latest improvements.
Read the Docs: Learn more about new features like Keyterms Prompting, Guardrails, and Speech Understanding in our updated API documentation.
Try the Playground: Experience all three new products and model improvements with your own audio directly in our no-code Playground.

Start building now

Start building with Universal, LLM Gateway, and Speech Understanding to create smarter, faster, and more powerful Voice AI Apps.

Get started now

‍

Introducing new products and model updates to help you build, deploy, and scale Voice AI applications

The industry's best products are built on the industry's best models

Three new products for production Voice AI