Changelog
Follow along to see weekly accuracy and product improvements.
Gemini 3 Pro now available in LLM Gateway
Google's latest Gemini 3 Pro model is now available through AssemblyAI's LLM Gateway, giving you access to one of the most advanced multimodal models with the same unified API you use for all your other providers.
With AssemblyAI's LLM Gateway, you can now test Gemini 3 Pro against models from OpenAI, Anthropic, Google, and others without changing your integration—just swap the model parameter and compare responses, latency, and cost across providers in real-time.
Available now for all LLM Gateway users.
To get started, simply update the "model" parameter in your LLM Gateway request to "gemini-3-pro-preview":
import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "gemini-3-pro-preview",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "max_tokens": 1000
    }
)

result = response.json()
print(result["choices"][0]["message"]["content"])
AssemblyAI's LLM Gateway gives you a single API to access 15+ LLMs from every major provider, with built-in fallbacks, load balancing, and cost tracking. Compare models, optimize for performance or price, and switch providers instantly, all without rewriting code.
View our docs and try Gemini 3 Pro in LLM Gateway.
AssemblyAI Streaming Updates: Multi-Region Infrastructure, Session Controls, and Self-Hosted License Management
Self-Hosted Streaming v0.20: License Management Now Available
Self-Hosted Streaming v0.20 now includes built-in license generation and validation, giving enterprises complete control over deployment security and usage tracking. Organizations can manage their speech AI infrastructure with the same compliance controls they expect from enterprise software.
The new licensing system enables IT teams to track deployment usage, enforce security policies, and maintain audit trails—critical for regulated industries like healthcare and financial services. License validation happens at startup and can be configured for periodic checks to ensure continuous compliance.
Available now for all AssemblyAI Self-Hosted Streaming customers.
Contact your account team to generate licenses for your deployments.
Multi-Region Streaming: US-East-1 Now Live
AssemblyAI's Streaming API is now available in us-east-1, providing regional redundancy and expanded compute capacity for production workloads. The infrastructure update reduces single-region dependency and prepares the platform for upcoming EU deployment.
Multi-region availability means contact centers and live captioning applications can maintain service continuity during regional incidents while accessing additional compute capacity for peak usage periods. The architecture changes also enable faster rollout of new regions based on customer demand.
Available immediately across all AssemblyAI Streaming API plans. Traffic is automatically routed to the optimal region based on latency and capacity.
Try AssemblyAI’s Streaming API now or view regional availability.
Inactivity Timeout Controls for Streaming Sessions
AssemblyAI's Streaming API now supports a configurable inactivity_timeout parameter, giving developers precise control over session duration management. Applications can extend timeout periods for long-running sessions or reduce them to optimize connection costs.
The feature enables voice agents and live transcription systems to automatically close idle connections without manual intervention. Contact centers can reduce costs on silent periods while ensuring active calls stay connected. Voice agent developers can keep sessions open longer during natural conversation pauses without manual keep-alive logic.
Available now for all AssemblyAI Streaming customers. Set the inactivity_timeout parameter (in seconds) when initializing your connection.
Implementation:
- Set inactivity_timeout in your connection parameters
- Values range from 5 to 3600 seconds
- Default timeout remains 30 seconds if not specified
- Available across all pricing tiers
View our documentation to learn more.
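As a minimal sketch of the implementation steps above, the timeout can be passed when building the session URL. The base endpoint and exact parameter placement here are assumptions for illustration; confirm the wire format in the Streaming documentation.

```python
from urllib.parse import urlencode

# Base endpoint shown for illustration; confirm the exact URL in the docs.
base_url = "wss://streaming.assemblyai.com/v3/ws"

params = {
    "sample_rate": 16000,
    "inactivity_timeout": 120,  # seconds; accepted range is 5-3600
}

# Open your WebSocket connection against this URL as usual.
ws_url = f"{base_url}?{urlencode(params)}"
```

A higher value keeps voice-agent sessions open through long conversational pauses; a lower value closes idle contact-center connections sooner.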
Streaming Model Update: Enhanced Performance & New Capabilities
We've released a new version of our English Streaming model with significant improvements across the board.
Performance gains:
- 88% better accuracy on short utterances and repeating commands/numbers
- 12% faster emission latency
- 7% faster time to complete transcript
- 4% better accuracy on prompted keyterms
- 3% better accuracy on accented speech
New features:
- Language detection for utterances (Multilingual model only) – Get language output for each utterance to feed downstream processes like LLMs
- Dynamic keyterms prompting – Update your keyterms list mid-stream to improve accuracy on context you discover during the conversation
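A mid-stream keyterms update could look like the sketch below. The message type and field names are assumptions for illustration, not the confirmed schema; check the Streaming docs for the exact format.

```python
import json

# Hypothetical mid-stream update message; "type" and "keyterms" are
# assumed field names -- consult the Streaming docs for the real schema.
update_message = {
    "type": "UpdateKeyterms",
    "keyterms": ["Slam-1", "LeMUR", "AssemblyAI"],
}

payload = json.dumps(update_message)
# ws.send(payload)  # send over the already-open WebSocket session
```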
LeMUR Deprecation
LeMUR will be deprecated on March 31, 2026 and will no longer work after this date.
Users will need to migrate to LLM Gateway by that date for continued access to language model capabilities, and will benefit from an expanded model selection and better performance.
For more information, check out our documentation on LLM Gateway and our LeMUR to LLM Gateway migration guide.
Universal Multilingual Streaming
We've launched Universal Multilingual Streaming, enabling low-latency transcription across multiple languages without compromising accuracy.
Key Features
- 6 languages: English, Spanish, French, German, Italian, Portuguese
- Industry-leading accuracy: 11.77% average WER across real-world audio
- $0.15/hour flat: No language surcharges
- Production-ready: Punctuation, capitalization, and intelligent endpointing included
Use Cases
- Voice agents serving international customers
- Multi-language contact centers
- Cross-border meeting transcription
- Healthcare systems serving diverse communities
No complex language detection pipelines or multiple vendor integrations required. See our documentation for implementation details.
Deprecation of V2 Streaming
Our legacy streaming endpoint (/v2/realtime/ws) will be deprecated on January 31, 2026, and will no longer work after this date.
Users will need to migrate to Universal-Streaming before the deadline to avoid service interruption.
Why upgrade?
- Higher accuracy
- Lower latency
- Intelligent endpointing
- Multilingual support
- Lower pricing ($0.15/hour)
For more information, check out our documentation on Universal-Streaming and our V2 to V3 migration guide.
Claude 3.5 & 3.7 Sonnet Sunset
As previously announced, we will be sunsetting Claude 3.5 Sonnet and 3.7 Sonnet for LeMUR on October 29th. After this date, requests made using Claude 3.5 and 3.7 Sonnet will return errors.
If you are using these models, we recommend switching to Claude 4 Sonnet, which is more performant than Claude 3.5 and 3.7 Sonnet. You can switch models by setting the final_model parameter to anthropic/claude-sonnet-4-20250514.
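Switching only requires changing final_model in your existing LeMUR request; the sketch below shows a task request body with the placeholder transcript ID left unfilled.

```python
# Existing LeMUR task requests keep all other fields unchanged; only
# final_model moves to Claude 4 Sonnet.
request_body = {
    "final_model": "anthropic/claude-sonnet-4-20250514",
    "prompt": "Summarize the key decisions made in this call.",
    "transcript_ids": ["<TRANSCRIPT_ID>"],  # placeholder transcript ID
}

# POST request_body to https://api.assemblyai.com/lemur/v3/generate/task
# with your API key in the "authorization" header, as in any LeMUR request.
```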
New Voice AI Tools and Model Updates
Introducing new tools and model updates to help you build, deploy, and scale Voice AI applications:
Speech Understanding: Advanced speaker identification, custom formatting rules, and translation let you transform raw transcripts into structured data instantly
LLM Gateway: One API for your entire voice-to-intelligence pipeline with integrated access to GPT, Claude, Gemini, and others.
Voice AI Guardrails: PII redaction in 50+ languages, profanity filtering, and content moderation.
Model Enhancements:
- Automatically code-switch between 99 languages, with 64% fewer speaker counting errors
- Up to 57% accuracy improvements on critical terms with 1,000-word context-aware prompting
Read more about these tools in our blog and check out our documentation for more information.
Speaker Diarization Update
We've shipped significant improvements to speaker count accuracy on Universal and Slam-1:
- 36% more accurate speaker detection for pre-recorded files
- Fewer false positives and missed speakers - the model now consistently identifies the correct number of participants
Slam-1 bugfixes
Fix released to address hallucinations occasionally produced in Slam-1 transcriptions.
Slam-1 Timestamps:
- Fixed issue where sentences separated by silence were sometimes incorrectly combined due to inaccurate timestamps.
- Fix released to reduce occasional timestamp inconsistencies.
Universal-Streaming Improvements
We've released updates to our Universal-Streaming model, bringing significant performance improvements across the board.
What's better:
- Overall accuracy: 3% improvement in general transcription accuracy
- Accented speech: 4% better recognition for speakers with accents
- Conversation Intelligence segments: 4% improvement in conversation intelligence use cases
- Proper nouns: 7% better at recognizing names, brands, and places
- Repeated words: 21% improvement when speakers repeat themselves
- Speed: 20ms faster response time for even lower latency
- Keyterms Prompting: Up to 66% better recognition of your custom terms
Keyterms Prompt for Universal (Beta) and PII Redaction Updates; bugfix
The keyterms_prompt parameter can now be used with Universal for pre-recorded audio transcription, ensuring accurate recognition of product names, people, and industry terms. This feature is in Beta and only available for English files. For more information, please refer to our documentation.
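A minimal request body using keyterms_prompt for a pre-recorded English file might look like the following; the audio URL and terms are placeholders for illustration.

```python
# Request body for a pre-recorded transcription with keyterms (Beta,
# English-only). The audio URL and terms below are placeholders.
request_body = {
    "audio_url": "https://example.com/earnings-call.mp3",
    "keyterms_prompt": ["Slam-1", "AssemblyAI", "Dr. Okafor"],
}

# POST request_body to https://api.assemblyai.com/v2/transcript with your
# API key in the "authorization" header.
```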
PII Audio Redaction is now available for files processed via the EU endpoint.
PII Redaction now supports additional languages: Afrikaans, Bengali, and Thai.
Fixed issue where occasionally Slam-1 incorrectly inserted new lines in transcripts.
Playground Updates; bugfixes
LeMUR Integration: LeMUR is now available via the Playground, enabling enhanced language understanding and processing capabilities
Account-Based Playground: The Playground is now attached to individual accounts, allowing users to track their transcription history

Speaker Diarization: Fixed occasional errors when using speaker diarization with non-English languages
Slam-1:
- Resolved errors caused by multiple consecutive punctuation symbols (e.g., '??' or '!!')
- Fixed timestamp adjustments that were causing shifts in word ordering
- Reduced hallucinations in transcript text output
Text Formatting: Released a fix that mitigates occasional punctuation and casing inconsistencies in transcriptions
Keyterms Prompting for Universal-Streaming
Voice AI finally understands the words that matter most to your business - product names, people, industry terms - with perfect accuracy in real-time.
The impact:
- 21% better accuracy than leading alternatives
- 67% lower cost ($0.04/hour)
- No impact on streaming latency
Who wins: Restaurant ordering bots that never mishear menu items. Medical schedulers that get doctor names right. Meeting tools with searchable, accurate transcripts.
Include a maximum of 100 keyterms per session. For more information about this new feature and implementation, please refer to our blog and documentation.
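A sketch of attaching keyterms at session start, keeping under the 100-term limit. Passing the terms as a URL-encoded JSON list is an assumption here; confirm the exact wire format in the Universal-Streaming documentation.

```python
import json
from urllib.parse import urlencode

# Illustrative business terms; sessions accept at most 100 keyterms.
keyterms = ["latte macchiato", "Dr. Okafor", "Slam-1"]
assert len(keyterms) <= 100

# Encoding keyterms as a JSON list in the connection URL is an assumed
# format -- see the docs for the supported parameter shape.
params = {"sample_rate": 16000, "keyterms_prompt": json.dumps(keyterms)}
session_url = "wss://streaming.assemblyai.com/v3/ws?" + urlencode(params)
```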
Universal Language Expansion
Universal now delivers production-ready accuracy and features across 99 languages through a single, unified endpoint.
What's new:
- Expanded language detection – Automatically detects all 99 languages (up from 17)
- Global speaker diarization – Identify speakers in 95 languages with precision
- Superior performance – Experience 2-3x faster processing for languages like Spanish, French, and German
- Customizable language detection – Set expected languages and fallback options tailored to your specific use case
Enable comprehensive language detection with just one parameter and no complex integration required. Check out our blog and documentation to explore Universal's capabilities.
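For illustration, a request body enabling detection with expected and fallback languages might look like the sketch below. The option names inside language_detection_options are assumptions, not the confirmed schema; see the language detection docs for the exact fields.

```python
request_body = {
    "audio_url": "https://example.com/interview.mp3",  # placeholder URL
    "language_detection": True,
    # The option names below are illustrative assumptions -- check the
    # language detection docs for the exact schema.
    "language_detection_options": {
        "expected_languages": ["es", "fr", "de"],
        "fallback_language": "en",
    },
}
```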
Streaming Update; bugfix
Added Voice Activity Detection (VAD) to our endpointing model for more accurate detection of ongoing speech. Interruptions are reduced by nearly 100%, while still accurately predicting user end of turns. This feature is now natively integrated into the model and works automatically so no setup is required.
Fixed a bug where using Slam-1 with speaker diarization occasionally resulted in a server error.
Dashboard Region Rates Toggle
Added a toggle on the dashboard under the Billing tab in the Account section to switch the view between US and EU rates.

Universal-Streaming Accuracy Improvement
Our Universal-Streaming model has been updated with improved accuracy features.
What's New:
- 52% improvement in handling repeated digits and tokens - The model now captures repetitions like "555-5555" or "yes, yes, confirmed" much more accurately (error rate reduced from 28.20% to 13.47%)
This enhancement delivers significant improvements for voice agents processing phone numbers, confirmation codes, and account numbers, with particular value for AI receptionists, drive-thru ordering systems, and customer support applications.
Claude 3 Sonnet Sunset & Speaker Diarization Improvement; bugfix
As previously announced, we have sunset Claude 3 Sonnet for LeMUR on July 21st.
If you were using this model, we recommend switching to Claude 4 Sonnet, which is more performant than Claude 3. You can switch models via the final_model parameter in LeMUR requests.
Released an update to our speaker diarization model so that it performs better in telephony conversations.
Fixed a bug where the min_speakers_expected and max_speakers_expected parameters in speaker_options were not being properly applied when the audio file length was shorter than two minutes.
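The speaker_options parameters mentioned above can be set as in the sketch below; the audio URL is a placeholder, and speaker_labels is shown as the assumed toggle that enables diarization.

```python
# Constrain the diarization model's speaker count for a pre-recorded file.
request_body = {
    "audio_url": "https://example.com/team-call.mp3",  # placeholder URL
    "speaker_labels": True,  # assumed flag enabling speaker diarization
    "speaker_options": {
        "min_speakers_expected": 2,
        "max_speakers_expected": 4,
    },
}
```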
Formatting Updates for Spanish & German
We've upgraded Universal with advanced text formatting specifically for Spanish and German:
- Spanish: Automatic inverted question marks (¿) and exclamation points (¡)
- German: Proper noun capitalization following grammar rules
- Both: Context-aware punctuation and natural number formatting
Native speakers now prefer Universal's formatting 62.2% of the time for Spanish and 54.5% for German. For more information about results and metrics, check out our blog.