Changelog
Follow along to see weekly accuracy and product improvements.
Universal-Streaming Accuracy Improvement
Our Universal-Streaming model has been updated with improved accuracy features.
What's New:
- 52% improvement in handling repeated digits and tokens - The model now captures repetitions like "555-5555" or "yes, yes, confirmed" much more accurately (error rate reduced from 28.20% to 13.47%)
This enhancement delivers significant improvements for voice agents processing phone numbers, confirmation codes, and account numbers, with particular value for AI receptionists, drive-thru ordering systems, and customer support applications.
Claude 3 Sonnet Sunset & Speaker Diarization Improvement; bugfix
As previously announced, we have sunset Claude 3 Sonnet for LeMUR on July 21st.
If you were using this model, we recommend switching to Claude 4 Sonnet, which is more performant than Claude 3. You can switch models via the `final_model` parameter in LeMUR requests.
Released an update to our speaker diarization model so that it performs better in telephony conversations.
Fixed a bug where the `min_speakers_expected` and `max_speakers_expected` parameters in `speaker_options` were not being properly applied when the audio file was shorter than two minutes.
Formatting Updates for Spanish & German
We've upgraded Universal with advanced text formatting specifically for Spanish and German:
- Spanish: Automatic inverted question marks (¿) and exclamation points (¡)
- German: Proper noun capitalization following grammar rules
- Both: Context-aware punctuation and natural number formatting
Native speakers now prefer Universal's formatting 62.2% of the time for Spanish and 54.5% for German. For more information about results and metrics, check out our blog.
Expanded PII Audio Redaction Language Support; bugfixes
PII Audio Redaction is now supported for all languages that support PII Text Redaction (previously, only English and Spanish were supported). Refer to our documentation to see all languages and their supported features.
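As a sketch, a transcription request enabling PII audio redaction for a non-English file might look like the following. The `redact_pii`, `redact_pii_audio`, and `redact_pii_policies` parameter names follow AssemblyAI's documented PII redaction options; the audio URL, language code, and policy list here are illustrative placeholders, not recommendations.

```python
# Hedged sketch: request body enabling PII audio redaction for a
# German-language file. The URL, language code, and policy list are
# illustrative placeholders.
pii_request = {
    "audio_url": "https://example.com/call_de.mp3",  # placeholder URL
    "language_code": "de",         # any language with PII Text Redaction support
    "redact_pii": True,            # enable PII text redaction
    "redact_pii_audio": True,      # also return a redacted audio file
    "redact_pii_policies": ["person_name", "phone_number"],  # entities to redact
}
```

This dictionary would be sent as the JSON body of a transcript request; check the documentation for the full list of supported policies per language.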
Fixed an edge case issue that could sometimes result in overlapping timestamps in transcripts with formatted numbers.
Fixed an issue with the `/sentences` endpoint where new sentences were being split at periods used in abbreviations like "Dr." or "Mrs.".
Fixed an issue where the `min_speakers_expected` value in `speaker_options` was sometimes not properly applied.
Implemented an enhanced hallucination filter that mitigates prompt injection issues with Slam-1.
Speaker Diarization Model Update
Released new in-house speaker embedding model delivering significant improvements for challenging audio environments while maintaining performance on clean recordings. This enhanced model provides more accurate meeting transcripts, reliable call center analytics, and consistent speaker identification in conference rooms, remote meetings, and multi-speaker interviews.
Key Improvements
- Noisy & Far-Field Scenarios: Error rates dropped from 29.1% to 20.4% - a 30% improvement for challenging acoustic environments where traditional systems fail.
- Short Audio Segments: 43% improvement in very short segments (250ms) under noisy conditions - now accurately tracking single words and brief acknowledgments.
- Multi-Speaker Robustness: Complex audio with multiple speakers and background noise that previously collapsed to a single speaker is now accurately separated.
This model is automatically active for all customers; no action is required to benefit from the improved diarization accuracy. For more information about using speaker diarization, please refer to our documentation.
Claude 4 Models Now Available Through LeMUR
We're excited to announce that Claude 4 Sonnet and Claude 4 Opus are now available through our LeMUR endpoint.
Claude 4 Sonnet delivers enhanced reasoning and improved performance for everyday tasks while maintaining exceptional speed and cost-effectiveness. It's perfect for applications requiring reliable, intelligent responses across a wide range of use cases.
- API Parameter: `final_model: "anthropic/claude-sonnet-4-20250514"`
- Availability: US and EU regions
- Pricing: Same as Claude 3.7 Sonnet
  - Input: $0.003 per 1k tokens
  - Output: $0.015 per 1k tokens
Claude 4 Opus represents our most capable model yet, offering superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving. It excels at nuanced analysis, detailed research, and handling intricate multi-step workflows.
- API Parameter: `final_model: "anthropic/claude-opus-4-20250514"`
- Availability: US region only
- Pricing: Same as Claude 3 Opus
  - Input: $0.015 per 1k tokens
  - Output: $0.075 per 1k tokens
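At the listed rates, a rough per-request cost can be estimated as in the sketch below. The rates are taken from this entry ($ per 1k tokens); the token counts in the usage example are illustrative.

```python
# Rough cost estimate for LeMUR calls at the rates listed above
# ($ per 1k tokens). Token counts in real requests will vary.
RATES = {
    "anthropic/claude-sonnet-4-20250514": {"input": 0.003, "output": 0.015},
    "anthropic/claude-opus-4-20250514": {"input": 0.015, "output": 0.075},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated dollar cost of one LeMUR call."""
    r = RATES[model]
    return (input_tokens / 1000) * r["input"] + (output_tokens / 1000) * r["output"]

# Example: 10k input tokens and 2k output tokens on Claude 4 Sonnet
sonnet_cost = estimate_cost("anthropic/claude-sonnet-4-20250514", 10_000, 2_000)
```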
To use Claude 4, update the `final_model` parameter in your existing LeMUR API calls. For more information and implementation guidance, check out our documentation.
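As a minimal sketch, the body of a LeMUR task request selecting Claude 4 Sonnet could be built as follows. The `final_model` value is the one given in this entry; the transcript ID and prompt are illustrative placeholders.

```python
import json

def build_lemur_request(transcript_id: str, prompt: str,
                        final_model: str = "anthropic/claude-sonnet-4-20250514") -> str:
    """Return the JSON body for a LeMUR task request (sketch).

    The final_model value comes from this changelog entry; the
    transcript ID and prompt are placeholders.
    """
    body = {
        "transcript_ids": [transcript_id],
        "prompt": prompt,
        "final_model": final_model,
    }
    return json.dumps(body)

payload = build_lemur_request("abc123", "Summarize this call.")
```

The resulting JSON string would be POSTed to the LeMUR endpoint; see the documentation for the full request schema.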
Expanded Speaker Limit for Speaker Diarization
Added an optional `speaker_options` parameter that allows the user to specify a range for the number of possible speakers in audio files. This enhancement provides greater flexibility for processing audio with varying speaker counts, particularly files that contain more than 10 speakers. Refer to our documentation for more information.
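A transcription request using the new parameter might look like the sketch below. The `speaker_options`, `min_speakers_expected`, and `max_speakers_expected` names come from this entry; the audio URL and speaker counts are illustrative placeholders.

```python
# Hedged sketch: bounding the expected speaker count via the new
# speaker_options parameter. The URL and counts are placeholders.
transcript_request = {
    "audio_url": "https://example.com/town_hall.mp3",  # placeholder URL
    "speaker_labels": True,  # enable speaker diarization
    "speaker_options": {
        "min_speakers_expected": 3,
        "max_speakers_expected": 12,  # useful for files with more than 10 speakers
    },
}
```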
Slam-1 and LeMUR Now Available in the EU
Slam-1 and LeMUR are now available through our EU API endpoint, providing complete data residency compliance for European customers.
Slam-1 in the EU delivers the same industry-leading speech recognition accuracy with complete EU data residency. Audio data remains within EU boundaries while maintaining the same advanced capabilities and seamless API integration.
LeMUR in the EU brings powerful audio intelligence to EU customers with GDPR compliance, including audio summarization, Q&A capabilities, action item extraction, and support for Claude 3 Haiku, Claude 3.5 Sonnet, and Claude 3.7 Sonnet models.
Check out our documentation for more information about the EU API endpoint as well as Slam-1 and LeMUR.
Update for Audio Redaction
When requesting audio redaction, users can now opt to receive the audio file back even if it contains no redacted audio. For more information, please consult our documentation.
Playground Update
The AssemblyAI Playground now has a redesigned interface that enables users to test our new Slam-1 model and the existing Universal model for pre-recorded audio, as well as our new Universal-Streaming model for real-time transcription. Users can now access the entire range of AssemblyAI model capabilities through a code-free interface, from basic transcription to advanced features like key term prompting, speaker diarization, sentiment analysis, and custom vocabulary.

Introducing Universal-Streaming
Universal-Streaming is our new speech-to-text (STT) model 🚀

What's Improved:
- Ultra-low latency with immutable transcripts - Universal-Streaming delivers ~300ms word emission with 41% faster median latency than Deepgram Nova-3, provides immutable final transcripts from the start to enable real-time agent processing, and offers latency-tunable features like the ability to toggle punctuation for maximum speed.
- Intelligent endpointing for smoother turn detection - Our end-of-turn model enhances speed and accuracy, supporting natural pauses without premature interruptions for smoother conversations.
- Accuracy on the tokens that matter - Universal-Streaming delivers substantial improvements in these challenging areas: 21% fewer alphanumeric errors on emails and codes, 28% improvement on consecutive numbers, and 5% better proper noun recognition. These improvements ensure fewer correction loops and silent transcription errors.
- Transparent pricing with unlimited concurrency - Pricing starts at $0.15/hr with volume discounts available for larger implementations. Scale confidently with unlimited concurrent streams with no hard caps or over-stream surcharges.
Learn more about Universal-Streaming in our blog and review our comprehensive Getting Started Guide for detailed implementation information.
Slam-1 bugfix
We've fixed a bug in Slam-1 where users' `keyterms_prompt` values were occasionally appearing in the transcript text.
Error Message Improvement
Improved the error message returned when the region used to upload a file via the `/upload` endpoint does not match the region used to transcribe that file.
Enhanced Account Security
We've added Email Verification and Google OAuth:
- Google authentication users: If your account email is a Gmail address, you can simply click 'Continue with Google' for instant access, followed by account verification - no additional linking is needed.
- Email/password users: On your first login after this update, you'll receive a one-time link. Click it to set a new password and access your dashboard.
New LeMUR Models
We've expanded LeMUR capabilities with two powerful new models:
- Claude 3.7 Sonnet - The most intelligent model to date, featuring enhanced reasoning capabilities for complex audio analysis tasks.
- Claude 3.5 Haiku - The fastest model, optimized for quick responses while maintaining excellent reasoning abilities.
Whether you're analyzing customer calls, generating meeting summaries, or performing audio content analysis, these models deliver significant improvements.
You can begin using these new models right away with your existing LeMUR implementation. For detailed instructions on integration, model parameters, and code examples across all supported programming languages, check out our docs.
🚀 Slam-1 Public Beta 🚀
Slam-1, our new customizable Speech Language Model, is now available in public beta!
Slam-1 combines large language model reasoning with specialized audio processing to understand speech rather than just recognize it. This multi-modal architecture enables new levels of accuracy, adaptability, and control over speech transcription, supports high-demand features including speaker diarization, timestamp prediction, and multichannel transcription, and can be used as a drop-in replacement to improve the accuracy of existing models.

The standout capability of Slam-1 is its ability to be fine-tuned for specific contexts without model retraining or complex engineering, adapting to capture the terminology and nuances of fields ranging from healthcare to legal proceedings.
Performance Highlights:
- 66% of human evaluators consistently preferred Slam-1 transcripts over our current Universal model, and 72% of users preferred Slam-1 over Deepgram's Nova-3 model in blind tests
- 20% reduction in formatting errors
- Up to 66% reduction in missed entities (names, places, custom terms) with customization
Refer to our documentation for information about getting started and check out our blog post to learn more about Slam-1.
Dashboard Updates; Scaling Optimization; LeMUR bugfix
Introducing Dark Mode for our dashboard! Users can now switch between light and dark mode via a toggle in the top navigation bar.

Optimized scaling and capacity provisioning to more efficiently handle customer traffic.
Reduced LeMUR errors with targeted improvements to help alleviate scaling issues.
AssemblyAI is now PCI DSS v4.0 Compliant; bugfix
We've upgraded our PCI compliance to PCI DSS v4.0, ensuring our Speech-to-Text API meets the latest payment card industry security standards.
Added additional retry logic to reduce edge case authentication errors that would sometimes occur.
Dashboard Revamp
We have upgraded our dashboard—now with enhanced analytics and improved navigation to help you get more out of your AssemblyAI account.

The new dashboard features:
- Modern UI with improved navigation and streamlined onboarding
- Enhanced analytics with usage and model-specific filtering
- Advanced transcription history with filtering by date, ID, project, and API key
- Dedicated rate limits section showing your account's limits for all endpoints
- Clearer billing information with improved plan details and usage visualization
Our multiple API keys feature is fully integrated with the new dashboard, allowing you to better organize projects and enhance security.
Log in to your AssemblyAI account today to experience the improved interface.
Speaker Labels bugfix
Reduced edge case errors with the Speaker Labels feature that could sometimes occur when the final utterance was a single word.