Changelog

Latest product updates


Gemini 3.5 Flash now supported on LLM Gateway

Google's Gemini 3.5 Flash is now available through LLM Gateway. Flash is Google's fast, cost-efficient model in the Gemini 3 family — built for high-throughput workloads where latency and price-per-token matter as much…

New LLM Gateway

Streaming PII Redaction

PII Redaction is now available for Streaming Speech-to-Text. Set redact_pii: true on a streaming connection to automatically detect and remove sensitive information — names, phone numbers, email addresses, payment…

New Real-time Speech-to-Text

Streaming Speaker Diarization: Major Accuracy Upgrade with Per-Word Labels

We've shipped a major upgrade to streaming speaker diarization, with significant accuracy gains and a refined API that delivers per-word speaker labels…

Improvement Real-time Speech-to-Text Pre-recorded Speech-to-Text

LLM Gateway: JSON Repair Post-Processing for Structured Output

LLM Gateway completions now support a post-processing pipeline, and the first step available is json-repair — an optional pass that automatically fixes malformed JSON returned by a model before it reaches your application…

New LLM Gateway

Introducing the Voice Agent API

The Voice Agent API is now available — a complete voice agent pipeline built on AssemblyAI's own models, delivered through a single WebSocket…

New Voice Agent API

PII Redaction: Return Unredacted Transcripts in the Same Request

You can now retrieve both the redacted and unredacted versions of a transcript in a single PII Redaction request. Set the new redact_pii_return_unredacted flag to true in your POST /v2/transcript body, and the response…

New Speech Understanding
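As a minimal sketch of the request body described in this entry, the snippet below builds the JSON for POST /v2/transcript. Only the redact_pii_return_unredacted flag and the endpoint come from the entry itself. The audio URL is a placeholder, and the accompanying redact_pii and redact_pii_policies fields and their values are assumptions about a typical PII Redaction request. The teaser is cut off before it describes the response, so no response handling is shown.

```python
import json


def build_redaction_request(audio_url: str) -> dict:
    """Build an illustrative JSON body for POST /v2/transcript."""
    return {
        "audio_url": audio_url,  # placeholder input file
        "redact_pii": True,  # assumed: redaction itself must be enabled
        "redact_pii_policies": ["person_name", "phone_number"],  # assumed policy names
        "redact_pii_return_unredacted": True,  # flag named in this entry
    }


body = build_redaction_request("https://example.com/call.mp3")
print(json.dumps(body, indent=2))
```

The body is only constructed and printed here. Sending it requires an API key and an HTTP client, which are left out on purpose.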

Claude Opus 4.7 Now Available on LLM Gateway

Claude Opus 4.7 is now available through LLM Gateway. Opus 4.7 is Anthropic's most intelligent model yet — the latest in the Claude family, pushing the frontier on reasoning, coding, and complex multi-step tasks…

New LLM Gateway

Universal-2 Language Improvements: Hebrew & Swedish

Universal-2 transcription accuracy has improved significantly for Hebrew and Swedish, with word error rates reduced by 37% and 47% respectively…

Improvement Pre-recorded Speech-to-Text

LLM Gateway: Automatic Model Fallbacks

LLM Gateway now supports automatic model fallbacks, giving your application resilience against model failures without changing your integration…

New LLM Gateway

Introducing Medical Mode: Purpose-built accuracy for medical terminology

Medical Mode is a new add-on for AssemblyAI's Streaming Speech-to-Text that improves transcription accuracy for medical terminology — including medication names, procedures, conditions, and dosages…

New Real-time Speech-to-Text Pre-recorded Speech-to-Text

New LLM Gateway Models: Qwen3, Qwen3 Next, & Kimi K2.5

Three new models are now live in LLM Gateway for paid accounts: Qwen3 Next 80B A3B, Qwen3 32B from Alibaba Cloud, and Kimi K2.5 from Moonshot AI…

New LLM Gateway

AssemblyAI Skill for AI Coding Agents

The AssemblyAI Skill is now available for AI coding agents — giving Claude Code, Cursor, Codex, and other vibe-coding tools accurate, up-to-date knowledge of AssemblyAI's APIs, SDKs, and integrations out of the box…

New LLM Gateway Speech Understanding Pre-recorded Speech-to-Text

PII Audio Redaction: Silence or Beep

You can now control how PII is replaced in redacted audio. By default, AssemblyAI substitutes PII with a beep tone — now you can switch that to silence instead…

New Speech Understanding

Universal-3-Pro Now Available for Streaming

Universal-3-Pro is now available for real-time streaming — bringing our most accurate speech model to live transcription for the first time…

New Real-time Speech-to-Text Pre-recorded Speech-to-Text

Share Your Playground Transcripts

The AssemblyAI Playground now has a share button. One click generates a shareable link to your transcript output that stays live for 90 days…

New Speech Understanding Pre-recorded Speech-to-Text

LLM Gateway Now Available in the EU

LLM Gateway and Speech Understanding are now available in the EU. Customers can run LLM inference directly in the EU region, enabling data residency compliance and opening the door for teams previously blocked by…

New LLM Gateway

Claude Sonnet 4.6 now supported on LLM Gateway

Claude Sonnet 4.6 is now available through LLM Gateway. Sonnet 4.6 is Anthropic's most capable Sonnet model yet, with frontier performance across coding, agents, and professional work at scale. With this model, every line of…

New LLM Gateway

LLM Gateway Streaming: Apply LLMs at Every Turn in Real Time

LLM Gateway is now available in a single streaming API call, letting you apply large language models at the turn level as transcription results flow in real time…

New LLM Gateway Real-time Speech-to-Text

Claude Opus 4.5 and 4.6 now supported on LLM Gateway

Claude's most capable models are now available through LLM Gateway. Opus 4.5 and Opus 4.6 bring significant improvements in reasoning, coding, and instruction-following…

New LLM Gateway

Universal-3-Pro: Our Promptable Speech-to-Text Model

We've released Universal-3-Pro, our most powerful Voice AI model yet—designed to give you LLM-style control over transcription output for the first time…

New Pre-recorded Speech-to-Text

Improved Speaker Diarization for Short Audio

Speaker diarization is now more accurate for audio files under 2 minutes, with a 19% improvement in speaker count prediction and 6% improvement in cpWER…

Improvement Pre-recorded Speech-to-Text

Global Edge Routing & Data Zone Endpoints for Streaming Speech-to-Text

We've launched new streaming endpoints that give you control over latency optimization and data residency. Choose the endpoint that best fits your application's requirements—whether that's achieving the lowest possible…

New Real-time Speech-to-Text Pre-recorded Speech-to-Text

Multichannel Speaker Diarization

We've added support for multichannel speaker diarization with pre-recorded transcription, allowing you to identify individual speakers across multiple audio channels in a single API request…

New Pre-recorded Speech-to-Text

Gemini 3 Flash Preview now supported on LLM Gateway

Google's newest Gemini 3 Flash Preview model is live in the LLM Gateway…

New LLM Gateway

Improved File Deletion for Enhanced Data Privacy

We've updated how uploaded audio files are deleted when you delete a transcript, giving you immediate control over your data…

Improvement Speech Understanding Pre-recorded Speech-to-Text

Transcribe public audio URLs directly in the Playground

Our Playground just got a little more powerful: you can now transcribe audio directly from public URLs…

New Pre-recorded Speech-to-Text

GPT-5.1 & 5.2 now supported on LLM Gateway

OpenAI’s newest GPT-5.1 and GPT-5.2 models are live in the LLM Gateway…

New LLM Gateway

Keyterm Prompting Now Available for Universal-Streaming Multilingual

Keyterm prompting is now in production for multilingual streaming, giving developers the ability to improve accuracy for target words in real-time transcription…

New Real-time Speech-to-Text Pre-recorded Speech-to-Text

Hallucination Rate Reduced for Multilingual Streaming

We've improved hallucination detection and reduction across Universal-Multilingual Streaming transcription, resulting in fewer false outputs while maintaining minimal latency impact…

Improvement Guardrails Real-time Speech-to-Text

Transcription Access Now Scoped to Project Level for Uploaded Files

We've tightened security controls on pre-recorded file transcription by scoping access to uploaded files within the same project that uploaded them…

Improvement Pre-recorded Speech-to-Text

AssemblyAI Streaming Updates: Multi-Region Infrastructure, Session Controls, and Self-Hosted License Management

Self-Hosted Streaming v0.20: License Management Now Available…

New Real-time Speech-to-Text

Gemini 3 Pro now available in LLM Gateway

Google's latest Gemini 3 Pro model is now available through AssemblyAI's LLM Gateway, giving you access to one of the most advanced multimodal models with the same unified API you use for all your other providers…

New LLM Gateway

Streaming Model Update: Enhanced Performance & New Capabilities

We've released a new version of our English Streaming model with significant improvements across the board…

Improvement Real-time Speech-to-Text

LeMUR Deprecation

LeMUR will be deprecated on March 31, 2026 and will no longer work after this date…

Improvement Speech Understanding

Universal Multilingual Streaming

We've launched Universal Multilingual Streaming, enabling low-latency transcription across multiple languages without compromising accuracy…

New Real-time Speech-to-Text

Deprecation of V2 Streaming

Our legacy streaming endpoint (/v2/realtime/ws) will be deprecated on January 31, 2026, and will no longer work after this date…

Improvement Real-time Speech-to-Text

Claude 3.5 & 3.7 Sonnet Sunset

As previously announced, we will be sunsetting Claude 3.5 Sonnet and 3.7 Sonnet for LeMUR on October 29th. After this date, requests made using Claude 3.5 and 3.7 Sonnet will return errors…

Improvement LLM Gateway

New Voice AI Tools and Model Updates

Introducing new tools and model updates to help you build, deploy, and scale Voice AI applications:…

New LLM Gateway Guardrails Speech Understanding

Speaker Diarization Update

We've shipped significant improvements to speaker count accuracy on Universal and Slam-1:…

Improvement Pre-recorded Speech-to-Text

Slam-1 bugfixes

Fix released to address hallucinations occasionally produced in Slam-1 transcriptions…

Fix Pre-recorded Speech-to-Text

Universal-Streaming Improvements

We've released updates to our Universal-Streaming model, bringing significant performance improvements across the board…

Improvement Real-time Speech-to-Text

Keyterms Prompt for Universal (Beta) and PII Redaction Updates; bugfix

The keyterms_prompt parameter can now be used with Universal for pre-recorded audio transcription, ensuring accurate recognition of product names, people, and industry terms…

New Speech Understanding Pre-recorded Speech-to-Text

Playground Updates; bugfixes

LeMUR Integration: LeMUR is now available via the Playground, enabling enhanced language understanding and processing capabilities…

New Speech Understanding Pre-recorded Speech-to-Text

Keyterms Prompting for Universal-Streaming

Voice AI finally understands the words that matter most to your business - product names, people, industry terms - with perfect accuracy in real-time…

New Real-time Speech-to-Text Pre-recorded Speech-to-Text

Universal Language Expansion

Universal now delivers production-ready accuracy and features across 99 languages through a single, unified endpoint…

New Pre-recorded Speech-to-Text

Streaming Update; bugfix

Added Voice Activity Detection (VAD) to our endpointing model for more accurate detection of ongoing speech. Interruptions are reduced by nearly 100%, while still accurately predicting user end of turns. This feature is…

Improvement Real-time Speech-to-Text

Claude 3 Sonnet Sunset & Speaker Diarization Improvement; bugfix

As previously announced, we have sunset Claude 3 Sonnet for LeMUR on July 21st…

Improvement LLM Gateway Pre-recorded Speech-to-Text

Universal-Streaming Accuracy Improvement

Our Universal-Streaming model has been updated with improved accuracy features…

Improvement Real-time Speech-to-Text

Formatting Updates for Spanish & German

We've upgraded Universal with advanced text formatting specifically for Spanish and German:…

Improvement Platform

Expanded PII Audio Redaction Language Support; bugfixes

PII Audio Redaction is now supported for all languages that support PII Text Redaction (previously, only English and Spanish were supported). Refer to our documentation to see all languages and their supported features…

New Speech Understanding

Speaker Diarization Model Update

Released new in-house speaker embedding model delivering significant improvements for challenging audio environments while maintaining performance on clean recordings…

Improvement Pre-recorded Speech-to-Text

Claude 4 Models Now Available Through LeMUR

We're excited to announce that Claude 4 Sonnet and Claude 4 Opus are now available through our LeMUR endpoint…

New LLM Gateway Speech Understanding

Expanded Speaker Limit for Speaker Diarization

Added an optional `speaker_options` parameter that allows the user to specify a range for the number of possible speakers in audio files…

Improvement Pre-recorded Speech-to-Text
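A rough sketch of how a speaker range might be passed follows. The entry only names the `speaker_options` parameter and says it takes a range. The inner key names (min_speakers_expected, max_speakers_expected), the speaker_labels switch, and the helper function are assumptions for illustration and should be checked against the API reference.

```python
def build_diarization_request(audio_url: str, min_speakers: int, max_speakers: int) -> dict:
    """Build an illustrative transcript request with a speaker-count range."""
    if not 1 <= min_speakers <= max_speakers:
        raise ValueError("expected 1 <= min_speakers <= max_speakers")
    return {
        "audio_url": audio_url,  # placeholder input file
        "speaker_labels": True,  # assumed: diarization must be enabled
        "speaker_options": {
            "min_speakers_expected": min_speakers,  # assumed key name
            "max_speakers_expected": max_speakers,  # assumed key name
        },
    }


request = build_diarization_request("https://example.com/panel.mp3", 3, 6)
print(request["speaker_options"])
```

Validating the range on the client side keeps an inverted range (for example, a minimum of 6 and a maximum of 3) from reaching the API at all.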

Slam-1 and LeMUR Now Available in the EU

Slam-1 and LeMUR are now available through our EU API endpoint, providing complete data residency compliance for European customers…

New Speech Understanding

Update for Audio Redaction

When requesting audio redaction, there is now an option that allows users to receive back audio files even if they do not contain any redacted audio. For more information, please consult our documentation…

New Platform

Playground Update

The AssemblyAI Playground now has a redesigned interface that enables users to test our new Slam-1 model and the existing Universal model for pre-recorded audio, as well as our new Universal-Streaming model for real-time transcription…

Improvement Real-time Speech-to-Text Pre-recorded Speech-to-Text

Introducing Universal-Streaming

Universal-Streaming is our new speech-to-text (STT) model 🚀…

Improvement Real-time Speech-to-Text

Slam-1 bugfix

We’ve fixed a bug on Slam-1 where users' keyterms_prompt value was occasionally appearing in the transcript text…

Fix Platform

Error Message Improvement

Optimized error message for instances where the region used to upload a file via the /upload endpoint does not match the region being used to transcribe that URL…

Improvement Pre-recorded Speech-to-Text

Enhanced Account Security

We've added Email Verification and Google OAuth:…

New Platform

New LeMUR Models

We've expanded LeMUR capabilities with two powerful new models:…

Improvement Speech Understanding

🚀 Slam-1 Public Beta 🚀

Slam-1, our new customizable Speech Language Model, is now available in public beta…

Improvement Pre-recorded Speech-to-Text

Dashboard Updates; Scaling Optimization; LeMUR bugfix

Introducing Dark Mode for our dashboard! Users can now switch between light and dark mode via a toggle in the top navigation bar…

New Speech Understanding

AssemblyAI is now PCI DSS v4.0 Compliant; bugfix

We've upgraded our PCI compliance to PCI DSS v4.0, ensuring our Speech-to-Text API meets the latest payment card industry security standards…

New Pre-recorded Speech-to-Text

Dashboard Revamp

We have upgraded our dashboard—now with enhanced analytics and improved navigation to help you get more out of your AssemblyAI account…

New Pre-recorded Speech-to-Text

Speaker Labels bugfix

Reduced edge case errors with the Speaker Labels feature that could sometimes occur when the final utterance was a single word…

Fix Platform

Multiple API Keys & Projects

We’ve introduced Multiple API Keys and Projects for AssemblyAI accounts. You can now create separate projects for development, staging, and production, making it easier to manage different environments. Within each…

New Platform

Update to List Endpoint

We’ve bifurcated our list endpoint into two separate endpoints - one for data processed on EU servers and one for data processed on US servers. Previously, the list endpoint returned transcripts from both regions…

New Platform

Universal improvements

Last week we delivered improvements to our October 2024 Universal release across latency, accuracy, and language coverage…

Improvement Pre-recorded Speech-to-Text

Ukrainian support for Speaker Diarization

Our Speaker Diarization service now supports Ukrainian speech. This update enables automatic speaker labeling for Ukrainian audio files, making transcripts more readable and powering downstream features in multi-speaker…

New Pre-recorded Speech-to-Text

Claude 2 sunset

As previously announced, we sunset Claude 2 and Claude 2.1 for LeMUR on February 6th…

Improvement LLM Gateway

Reduced hallucination rates; Bugfix

We have reduced Universal-2's hallucination rate for the string "sa" during periods of silence…

Fix Guardrails

Multichannel audio trim fix

We've fixed an issue which caused the audio_start_from and audio_end_at parameters to not be respected for multichannel audio…

Fix Platform

Platform enhancements and security updates

🌍 Simplified EU Data Residency & Management…

New Platform

Reduced hallucination rates

We have reduced Universal-2's hallucination rate for the word "it" during periods of silence…

Improvement Guardrails

New dashboard features

Two new features are available to users on their dashboards:…

New Platform

Reliability improvements

We've made reliability improvements for Claude models in our LeMUR framework…

Improvement Speech Understanding

LiveKit 🤝 AssemblyAI

We've released the AssemblyAI integration for the LiveKit Agents framework, allowing developers to use our Streaming Speech-to-Text model in their real-time LiveKit applications…

Improvement Real-time Speech-to-Text Pre-recorded Speech-to-Text

SOC2 Type 2 expansion and renewal

We have renewed our SOC2 Type 2 certification, and expanded it to include Processing Integrity. Our SOC2 Type 2 certification now covers all five Trust Services Criteria (TSCs)…

Improvement Platform

ISO 27001:2022 certification

We have obtained our inaugural ISO 27001:2022 certification, which is an internationally recognized standard for managing information security…

Improvement Platform

Timestamp improvement; no-space languages fix

We've improved our timestamp algorithm, yielding higher accuracy for long numerical strings like credit card numbers, phone numbers, etc…

Improvement Platform

Multichannel support

We now offer multichannel transcription, allowing users to transcribe files with up to 32 separate audio channels, making speaker identification easier in situations like virtual meetings…

New Pre-recorded Speech-to-Text

Introducing Universal-2

Last week we released Universal-2, our latest Speech-to-Text model. Universal-2 builds upon our previous model Universal-1 to make significant improvements in "last mile" challenges critical to real-world use cases -…

Improvement Pre-recorded Speech-to-Text

Claude Instant 1.2 removed from LeMUR

The following models were removed from LeMUR: anthropic/claude-instant-1-2 and basic (legacy, equivalent to anthropic/claude-instant-1-2), which will now return a 400 validation error if called…

Improvement LLM Gateway Speech Understanding

French performance patch; bugfix

We recently observed a degradation in accuracy when transcribing French files through our API. We have since pushed a bugfix to restore performance to prior levels…

Fix Speech Understanding Pre-recorded Speech-to-Text

New and improved - AssemblyAI Q3 recap

Check out our quarterly wrap-up for a summary of the new features and integrations we launched this quarter, as well as improvements we made to existing models and functionality…

New Speech Understanding Pre-recorded Speech-to-Text

Claude 1 & 2 sunset

Recently, Anthropic announced that they will be deprecating legacy LLM models that are usable via LeMUR. We will therefore be sunsetting these models in advance of Anthropic's end-of-life for them:…

Improvement LLM Gateway

Langflow 🤝 AssemblyAI

We've released the AssemblyAI integration for Langflow, allowing low-code builders to incorporate Speech AI into their workflows…

Improvement Platform

Speaker Labels bugfix

We've fixed an edge-case issue that would cause requests using Speaker Labels to fail for some files…

Fix Platform

Activepieces 🤝 AssemblyAI

We've released the AssemblyAI integration for Activepieces, allowing no-code and low-code builders to incorporate Speech AI into their workflows…

Improvement Pre-recorded Speech-to-Text

Language confidence threshold bugfix

We've fixed an edge-case which would sometimes occur due to language fallback when Automatic Language Detection (ALD) was used in conjunction with language_confidence_threshold, resulting in executed transcriptions…

Fix Pre-recorded Speech-to-Text

Automatic Language Detection improvements

We've made improvements to our Automatic Language Detection (ALD) model, yielding increased accuracy, expanded language support, and customizable confidence thresholds…

Improvement Pre-recorded Speech-to-Text

Free Offer improvements

We've made a series of improvements to our Free Offer:…

Improvement Pre-recorded Speech-to-Text

Speaker Diarization improvements

We've made improvements to our Speaker Diarization model, especially robustness in distinguishing between speakers with similar voices…

Improvement Pre-recorded Speech-to-Text

File upload improvements and more

We've made improvements to error handling for file uploads that fail. Now if there is an error, such as a file containing no audio, the following 422 error will be returned:…

Improvement Platform

New endpoints for LeMUR Claude 3

Last month, we announced support for Claude 3 in LeMUR. Today, we are adding support for two new endpoints - Question & Answer and Summary (in addition to the pre-existing Task endpoint) - for these newest models:…

New LLM Gateway Speech Understanding

Enhanced AssemblyAI app for Zapier

We've launched our Zapier integration v2.0, which makes it easy to use our API in a no-code way. The enhanced app is more flexible, supports more Speech AI features, and integrates more closely into the Zap editor. The…

Improvement Speech Understanding Pre-recorded Speech-to-Text

LeMUR browser support

LeMUR can now be used from browsers, either via our JavaScript SDK or fetch…

New Speech Understanding

LeMUR - Claude 3 support

Last week, we released Anthropic's Claude 3 model family into LeMUR, our LLM framework for speech…

New LLM Gateway Speech Understanding

JavaScript SDK fix

We've fixed an issue which was causing the JavaScript SDK to surface the following error when using the SDK in the browser:…

Fix Platform

Timestamps improvement; bugfixes

We've made significant improvements to the timestamp accuracy of our Speech-to-Text Best tier for English, Spanish, and German. 96% of timestamps are accurate within 200ms, and 86% of timestamps are now accurate within…

Improvement Pre-recorded Speech-to-Text

Streaming (formerly Real-time) improvements

We've made model improvements that significantly improve the accuracy of timestamps when using our Streaming Speech-to-Text service. Most timestamps are now accurate within 100 ms…

Improvement Real-time Speech-to-Text

Variable-bitrate video support; bugfix

We've deployed changes which now permit variable-bitrate video files to be submitted to our API…

New Pre-recorded Speech-to-Text

LeMUR improvements

We have added two new keys to the LeMUR response, input_tokens and output_tokens, which can help users track usage…

New Speech Understanding

PII Redaction and Entity Detection improvements

We've improved our PII Text Redaction and Entity Detection models, yielding more accurate detection and removal of PII and other entities from transcripts…

New Speech Understanding

Usage and spend alerts

Users can now set up billing alerts in their user portals. Billing alerts notify you when your monthly spend or account balance reaches a threshold…

New Platform

Universal-1 now available in German

Universal-1, our most powerful and accurate multilingual Speech-to-Text model, is now available in German…

New Pre-recorded Speech-to-Text

New API Reference, Timestamps improvements

We’ve released a new version of the API Reference section of our docs for an improved developer experience. Here’s what’s new:…

New Speech Understanding Pre-recorded Speech-to-Text

New codec support; account deletion support

We’ve upgraded our transcoding library and now support the following new codecs:…

New Speech Understanding Pre-recorded Speech-to-Text

AssemblyAI app for Make.com

Make (formerly Integromat) is a no-code automation platform that makes it easy to build tasks and workflows that synthesize many different services…

New Speech Understanding Pre-recorded Speech-to-Text

GDPR and PCI DSS compliance

AssemblyAI is now officially PCI Compliant. The Payment Card Industry Data Security Standard Requirements and Security Assessment Procedures (PCI DSS) certification is a rigorous assessment that ensures card holder…

New Platform

Self-serve invoices; dual-channel improvement

Users of our API can now view and download their self-serve invoices in their dashboards under Billing > Your invoices…

New Speech Understanding Pre-recorded Speech-to-Text

Introducing Universal-1

Last week we released Universal-1, a state-of-the-art multimodal speech recognition model. Universal-1 is trained on 12.5M hours of multilingual audio data, yielding impressive performance across the four key languages…

Improvement Platform

New Streaming STT features

We’ve added a new message type to our Streaming Speech-to-Text (STT) service. This new message type SessionInformation is sent immediately before the final SessionTerminated message when closing a Streaming session, and…

New Real-time Speech-to-Text

Dual channel transcription improvements

We’ve made improvements to how utterances are handled during dual-channel transcription. In particular, the transcription service now has elevated sensitivity when detecting utterances, leading to improved utterance…

Improvement Pre-recorded Speech-to-Text

LeMUR concurrency fix

We’ve fixed a temporary issue in which users with low account balances would occasionally be rate-limited to a value less than 30 when using LeMUR…

Fix Speech Understanding

Fewer "File does not appear to contain audio" errors

We’ve fixed an edge-case bug in our async API, leading to a significant reduction in errors that say "File does not appear to contain audio"…

Fix Pre-recorded Speech-to-Text

New developer controls for real-time end-of-utterance

We have released developer controls for real-time end-of-utterance detection, providing developers control over when an utterance is considered complete…

New Real-time Speech-to-Text

PII Redaction and Entity Detection available in 13 additional languages

We have launched PII Text Redaction and Entity Detection for 13 new languages:…

New Speech Understanding

Fewer LeMUR 500 errors

We’ve made improvements to our LeMUR service to reduce the number of 500 errors…

Improvement Speech Understanding

Free tier limit increase; Real-time concurrency increase

We have increased the usage limit for our free tier to 100 hours. New users can now use our async API to transcribe up to 100 hours of audio, with a concurrency limit of 5, before needing to upgrade their accounts…

New Real-time Speech-to-Text

Latency and cost reductions, concurrency increase

We introduced major improvements to our API’s inference latency, with the majority of audio files now completing in well under 45 seconds regardless of audio duration, with a Real-Time Factor (RTF) of up to 0.008…

New Speech Understanding Pre-recorded Speech-to-Text
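To put the RTF figure in context, here is a small worked example. It assumes the usual definition of Real-Time Factor as processing time divided by audio duration, which the entry itself does not spell out. The helper name is hypothetical, and the estimate ignores queueing and file download time.

```python
def processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate processing time from audio duration and Real-Time Factor."""
    return audio_seconds * rtf


one_hour = 60 * 60  # seconds of audio
estimate = processing_seconds(one_hour, 0.008)
print(round(estimate, 1))  # 28.8
```

Under that definition, an hour of audio at an RTF of 0.008 takes roughly 29 seconds of inference, which is consistent with the "well under 45 seconds" claim.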

Claude 2.1 available through LeMUR

Anthropic’s Claude 2.1 is now generally available through LeMUR. Claude 2.1 is similar to our Default model and has reduced hallucinations, a larger context window, and performs better in citations…

New LLM Gateway Speech Understanding

Real-time Binary support, improved async timestamps

Our real-time service now supports binary mode for sending audio segments. Users no longer need to encode audio segments as base64 sequences inside of JSON objects - the raw binary audio segment can now be directly sent…

New Real-time Speech-to-Text
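The difference between the two framing styles can be sketched as follows. The "audio_data" key for the JSON form and the 16 kHz, 16-bit mono chunk are assumptions chosen for illustration. No connection is opened, and the snippet only compares message sizes.

```python
import base64
import json

# 100 ms of 16 kHz, 16-bit mono PCM silence: 16000 samples/s * 2 bytes * 0.1 s
chunk = bytes(3200)

# JSON mode: the audio is base64-encoded and wrapped in a JSON object.
# The "audio_data" key is an assumption about the message shape.
encoded = base64.b64encode(chunk).decode("ascii")
json_message = json.dumps({"audio_data": encoded})

# Binary mode: the raw bytes are the whole message.
binary_message = chunk

print(len(json_message), len(binary_message))
```

Base64 turns every 3 bytes into 4 characters, so the JSON form is about a third larger before any framing overhead, and it also costs an encode step per chunk on the client.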

New Node/JavaScript SDK works in multiple runtimes

We’ve released v4 of our Node/JavaScript SDK. Previously, the SDK was developed specifically for Node, but the latest version now works in additional runtimes without any extra steps. The SDK can now be used in the…

New Platform

New Punctuation Restoration and Truecasing models, PCM Mu-law support

We’ve released new Punctuation and Truecasing models, achieving significant improvements for acronyms, mixed-case words, and more…

New Pre-recorded Speech-to-Text

New LeMUR parameter, reduced hold music hallucinations

Users can now directly pass in custom text inputs into LeMUR through the input_text parameter as an alternative to transcript IDs. This gives users the ability to use any information from the async API, formatted…

New Guardrails Speech Understanding

Reduced latency, improved error messaging

We’ve made improvements to our file downloading pipeline which reduce transcription latency. Latency has been reduced by at least 3 seconds for all audio files, with greater improvements for large audio files provided…

Improvement Pre-recorded Speech-to-Text

New Dashboard features and LeMUR fix

We have released the beta for our new usage dashboard. You can now see a usage summary broken down by async transcription, real-time transcription, Audio Intelligence, and LeMUR. Additionally, you can see charts of…

New Speech Understanding

New LeMUR features and other improvements

We have added a new parameter to LeMUR that allows users to specify a temperature for LeMUR generation. Temperature refers to how stochastic the generated text is and can be a value from 0 to 1, inclusive, where 0…

New Speech Understanding

Improvements - observability, logging, and patches

We have improved logging for our LeMUR service to allow for the surfacing of more detailed errors to users…

Improvement Speech Understanding

Multi-language speaker labels

We have recently launched Speaker Labels for 10 additional languages:…

New Platform

Audio Intelligence unbundling and price decreases

We have unbundled and lowered the price for our Audio Intelligence models. Previously, the bundled price for all Audio Intelligence models was $2.10/hr, regardless of the number of models used…

Improvement Speech Understanding

New language support and improvements to existing languages

We now support the following additional languages for asynchronous transcription through our /v2/transcript endpoint:…

New Pre-recorded Speech-to-Text

Pricing decreases

We have decreased the price of Core Transcription from $0.90 per hour to $0.65 per hour, and decreased the price of Real-Time Transcription from $0.90 per hour to $0.75 per hour…

Improvement Pre-recorded Speech-to-Text
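The relative size of these price cuts can be worked out directly from the per-hour figures in this entry. The helper name is hypothetical.

```python
def percent_decrease(old_price: float, new_price: float) -> float:
    """Return the relative price reduction as a percentage."""
    return (old_price - new_price) / old_price * 100


core = percent_decrease(0.90, 0.65)
real_time = percent_decrease(0.90, 0.75)
print(f"Core Transcription: {core:.1f}% lower")  # 27.8% lower
print(f"Real-Time Transcription: {real_time:.1f}% lower")  # 16.7% lower
```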

Significant Summarization model speedups

We’ve implemented changes that yield a 43% to 200% increase in processing speed for our Summarization models, depending on which model is selected, with no measurable impact on the quality of results…

Improvement Speech Understanding

Introducing LeMUR, the easiest way to build LLM apps on spoken data

We've released LeMUR - our framework for applying LLMs to spoken data - for general availability. LeMUR is optimized for high accuracy on specific tasks:…

Improvement Speech Understanding

Introducing our Conformer-2 model

We've released Conformer-2, our latest AI model for automatic speech recognition. Conformer-2 is trained on 1.1M hours of English audio data, extending Conformer-1 to provide improvements on proper nouns,…

Improvement Pre-recorded Speech-to-Text

New parameter and timestamps fix

We’ve introduced a new, optional speech_threshold parameter, allowing users to only transcribe files that contain at least a specified percentage of spoken audio, represented as a ratio in the range [0, 1]…

New Pre-recorded Speech-to-Text
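A minimal sketch of setting this parameter follows, with a client-side check that the ratio stays within [0, 1]. The speech_threshold name and its range come from this entry. The helper function and the audio URL are placeholders, and the entry does not say how the API responds when a file falls below the threshold, so that case is not modeled.

```python
def build_threshold_request(audio_url: str, speech_threshold: float) -> dict:
    """Build an illustrative transcript request with a speech_threshold ratio."""
    if not 0.0 <= speech_threshold <= 1.0:
        raise ValueError("speech_threshold must be a ratio in [0, 1]")
    return {
        "audio_url": audio_url,  # placeholder input file
        "speech_threshold": speech_threshold,  # e.g. 0.5 means at least 50% speech
    }


request = build_threshold_request("https://example.com/voicemail.mp3", 0.5)
print(request)
```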

Character sequence improvements

We’ve fixed an issue in which the last character in an alphanumeric sequence could fail to be transcribed. The fix is effective immediately and constitutes a 95% reduction in errors of this type…

Fix Pre-recorded Speech-to-Text

Speaker Labels improvement

We’ve made improvements to the Speaker Labels model, adjusting the impact of the speakers_expected parameter to better allow the model to determine the correct number of unique speakers, especially in cases where one or…

Improvement Platform

Significant processing time improvement

We’ve made significant improvements to our transcoding pipeline, resulting in a 98% overall speedup in transcoding time and a 12% overall improvement in processing time for our asynchronous API…

Improvement Platform

Announcing LeMUR - our new framework for applying powerful LLMs to transcribed speech

We’re introducing our new framework LeMUR, which makes it simple to apply Large Language Models (LLMs) to transcripts of audio files up to 10 hours in length…

New Speech Understanding Pre-recorded Speech-to-Text

New PII and Entity Detection Model

We’ve upgraded to a new and more accurate PII Redaction model, which improves credit card detections in particular…

Improvement Speech Understanding

Multilingual and stereo audio fixes, & Japanese model retraining

We’ve fixed two edge cases in our async transcription pipeline that were producing non-deterministic results from multilingual and stereo audio…

Improvement Pre-recorded Speech-to-Text

Decreased latency and improved password reset

We’ve implemented a range of improvements to our English pipeline, leading to an average 38% improvement in overall latency for asynchronous English transcriptions…

Improvement Pre-recorded Speech-to-Text

Conformer-1 now available for Real-Time transcription, new Speaker Labels parameter, and more

We're excited to announce that our new Conformer-1 Speech Recognition model is now available for real-time English transcriptions, offering a 24.3% relative accuracy improvement…

New Real-time Speech-to-Text Pre-recorded Speech-to-Text

Introducing our Conformer-1 model

We've released our new Conformer-1 model for speech recognition. Conformer-1 was trained on 650K hours of audio data and is our most accurate model to date…

Improvement Pre-recorded Speech-to-Text

New AI Models for Italian / Japanese Punctuation Improvements

Our Content Safety and Topic Detection models are now available for use with Italian audio files…

New Pre-recorded Speech-to-Text

Hindi Punctuation Improvements

We’ve made improvements to our Hindi punctuation model, increasing relative accuracy by 26%. These changes are effective immediately for all Hindi audio files submitted to AssemblyAI…

Improvement Pre-recorded Speech-to-Text

Improved PII Redaction

We’ve released a new version of our PII Redaction model to improve PII detection accuracy, especially for credit card and phone number edge cases…

Improvement Speech Understanding

Automatic Language Detection Upgrade

We’ve released a new version of our Automatic Language Detection model that better targets speech-dense parts of audio files, yielding improved accuracy…

Improvement Pre-recorded Speech-to-Text

Password Reset

Users can now reset their passwords from our web UI. From the Dashboard login, simply click “Forgot your password?” to initiate a password reset. Alternatively, users who are already logged in can change their…

New Platform

Dual Channel Support for Conversational Summarization / Improved Timestamps

We’ve made updates to our Conversational Summarization model to support dual-channel files. Effective immediately, dual_channel may be set to True when summary_model is set to conversational…

New Speech Understanding

Improved Transcription Accuracy for Phone Numbers

We’ve made updates to our Core Transcription model to improve the transcription accuracy of phone numbers by 10%. This improvement is effective immediately for all audio files submitted to AssemblyAI for transcription…

Improvement Pre-recorded Speech-to-Text

v9 Transcription Model Released

We are happy to announce the release of our most accurate Speech Recognition model to date - version 9 (v9). This updated model delivers increased performance across many metrics on a wide range of audio types…

Improvement Pre-recorded Speech-to-Text

New Summarization Models Tailored to Use Cases

We are excited to announce that new Summarization models are now available! Developers can now choose between multiple summary models that best fit their use case and customize the output based on the summary length…

Improvement Speech Understanding

Improved Transcription Accuracy for COVID

We’ve made updates to our Core Transcription model to improve the transcription accuracy of the word COVID. This improvement is effective immediately for all audio files submitted to AssemblyAI for transcription…

Improvement Pre-recorded Speech-to-Text

New Audio Intelligence Models: Summarization

Starting today, you can now transcribe and summarize entire audio files with a single API call…

Improvement Speech Understanding

Automatic Casing / Short Utterances

We’ve improved our Automatic Casing model and fixed a minor bug that caused over-capitalization in English transcripts. The Automatic Casing model is enabled by default with our Core Transcription API to improve…

Improvement Pre-recorded Speech-to-Text

Static IP Support for Webhooks

Over the next few weeks, we will begin rolling out Static IP support for webhooks to customers in stages…

Improvement Pre-recorded Speech-to-Text

Improved Number Transcription

We’ve made improvements to our Core Transcription model to better identify and transcribe numbers present in your audio files…

Improvement Pre-recorded Speech-to-Text

Improved Disfluency Timestamps

We've updated our Disfluency Detection model to improve the accuracy of timestamps for disfluency words…

Improvement Platform

Speaker Label Improvement

We've improved the Speaker Label model’s ability to identify unique speakers for single word or short utterances…

Improvement Platform

Historical Transcript Bug Fix

We've fixed a bug with the Historical Transcript endpoint that was causing null to appear as the value of the completed key…

Fix Platform

Japanese Transcription Now Available

Today, we’re releasing our new Japanese transcription model to help you transcribe and analyze your Japanese audio and video files using our cutting-edge AI…

Improvement Pre-recorded Speech-to-Text

Hindi Transcription / Custom Webhook Headers

We’ve released our new Hindi transcription model to help you transcribe and analyze your Hindi audio and video files…

New Pre-recorded Speech-to-Text

Improved Speaker Labels Accuracy and Speaker Segmentation

Improved the overall accuracy of the Speaker Labels feature and the model’s ability to segment speakers. Fixed a small edge case that would occasionally cause some transcripts to complete with NULL as the language_code…

Improvement Platform

Content Moderation and Topic Detection Available for Portuguese

Content Moderation and Topic Detection are now available for the Portuguese language. Improved Inverse Text Normalization of money amounts in transcript text. Addressed an issue with Real-Time Transcription that would…

New Speech Understanding

Automatic Language Detection Available for Dutch and Portuguese

Automatic Language Detection now supports detecting Dutch and Portuguese. Accuracy of the Automatic Language Detection model improved on files with large amounts of silence. Improved speaker segmentation accuracy for…

New Pre-recorded Speech-to-Text

Dutch and Portuguese Support Released

Dutch and Portuguese transcription is now generally available for our /v2/transcript endpoint. See our documentation for more information on specifying a language in your POST request…

New Pre-recorded Speech-to-Text

Content Moderation and Topic Detection Available for French, German, and Spanish

Content Moderation and Topic Detection features are now available for French, German, and Spanish languages. Improved redaction accuracy for credit_card_number, credit_card_expiration, and credit_card_cvv policies in…

New Speech Understanding

French, German, and Italian Support Released

French, German, and Italian transcription is now publicly available. Check out our documentation for more information on Specifying a Language in your POST request. Released v2 of our Spanish model, improving absolute…

New Pre-recorded Speech-to-Text

Miscellaneous Bug Fixes

Fixed an edge case that would occasionally affect timestamps for a small number of words when disfluencies was set to true. Fixed an edge case where PII audio redaction would occasionally fail when using local files…

Fix Platform

Spanish Language Support, Automatic Language Detection, and Custom Spelling Released

Spanish transcription is now publicly available. Check out our documentation for more information on Specifying a Language in your POST request. Automatic Language Detection is now available for our /v2/transcript…

New Pre-recorded Speech-to-Text

Auto Chapters v6 Released

Released Auto Chapters v6, improving the summarization of longer chapters…

Improvement Platform

Auto Chapters v5 Released

Auto Chapters v5 released, improving headline and gist generation and quote formatting in the summary key. Fixed an edge case in Dual-Channel files where initial words in an audio file would occasionally be missed in…

Improvement Pre-recorded Speech-to-Text

Regional Spelling Improvements

Region-specific spelling improved for en_uk and en_au language codes. Improved the formatting of “MP3” in transcripts. Improved Real-Time transcription error handling for corrupted audio files…

Improvement Pre-recorded Speech-to-Text

Real-Time v3 Released

Released v3 of our Real-Time Transcription model, improving overall accuracy by 18% and proper noun recognition by 23% relative to the v2 model…

New Real-time Speech-to-Text

Auto Chapters v4 Released, Auto Retry Feature Added

Added an Auto Retry feature, which automatically retries transcripts that fail with a "Server error, developers have been alerted" message…

New Platform

Auto Chapters v3 Released

Released v3 of our Auto Chapters model, improving the model’s ability to segment audio into chapters and chapter boundary detection by 56.3%…

Improvement Platform

Miscellaneous Bug Fixes

Fixed a rare edge case affecting audio duration calculation of a small percentage of multi-channel files that contained no speech. Miscellaneous bug fixes for Real-Time Transcription…

Fix Pre-recorded Speech-to-Text

Webhook Status Codes, Entity Detection Improved

POST requests from the API to webhook URLs will now accept any status code from 200 to 299 as a successful HTTP response. Previously only 200 status codes were accepted. Updated the text key in our Entity Detection…

Improvement Speech Understanding
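
The webhook acceptance rule described above can be expressed as a one-line check. This is only a sketch of the rule as stated in the entry (any status from 200 to 299 counts as success); the function name is hypothetical and this is not the API's actual implementation.

```python
def is_successful_webhook_status(status_code: int) -> bool:
    """Return True if a webhook response status counts as success.

    Mirrors the rule in the entry above: any code from 200 to 299 inclusive.
    """
    return 200 <= status_code <= 299


# A receiver may now answer with, for example, 202 or 204 instead of 200.
for code in (200, 202, 204, 301, 500):
    print(code, is_successful_webhook_status(code))
```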

Punctuation and Casing Accuracy Improved, Inverse Text Normalization Model Updated

Released v4 of our Punctuation model, increasing punctuation and casing accuracy by ~2%. Updated our Inverse Text Normalization (ITN) model for our /v2/transcript endpoint, improving web address and email address…

Improvement Pre-recorded Speech-to-Text

Support for Non-English Languages Coming Soon

Our Deep Learning team has been hard at work training our new non-English language models. In the coming weeks, we will be adding support for French, German, Italian, and Spanish…

Improvement Platform

Shorter Summaries Added to Auto Chapters, Improved Filler Word Detection

Added a new gist key to the Auto Chapters feature. This new key provides an ultra-short, usually 3 to 8 word summary of the content spoken during that chapter. Implemented profanity filtering into Auto Chapters, which…

New Real-time Speech-to-Text Pre-recorded Speech-to-Text

v8.5 Asynchronous Transcription Model Released

Our Asynchronous Speech Recognition model is now even better with the release of v8.5. This update improves overall accuracy by 4% relative to our v8 model. This is achieved by improving the model’s ability to handle…

Improvement Pre-recorded Speech-to-Text

New and Improved API Documentation

Launched the new AssemblyAI Docs, with more complete documentation and an easy-to-navigate interface so developers can effectively use and integrate with our API…

New Pre-recorded Speech-to-Text

Inverse Text Normalization Added to Real-Time, Word Boost Accuracy Improved

Inverse Text Normalization (ITN) added for our /v2/realtime and /v2/stream endpoints. ITN improves formatting of entities like numbers, dates, and proper nouns in the transcription text. Improved accuracy for Custom…

New Real-time Speech-to-Text Pre-recorded Speech-to-Text

Entity Detection Released, Improved Filler Word Detection, Usage Alerts

v1 release of Entity Detection - automatically detects a wide range of entities like person and company names, emails, addresses, dates, locations, events, and more…

New Speech Understanding

Additional MIME Type Detection Added for OPUS Files

Added additional MIME type detection to detect a wider variety of OPUS files. Fixed an issue with word timing calculations that caused issues with speaker labeling for a small number of transcripts…

Improvement Platform

Custom Vocabulary Accuracy Significantly Improved

Significantly improved the accuracy of Custom Vocabulary, and the impact of the boost_param field to control the weight for Custom Vocabulary. Improved precision of word timings…

Improvement Platform

New Auto Chapters, Sentiment Analysis, and Disfluencies Features Released

v1 release of Auto Chapters - which provides a "summary over time" by breaking audio/video files into "chapters" based on the topic of conversation…

New Speech Understanding

New Language Code Parameter for English Spelling

Added a new language_code parameter when making requests to /v2/transcript. Developers can set this to en_us, en_uk, or en_au, which will ensure the correct English spelling is used - British English, Australian…

New Platform
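
A small sketch of attaching language_code to a request body with client-side validation. The three codes are those named in the entry above; the helper name and the audio_url field are assumptions for illustration, and languages added in later releases are deliberately not covered.

```python
# English spelling variants named in the entry above.
ENGLISH_SPELLING_CODES = {"en_us", "en_uk", "en_au"}


def with_language_code(body: dict, language_code: str) -> dict:
    """Return a copy of `body` with a validated language_code (hypothetical helper)."""
    if language_code not in ENGLISH_SPELLING_CODES:
        raise ValueError(f"unsupported language_code: {language_code!r}")
    return {**body, "language_code": language_code}


# `audio_url` is an assumed field name used only to make the example concrete.
request_body = with_language_code({"audio_url": "https://example.com/audio.mp3"}, "en_uk")
print(request_body)
```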

New Features Coming Soon, Bug Fixes

This week, our engineering team has been hard at work preparing for the release of exciting new features like: Chapter Detection: Automatically summarize audio and video files into segments (aka "chapters")…

Improvement Platform

Improved v8 Model Processing Speed

Improved the API's ability to handle audio/video files with a duration over 8 hours. Further improved transcription processing times by 12%. Fixed an edge case in our responses for dual channel audio files where if…

Improvement Pre-recorded Speech-to-Text

v8 Transcription Model Released

Today, we're happy to announce the release of our most accurate Speech Recognition model for asynchronous transcription to date—version 8 (v8)…

New Pre-recorded Speech-to-Text

v2 Real-Time and v4 Topic Detection Models Released

Launched our v2 Real-Time Streaming Transcription model (read more on our blog). This new model improves accuracy of our Real-Time Streaming Transcription by ~10%. Launched our Topic Detection v4 model, with an…

New Real-time Speech-to-Text Speech Understanding

v3 Topic Detection Model, PII Redaction Bug Fixes

Released our v3 Topic Detection model. This model dramatically improves the Topic Detection feature's ability to accurately detect topics based on context. For example, in the following text, the model was able to…

New Speech Understanding

Severity Scores for Content Safety

The API now returns a severity score along with the confidence and label keys when using the Content Safety feature. The severity score measures how intense a detected Content Safety label is on a scale of 0 to 1. For…

Improvement Platform
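
One way the severity score might be used is to keep only the more intense detections. The sketch below assumes each result is a dictionary with label, confidence, and severity keys; the exact response shape, the severity key name, the 0.5 cutoff, and the sample values are all hypothetical.

```python
def high_severity(results: list, threshold: float = 0.5) -> list:
    """Keep results whose severity (0 to 1 scale) meets the threshold."""
    return [item for item in results if item["severity"] >= threshold]


# Made-up sample data for illustration only.
sample_results = [
    {"label": "profanity", "confidence": 0.93, "severity": 0.71},
    {"label": "natural_disasters", "confidence": 0.88, "severity": 0.12},
]

flagged = high_severity(sample_results, threshold=0.5)
print([item["label"] for item in flagged])
```

Filtering on severity rather than confidence separates "how sure is the model" from "how intense is the content", which is the distinction the entry draws.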

Real-time Transcription and Streaming Fixes

Fixed an edge case where higher sample rates would occasionally trigger a "Client sent audio too fast" error from the Real-Time Streaming WebSocket API…

Fix Real-time Speech-to-Text Pre-recorded Speech-to-Text

Punctuation v3, Word Search, Bug Fixes

v3 Punctuation Model released. v3 brings improved accuracy to automatic punctuation and casing for both async (/v2/transcript) and real-time (WebSocket API) transcripts. Released an all-new Word Search feature that…

New Pre-recorded Speech-to-Text

General Improvements

Fixed a bug with PII Redaction, where sometimes dollar amount and date tokens were not being properly redacted. AssemblyAI now supports even more audio/video file formats thanks to improvements to our audio transcoding…

Improvement Platform

ITN Model Update

Today we've released a major improvement to our ITN (Inverse Text Normalization) model. This results in better formatting for entities within the transcription, such as phone numbers, money amounts, and dates…

Improvement Pre-recorded Speech-to-Text

Punctuation Model v2.5 Released

Today we've released an updated Automatic Punctuation and Casing Restoration model (Punctuation v2.5)! This update results in improved capitalization of proper nouns in transcripts, reduces over-capitalization issues…

Improvement Pre-recorded Speech-to-Text

Content Safety Model (v7) Released

We have released an updated Content Safety Model - v7! Performance for 10 of the 19 Content Safety labels has been improved, with the biggest improvements being for the Profanity and Natural Disasters labels…

Improvement Platform

Real-Time Transcription Model v1.1 Released

Developers will now be able to use the word_boost parameter in requests to the real-time API, allowing you to introduce your own custom vocabulary to the model for that given session…

Improvement Real-time Speech-to-Text Pre-recorded Speech-to-Text

Topic Detection Model v2 Released

Today we have released v2 of our Topic Detection Model. This new model will predict multiple topics for each paragraph of text, whereas v1 was limited to predicting a single topic. For example, given the text:…

Improvement Speech Understanding

Increased Number of Categories Returned for Topic Detection Summary

In this minor improvement, we have increased the number of topics the model can return in the summary key of the JSON response from 10 to 20…

Improvement Speech Understanding

Temporary Tokens for Real-Time

Developers often need to expose their AssemblyAI API Key in their client applications when establishing connections with our real-time streaming transcription API…

Improvement Real-time Speech-to-Text

Adding "Marijuana" and "Sensitive Social Issues" as Possible Content Safety Labels

In this minor update, we improve the accuracy across all Content Safety labels, and add two new labels for better content categorization. The two new labels are sensitive_social_issues and marijuana…

Improvement Platform

Real-Time Transcription is Now GA

We are pleased to announce the official release of our Real-Time Streaming Transcription API! This API uses WebSockets and a fast Conformer Neural Network architecture that allows for a quick and accurate transcription…

Improvement Real-time Speech-to-Text Pre-recorded Speech-to-Text

General Improvements

Developers can now send in files up to 5.5 GB in size, compared to the previous 4.5 GB. More topics have been added to our Topic Detection Model, along with increased speed and accuracy. You can see a complete list of…

Improvement Pre-recorded Speech-to-Text

Content Safety Detection and Topic Detection are now GA!

Today we have released two of our enterprise-level models, Content Safety Detection and Topic Detection, to all users…

Improvement Speech Understanding

Minor Update to PII Redaction

With this minor update, our Redaction Model will better detect Social Security Numbers and Medical References for additional security and data protection…

Improvement Speech Understanding

New Punctuation Model (v2)

Today we released a new punctuation model that is more extensive than its predecessor, and will drive improvements in punctuation and casing accuracy…

Improvement Pre-recorded Speech-to-Text

New Features & Updates

You can explore each feature further in our Docs:…

Improvement Pre-recorded Speech-to-Text

New PII Classes

We have released an update to our PII Redaction Model that will now support detecting and redacting additional classes…

Improvement Speech Understanding

General Improvements

We have made a major update to our Speaker Diarization Model that will improve results both in speed and accuracy. This update introduces the UNK speaker label for when a speaker for a word/phrase is unknown. This…

Improvement Pre-recorded Speech-to-Text