Changelog
Follow along to see weekly accuracy and product improvements.
Real-time Transcription and Streaming Fixes
- Fixed an edge case where higher sample rates would occasionally trigger a "Client sent audio too fast" error from the Real-Time Streaming WebSocket API.
- Fixed an edge case where some streams from the Real-Time Streaming WebSocket API were held open after a customer idled their session.
- Fixed an edge case in the /v2/stream endpoint, where long periods of silence would occasionally cause automatic punctuation to fail.
- Improved error handling when non-JSON input is sent to the /v2/transcript endpoint.
Punctuation v3, Word Search, Bug Fixes
- v3 Punctuation Model released.
- v3 brings improved accuracy to automatic punctuation and casing for both async (/v2/transcript) and real-time (WebSocket API) transcripts.
- Released an all-new Word Search feature that will allow developers to search for words in a completed transcript.
- This new feature returns how many times the word was spoken, the index of that word in the transcript's JSON response word list/array, and the associated timestamps for each matched word.

- Fixed an issue causing a small subset of words not to be filtered when profanity filtering was turned on.
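To make the shape of a Word Search result concrete, the sketch below reproduces the same three pieces of information (count, indexes, and timestamps) locally from a transcript's word list. The word records with text, start, and end fields and the search_words helper are assumptions made for illustration; they are not the API's actual response schema or implementation.

```python
from collections import defaultdict


def search_words(words, targets):
    """Return count, indexes, and timestamps for each target word.

    `words` is assumed to be a list of dicts with `text`, `start`, and `end`
    keys (times in milliseconds); this layout is hypothetical.
    """
    wanted = {t.lower() for t in targets}
    matches = defaultdict(lambda: {"count": 0, "indexes": [], "timestamps": []})
    for index, word in enumerate(words):
        # Normalize case and strip surrounding punctuation before comparing.
        text = word["text"].lower().strip(".,!?")
        if text in wanted:
            entry = matches[text]
            entry["count"] += 1
            entry["indexes"].append(index)
            entry["timestamps"].append((word["start"], word["end"]))
    return dict(matches)


words = [
    {"text": "Hello", "start": 0, "end": 400},
    {"text": "world,", "start": 450, "end": 900},
    {"text": "hello", "start": 950, "end": 1300},
]
result = search_words(words, ["hello"])
```

Here result maps each matched word to its count, its positions in the word list, and its (start, end) timestamp pairs.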
General Improvements
- Fixed a bug with PII Redaction, where sometimes dollar amount and date tokens were not being properly redacted.
- AssemblyAI now supports even more audio/video file formats thanks to improvements to our audio transcoding pipeline!
- Fixed a rare bug where a small percentage of transcripts (0.01%) would incorrectly sit in a status of "queued" for up to 60 seconds.
ITN Model Update
Today we've released a major improvement to our ITN (Inverse Text Normalization) model. This results in better formatting for entities within the transcription, such as phone numbers, money amounts, and dates.
For example:
Money:
- Spoken: "Hey, do you have five dollars?"
- Model output with ITN: "Hey, do you have $5?"
Years:
- Spoken: "Yes, I believe it was back in two thousand eight"
- Model output with ITN: "Yes, I believe it was back in 2008."
Punctuation Model v2.5 Released
Today we've released an updated Automatic Punctuation and Casing Restoration model (Punctuation v2.5)! This update improves capitalization of proper nouns in transcripts, reduces over-capitalization issues where some words were being incorrectly capitalized, and improves some edge cases around words with commas around them. For example:
- "....in the Us" now becomes "....in the US."
- "whatsapp," now becomes "WhatsApp,"
Content Safety Model (v7) Released
We have released an updated Content Safety Model - v7! Performance for 10 out of all 19 Content Safety labels has been improved, with the biggest improvements being for the Profanity and Natural Disasters labels.
Real-Time Transcription Model v1.1 Released
We have just released a major real-time update!
Developers will now be able to use the word_boost parameter in requests to the real-time API, allowing you to introduce your own custom vocabulary to the model for that given session! This custom vocabulary will lead to improved accuracy for the provided words.
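As a rough sketch of what passing custom vocabulary might look like, the snippet below builds a real-time connection URL that carries a word_boost value. It assumes the vocabulary is sent as a JSON-encoded list in the query string; that encoding and the realtime_url helper are assumptions for illustration, so check the Docs for the exact format.

```python
import json
from urllib.parse import urlencode

# Endpoint as shown elsewhere in this changelog for real-time connections.
BASE_URL = "wss://api.assemblyai.com/v2/realtime/ws"


def realtime_url(sample_rate, word_boost=None):
    """Build a connection URL, optionally carrying custom vocabulary."""
    params = {"sample_rate": sample_rate}
    if word_boost:
        # Assumption: the vocabulary travels as a JSON-encoded list of strings.
        params["word_boost"] = json.dumps(word_boost)
    return f"{BASE_URL}?{urlencode(params)}"


url = realtime_url(16000, ["AssemblyAI", "Conformer"])
```

Because the vocabulary applies to a single session, a helper like this would be called once per connection with the terms relevant to that session.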
General Improvements
We now limit each real-time session to one WebSocket connection to ensure the integrity of a customer's transcription and prevent multiple users/clients from using the same WebSocket session.
Note: Developers can still have multiple real-time sessions open in parallel, up to the Concurrency Limit on the account. For example, if an account has a Concurrency Limit of 32, that account could have up to 32 concurrent real-time sessions open.
Topic Detection Model v2 Released
Today we have released v2 of our Topic Detection Model. This new model will predict multiple topics for each paragraph of text, whereas v1 was limited to predicting a single topic. For example, given the text:
"Elon Musk just released a new Tesla that drives itself!"
v1:
Automotive>AutoType>DriverlessCars: 1
v2:
Automotive>AutoType>DriverlessCars: 1
PopCulture: 0.84
PopCulture>CelebrityStyle: 0.56
This improvement will result in the visual output looking significantly better, and containing more informative responses for developers!
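Since v2 can return several topics with different confidences, a consumer may want to keep only the stronger ones. The sketch below filters the example labels above by a confidence threshold; the dictionary layout, the topics_above helper, and the 0.8 cutoff are illustrative assumptions, not part of the API.

```python
def topics_above(labels, threshold=0.5):
    """Return (label, confidence) pairs at or above `threshold`, best first."""
    kept = [(label, score) for label, score in labels.items() if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Values copied from the v2 example; the dict shape itself is hypothetical.
v2_labels = {
    "Automotive>AutoType>DriverlessCars": 1.0,
    "PopCulture": 0.84,
    "PopCulture>CelebrityStyle": 0.56,
}
confident = topics_above(v2_labels, threshold=0.8)
```

With a 0.8 cutoff, only the DriverlessCars and PopCulture labels remain; lowering the threshold would also keep PopCulture>CelebrityStyle.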
Increased Number of Categories Returned for Topic Detection Summary
In this minor improvement, we have increased the number of topics the model can return in the summary key of the JSON response from 10 to 20.
Temporary Tokens for Real-Time
Oftentimes, developers need to expose their AssemblyAI API Key in their client applications when establishing connections with our real-time streaming transcription API. Now, developers can create a temporary API token that expires after a customizable amount of time (similar to an AWS S3 Temporary Authorization URL) and can safely be exposed in client applications and front-ends.
This will allow developers to create short-lived API tokens designed to be used securely in the browser, along with authorization within the query string!
For example, authenticating in the query parameters with a temporary token would look like so:
wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000&token={TEMP_TOKEN}
For more information, you can view our Docs!
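A minimal sketch of assembling that URL in client code follows. It only reproduces the query-string format shown above; the token_url helper is hypothetical, and how the temporary token is obtained from your backend is outside the scope of this example.

```python
from urllib.parse import urlencode

REALTIME_ENDPOINT = "wss://api.assemblyai.com/v2/realtime/ws"


def token_url(temp_token, sample_rate=16000):
    """Build the WebSocket URL that authenticates via a temporary token."""
    if not temp_token:
        raise ValueError("a temporary token is required")
    # urlencode escapes any characters in the token that are unsafe in a URL.
    query = urlencode({"sample_rate": sample_rate, "token": temp_token})
    return f"{REALTIME_ENDPOINT}?{query}"


url = token_url("TEMP_TOKEN")
```

Keeping the long-lived API Key on the server and handing only the short-lived token to the browser is the point of this design: a leaked token stops working once it expires.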
Adding "Marijuana" and "Sensitive Social Issues" as Possible Content Safety Labels
In this minor update, we improve the accuracy across all Content Safety labels, and add two new labels for better content categorization. The two new labels are sensitive_social_issues and marijuana.
New label definitions:
- sensitive_social_issues: This category includes content that may be considered insensitive, irresponsible, or harmful to specific groups based on their beliefs, political affiliation, sexual orientation, or gender identity.
- marijuana: This category includes content that discusses marijuana or its usage.
Real-Time Transcription is Now GA
We are pleased to announce the official release of our Real-Time Streaming Transcription API! This API uses WebSockets and a fast Conformer Neural Network architecture that allows for a quick and accurate transcription in real-time.
Find out more in our Docs here!
General Improvements
- Developers can now send in files up to 5.5 GB in size, compared to the previous 4.5 GB.
- More topics have been added to our Topic Detection Model, along with increased speed and accuracy. You can see a complete list of detectable topics in our Docs here!
- An issue with speaker diarization where speakers were being missed, even when speaking long enough to be detected, has been solved!
Content Safety Detection and Topic Detection are now GA!
Today we have released two of our enterprise-level models, Content Safety Detection and Topic Detection, to all users!
Now any developer can make use of these cutting edge models within their applications and products. Explore these new features in our Docs:
Minor Update to PII Redaction
With this minor update, our Redaction Model will better detect Social Security Numbers and Medical References for additional security and data protection!
New Punctuation Model (v2)
Today we released a new punctuation model that is more extensive than its predecessor, and will drive improvements in punctuation and casing accuracy!
New Features & Updates
List Historical Transcripts
- Developers can get a list of their historical transcriptions. This list can be filtered by status and date. This new endpoint will allow developers to see if they have any queued, processing, or throttled transcriptions.
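As an illustration of how such a listing might be used, the sketch below tallies transcripts by status and collects the ones still pending. The record shape (id and status fields) and the summarize helper are assumptions for illustration; the real response format, including how throttled transcriptions are reported, is described in the Docs.

```python
from collections import Counter

# Statuses treated as "not finished yet" in this sketch.
PENDING = {"queued", "processing"}


def summarize(transcripts):
    """Count transcripts per status and list the ids still pending."""
    counts = Counter(t["status"] for t in transcripts)
    pending = [t["id"] for t in transcripts if t["status"] in PENDING]
    return counts, pending


# Hypothetical listing; real records would come from the list endpoint.
listing = [
    {"id": "a1", "status": "completed"},
    {"id": "b2", "status": "queued"},
    {"id": "c3", "status": "processing"},
]
counts, pending = summarize(listing)
```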
Pre-Formatted Paragraphs
- Developers can now get pre-formatted paragraphs by calling our new paragraphs endpoint! The model will attempt to semantically break the transcript up into paragraphs of five sentences or less.
You can explore each feature further in our Docs:
Topic Detection Response Improvements
- Now each topic will include timestamps for each segment of classified text. We have also added a new summary key that will contain the confidence of all unique topics detected throughout the entire transcript.
- We have made improvements to our Speaker Diarization Model that increase accuracy over short and long transcripts.
New PII Classes
We have released an update to our PII Redaction Model that will now support detecting and redacting additional classes!
- blood_type
- medical_condition
- drug (including vitamins/minerals)
- injury
- medical_process
Entity Definitions:
- blood_type: Blood type.
- medical_condition: A medical condition. Includes diseases, syndromes, deficits, disorders. E.g., chronic fatigue syndrome, arrhythmia, depression.
- drug: Medical drug, including vitamins and minerals. E.g., Advil, Acetaminophen, Panadol.
- injury: Human injury, e.g., I broke my arm, I have a sprained wrist. Includes mutations, miscarriages, and dislocations.
- medical_process: Medical process, including treatments, procedures, and tests. E.g., "heart surgery," "CT scan."
General Improvements
- We have made a major update to our Speaker Diarization Model that will improve results both in speed and accuracy. This update introduces the UNK speaker label for when a speaker for a word/phrase is unknown. This label is in place to prevent combining the unknown speaker with the dominant speaker, giving the developer more insight into who may or may not be speaking!
Our Content Safety Model has been trained on higher-quality data and now supports the following new labels:
- Company Financials: can detect when things like stock prices or revenue are discussed.
- Natural Disasters: in the past, we used the label Accidents to cover natural disasters and man-made accidents like plane crashes. Now Natural Disasters covers things like hurricanes, and Accidents covers man-made accidents like plane crashes.