Changelog
Follow along to see weekly accuracy and product improvements.
February 7, 2022
Auto Chapters v3 Released
- Released v3 of our Auto Chapters model, improving the model’s ability to segment audio into chapters and chapter boundary detection by 56.3%.
- Improved formatting for Auto Chapters summaries. The summary, headline, and gist keys now include better punctuation, casing, and text formatting.
January 31, 2022
Miscellaneous Bug Fixes
- Fixed a rare edge case affecting audio duration calculation of a small percentage of multi-channel files that contained no speech.
- Miscellaneous bug fixes for Real-Time Transcription.
January 24, 2022
Webhook Status Codes, Entity Detection Improved
- POST requests from the API to webhook URLs will now accept any status code from 200 to 299 as a successful HTTP response. Previously, only 200 status codes were accepted.
- Updated the text key in our Entity Detection feature to return the proper noun rather than the possessive noun. For example, Andrew instead of Andrew’s.
- Fixed an edge case with Entity Detection where under certain contexts, a disfluency could be identified as an entity.
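The webhook rule above (any status from 200 to 299 counts as success) can be sketched as a simple range check. This is only an illustration of the rule as described, not AssemblyAI's implementation; the function name is hypothetical.

```python
def is_successful_webhook_response(status_code: int) -> bool:
    """Return True for any 2xx status code, per the rule described above."""
    return 200 <= status_code <= 299


if __name__ == "__main__":
    # A webhook handler may now answer 200, 201, 204, etc. and still be treated as successful.
    for code in (200, 204, 299, 301, 500):
        print(code, is_successful_webhook_response(code))
```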
January 17, 2022
Punctuation and Casing Accuracy Improved, Inverse Text Normalization Model Updated
- Released v4 of our Punctuation model, increasing punctuation and casing accuracy by ~2%.
- Updated our Inverse Text Normalization (ITN) model for our /v2/transcript endpoint, improving web address and email address formatting and fixing the occasional number formatting issue.
- Fixed an edge case where multi-channel files would return no text when the two channels were out of phase with each other.
January 10, 2022
Support for Non-English Languages Coming Soon
- Our Deep Learning team has been hard at work training our new non-English language models. In the coming weeks, we will be adding support for French, German, Italian, and Spanish.
January 3, 2022
Shorter Summaries Added to Auto Chapters, Improved Filler Word Detection
- Added a new gist key to the Auto Chapters feature. This new key provides an ultra-short, usually 3 to 8 word summary of the content spoken during that chapter.

- Implemented profanity filtering into Auto Chapters, which will prevent the API from generating a summary, headline, or gist that includes profanity.
- Improved Filler Word (aka, disfluencies) detection by ~5%.
- Improved accuracy for Real-Time Streaming Transcription.
- Fixed an edge case where WebSocket connections for Real-Time Transcription sessions would occasionally not close properly after the session was terminated. This resulted in the client receiving a 4031 error code even after sending a session termination message.
- Corrected a bug that occasionally attributed disfluencies to the wrong utterance when Speaker Labels or Dual-Channel Transcription was enabled.
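As a rough sketch of how a client might read the new gist key, the snippet below collects the gist of each chapter from a parsed transcript response. The top-level chapters key and the sample data are assumptions made for illustration; only the summary, headline, and gist key names come from this changelog.

```python
from typing import Any, Dict, List


def chapter_gists(transcript: Dict[str, Any]) -> List[str]:
    """Collect the gist of every chapter, skipping chapters that lack one."""
    chapters = transcript.get("chapters") or []  # "chapters" is an assumed key name
    return [chapter["gist"] for chapter in chapters if chapter.get("gist")]


# Hypothetical response fragment, for illustration only.
sample_transcript = {
    "chapters": [
        {"summary": "The team reviews last quarter.", "headline": "Quarterly review", "gist": "Reviewing the quarter"},
        {"summary": "Plans for the next release.", "headline": "Release planning", "gist": "Planning the release"},
    ]
}

print(chapter_gists(sample_transcript))
```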
December 27, 2021
v8.5 Asynchronous Transcription Model Released
- Our Asynchronous Speech Recognition model is now even better with the release of v8.5.
- This update improves overall accuracy by 4% relative to our v8 model.
- This is achieved by improving the model’s ability to handle noisy or difficult-to-decipher audio.
- The v8.5 model also improves Inverse Text Normalization for numbers.
December 20, 2021
New and Improved API Documentation
- Launched the new AssemblyAI Docs, with more complete documentation and an easy-to-navigate interface so developers can effectively use and integrate with our API. Click here to view the new and improved documentation.
- Added two new fields to the FinalTranscript response for Real-time Transcriptions. The punctuated key is a Boolean value indicating if punctuation was successful. The text_formatted key is a Boolean value indicating if Inverse Text Normalization (ITN) was successful.
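A client could check the two new flags along these lines. This is a minimal sketch: the message_type and text fields in the sample payload are assumptions, and the helper name is hypothetical; only punctuated and text_formatted are taken from the entry above.

```python
import json
from typing import Any, Dict


def formatting_flags(message: Dict[str, Any]) -> Dict[str, bool]:
    """Read the two Boolean flags, treating a missing key as False."""
    return {
        "punctuated": bool(message.get("punctuated", False)),
        "text_formatted": bool(message.get("text_formatted", False)),
    }


# Hypothetical FinalTranscript payload; "message_type" and "text" are assumed fields.
raw = (
    '{"message_type": "FinalTranscript", "text": "Hello, world.", '
    '"punctuated": true, "text_formatted": false}'
)
flags = formatting_flags(json.loads(raw))
if not all(flags.values()):
    print("Some formatting steps did not succeed:", flags)
```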
December 13, 2021
Inverse Text Normalization Added to Real-Time, Word Boost Accuracy Improved
- Inverse Text Normalization (ITN) added for our /v2/realtime and /v2/stream endpoints. ITN improves formatting of entities like numbers, dates, and proper nouns in the transcription text.
- Improved accuracy for Custom Vocabulary (aka, Word Boosts) with the Real-Time transcription API.
- Fixed an edge case that would sometimes cause transcription errors when disfluencies was set to true and no words were identified in the audio file.
December 4, 2021
Entity Detection Released, Improved Filler Word Detection, Usage Alerts
- v1 release of Entity Detection - automatically detects a wide range of entities like person and company names, emails, addresses, dates, locations, events, and more.
- To include Entity Detection in your transcript, set entity_detection to true in your POST request to /v2/transcript.
- When your transcript is complete, you will see an entities key towards the bottom of the JSON response containing the entities detected, as shown here:

- Read more about Entity Detection in our official documentation.
- Usage Alert feature added, allowing customers to set a monthly usage threshold on their account along with a list of email addresses to be notified when that monthly threshold has been exceeded. This feature can be enabled by clicking “Set up alerts” on the “Developers” tab in the Dashboard.

- When Content Safety is enabled, a summary of the severity scores detected will now be returned in the API response under the severity_score_summary key, nested inside of the content_safety_labels key, as shown below.

- Improved Filler Word (aka, disfluencies) detection by ~25%.
- Fixed a bug in Auto Chapters that would occasionally add an extra space between sentences for headlines and summaries.
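The Entity Detection steps in this release (set entity_detection to true in the request, then read the entities key from the completed transcript) might look roughly like this on the client side. It is a sketch under assumptions: the audio_url field, the sample response, and both helper names are illustrative, and no network call is made.

```python
import json
from typing import Any, Dict, List


def build_entity_detection_request(audio_url: str) -> str:
    """Build a JSON body for a POST to /v2/transcript with Entity Detection on."""
    # "audio_url" is an assumed field name; "entity_detection" comes from the changelog.
    return json.dumps({"audio_url": audio_url, "entity_detection": True})


def entity_texts(transcript: Dict[str, Any]) -> List[str]:
    """Return the text of each detected entity in a completed transcript."""
    entities = transcript.get("entities") or []
    return [entity["text"] for entity in entities if "text" in entity]


# Hypothetical completed-transcript fragment, for illustration only.
sample_response = {"entities": [{"text": "Andrew"}, {"text": "AssemblyAI"}]}

print(build_entity_detection_request("https://example.com/audio.mp3"))
print(entity_texts(sample_response))
```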
November 27, 2021
Additional MIME Type Detection Added for OPUS Files
- Added additional MIME type detection to detect a wider variety of OPUS files.
- Fixed an issue with word timing calculations that caused issues with speaker labeling for a small number of transcripts.
November 23, 2021
Custom Vocabulary Accuracy Significantly Improved
- Significantly improved the accuracy of Custom Vocabulary, and the impact of the boost_param field to control the weight for Custom Vocabulary.
- Improved precision of word timings.
November 12, 2021
New Auto Chapters, Sentiment Analysis, and Disfluencies Features Released
- v1 release of Auto Chapters - which provides a "summary over time" by breaking audio/video files into "chapters" based on the topic of conversation. Check out our blog to read more about this new feature. To enable Auto Chapters in your request, you can set auto_chapters: true in your POST request to /v2/transcript.
- v1 release of Sentiment Analysis - that determines the sentiment of sentences in a transcript as "positive", "negative", or "neutral". Sentiment Analysis can be enabled by including the sentiment_analysis: true parameter in your POST request to /v2/transcript.
- Filler-words like "um" and "uh" can now be included in the transcription text. Simply include disfluencies: true in your POST request to /v2/transcript.
- Deployed Speaker Labels version 1.3.0. Improves overall diarization/labeling accuracy.
- Improved our internal auto-scaling for asynchronous transcription, to keep turnaround times consistently low during periods of high usage.
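The three new request parameters above can be combined in a single request body. The helper below is a hypothetical sketch that only assembles the body; the audio_url field is an assumption, and sending the request to /v2/transcript is left out.

```python
from typing import Any, Dict


def build_transcript_request(
    audio_url: str,
    *,
    auto_chapters: bool = False,
    sentiment_analysis: bool = False,
    disfluencies: bool = False,
) -> Dict[str, Any]:
    """Assemble a request body, including only the features that are switched on."""
    body: Dict[str, Any] = {"audio_url": audio_url}  # assumed field name
    flags = {
        "auto_chapters": auto_chapters,
        "sentiment_analysis": sentiment_analysis,
        "disfluencies": disfluencies,
    }
    body.update({name: True for name, enabled in flags.items() if enabled})
    return body


print(build_transcript_request("https://example.com/audio.mp3", auto_chapters=True, disfluencies=True))
```

Omitting disabled flags keeps the body minimal; sending them explicitly as false would be an equally reasonable choice.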
November 7, 2021
New Language Code Parameter for English Spelling
- Added a new language_code parameter when making requests to /v2/transcript.
- Developers can set this to en_us, en_uk, or en_au, which will ensure the correct English spelling is used - British English, Australian English, or US English (Default).
- Quick note: for customers that were historically using the assemblyai_en_au or assemblyai_en_uk acoustic models, the language_code parameter is essentially redundant and doesn't need to be used.

- Fixed an edge-case where some files with prolonged silences would occasionally have a single word predicted, such as "you" or "hi."
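A client might validate the language_code value before sending a request, falling back to US English when none is given. This is a sketch only: the helper name is hypothetical, and the lower-casing and error handling are client-side conveniences, not documented API behavior.

```python
from typing import Optional

# The three values listed in the entry above; en_us is described as the default.
SUPPORTED_LANGUAGE_CODES = ("en_us", "en_uk", "en_au")
DEFAULT_LANGUAGE_CODE = "en_us"


def resolve_language_code(requested: Optional[str] = None) -> str:
    """Return a supported language code, defaulting to US English."""
    if requested is None:
        return DEFAULT_LANGUAGE_CODE
    code = requested.strip().lower()
    if code not in SUPPORTED_LANGUAGE_CODES:
        raise ValueError(f"Unsupported language_code: {requested!r}")
    return code


print(resolve_language_code())
print(resolve_language_code("en_au"))
```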
November 1, 2021
New Features Coming Soon, Bug Fixes
- This week, our engineering team has been hard at work preparing for the release of exciting new features like:
- Chapter Detection: Automatically summarize audio and video files into segments (aka "chapters").
- Sentiment Analysis: Determine the sentiment of sentences in your transcript as "positive", "negative", or "neutral".
- Disfluencies: Detects filler-words like "um" and "uh".
- Improved average real-time latency by 2.1% and p99 latency by 0.06%.
- Fixed an edge-case where confidence scores in the utterances category for dual-channel audio files would occasionally receive a confidence score greater than 1.0.
October 24, 2021
Improved v8 Model Processing Speed
- Improved the API's ability to handle audio/video files with a duration over 8 hours.
- Further improved transcription processing times by 12%.
- Fixed an edge case in our responses for dual channel audio files where if speaker 2 interrupted speaker 1, the text from speaker 2 would cause the text from speaker 1 to be split into multiple turns, rather than contextually keeping all of speaker 1's text together.
October 18, 2021
v8 Transcription Model Released
- Today, we're happy to announce the release of our most accurate Speech Recognition model for asynchronous transcription to date—version 8 (v8).
- This new model dramatically improves overall accuracy (up to 19% relative), and proper noun accuracy as well (up to 25% relative).
- You can read more about our v8 model in our blog here.
- Fixed an edge case where a small percentage of short (<60 seconds in length) dual-channel audio files, with the same audio on each channel, resulted in repeated words in the transcription.
October 11, 2021
v2 Real-Time and v4 Topic Detection Models Released
- Launched our v2 Real-Time Streaming Transcription model (read more on our blog).
- This new model improves accuracy of our Real-Time Streaming Transcription by ~10%.
- Launched our Topic Detection v4 model, with an accuracy boost of ~8.37% over v3 (read more on our blog).
October 3, 2021
v3 Topic Detection Model, PII Redaction Bug Fixes
- Released our v3 Topic Detection model.
- This model dramatically improves the Topic Detection feature's ability to accurately detect topics based on context.
- For example, in the following text, the model was able to accurately predict "Rugby" without the mention of the sport directly, due to the mention of "Ed Robinson" (a Rugby coach).

- PII Redaction has been improved to better identify (and redact) phone numbers even when they are not explicitly referred to as a phone number.
- Released a fix for PII Redaction that corrects an issue where the model would sometimes detect phone numbers as credit card numbers or social security numbers.
September 26, 2021
Severity Scores for Content Safety
- The API now returns a severity score along with the confidence and label keys when using the Content Safety feature.
- The severity score measures how intense a detected Content Safety label is on a scale of 0 to 1.
- For example, a natural disaster that leads to mass casualties will have a score of 1.0, while a small storm that breaks a mailbox will only be 0.1.

- Fixed an edge case where a small number of transcripts with Automatic Transcript Highlights turned on were not returning any results.
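One way a client could use the severity score is to keep only labels above a chosen threshold. The sketch below assumes the score is exposed under a key named severity next to confidence and label; that key name, the label values, and the 0.5 threshold are all illustrative assumptions.

```python
from typing import Any, Dict, List


def severe_labels(labels: List[Dict[str, Any]], threshold: float = 0.5) -> List[str]:
    """Return the labels whose severity is at or above the threshold."""
    # "severity" is an assumed key name; a missing score is treated as 0.0.
    return [item["label"] for item in labels if item.get("severity", 0.0) >= threshold]


# Hypothetical Content Safety results mirroring the example above.
sample_labels = [
    {"label": "disasters", "confidence": 0.93, "severity": 1.0},  # mass-casualty disaster
    {"label": "weather", "confidence": 0.81, "severity": 0.1},    # storm that breaks a mailbox
]

print(severe_labels(sample_labels))
```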