Changelog

Follow along to see weekly accuracy and product improvements.

June 20, 2024

Variable-bitrate video support; bugfix

We've deployed changes which now permit variable-bitrate video files to be submitted to our API.

We've fixed a recent bug in which audio files with a large amount of silence at the beginning of the file would fail to transcribe.

June 19, 2024

LeMUR improvements

We have added two new keys to the LeMUR response, input_tokens and output_tokens, which can help users track usage.

We've implemented a new fallback system to further boost the reliability of LeMUR.

We have addressed an edge case issue affecting LeMUR and certain XML tags. In particular, when LeMUR responds with a <question> XML tag, it will now always close it with a </question> tag rather than erroneous tags which would sometimes be returned (e.g. </answer>).

June 18, 2024

PII Redaction and Entity Detection improvements

We've improved our PII Text Redaction and Entity Detection models, yielding more accurate detection and removal of PII and other entities from transcripts.

We've added 16 new entities, including vehicle_id and account_number, and updated 4 of our existing entities. Users may need to update to the latest version of our SDKs to use these new entities.

We've added PII Text Redaction and Entity Detection support in 4 new languages:

  • Chinese
  • Dutch
  • Japanese
  • Georgian

PII Text Redaction and Entity Detection now support a total of 47 languages between our Best and Nano tiers.

June 14, 2024

Usage and spend alerts

Users can now set up billing alerts in their user portals. Billing alerts notify you when your monthly spend or account balance reaches a threshold.

To set up a billing alert, go to the billing page of your portal, and click Set up a new alert under the Your alerts widget:

You can then set up an alert by specifying whether to alert on monthly spend or account balance, as well as the specific threshold at which to send an alert.

June 13, 2024

Universal-1 now available in German

Universal-1, our most powerful and accurate multilingual Speech-to-Text model, is now available in German.

No special action is needed to utilize Universal-1 on German audio - all requests sent to our /v2/transcript endpoint with German audio files will now use Universal-1 by default. Learn more about how to integrate Universal-1 into your apps in our Getting Started guides.

May 24, 2024

New languages for Speaker Diarization

Speaker Diarization is now available in five additional languages for both the Best and Nano tiers:

  • Chinese
  • Hindi
  • Japanese 
  • Korean 
  • Vietnamese
May 13, 2024

New API Reference, Timestamps improvements

We’ve released a new version of the API Reference section of our docs for an improved developer experience. Here’s what’s new:

  1. New API Reference pages with exhaustive endpoint documentation for transcription, LeMUR, and streaming
  2. cURL examples for every endpoint
  3. Interactive Playground: Test our API endpoints with the interactive playground. It includes a form-builder for generating requests and corresponding code examples in cURL, Python, and TypeScript
  4. Always up to date: The new API Reference is autogenerated based on our Open-Source OpenAPI and AsyncAPI specs

We’ve made improvements to Universal-1’s timestamps for both the Best and Nano tiers, yielding improved timestamp accuracy and a reduced incidence of overlapping timestamps.

We’ve fixed an issue in which users could receive an `Unable to create transcription. Developers have been alerted` error that would be surfaced when using long files with Sentiment Analysis.

May 3, 2024

New codec support; account deletion support

We’ve upgraded our transcoding library and now support the following new codecs:

  • Bonk, APAC, Mi-SC4, 100i, VQC, FTR PHM, WBMP, XMD ADPCM, WADY DPCM, CBD2 DPCM
  • HEVC, VP9, AV1 codec in enhanced flv format

Users can now delete their accounts by selecting the Delete account option on the Account page of their AssemblyAI Dashboards.

Users will now receive a 400 error when using an invalid tier and language code combination, with an error message such as The selected language_code is supported by the following speech_models: best, conformer-2. See https://www.assemblyai.com/docs/concepts/supported-languages..

We’ve fixed an issue in which nested JSON responses from LeMUR would cause Invalid LLM response, unable to fulfill request. Please try again. errors.

We’ve fixed a bug in which very long files would sometimes fail to transcribe, leading to timeout errors.

May 1, 2024

AssemblyAI app for Make.com

Make (formerly Integromat) is a no-code automation platform that makes it easy to build tasks and workflows that synthesize many different services.

We’ve released the AssemblyAI app for Make that allows Make users to incorporate AssemblyAI into their workflows, or scenarios. In other words, in Make you can now use our AI models to

  1. Transcribe audio data with speech recognition models
  2. Analyze audio data with audio intelligence models
  3. Build generative features on top of audio data with LLMs

For example, in our tutorial on Redacting PII with Make, we demonstrate how to build a Make scenario that automatically creates a redacted audio file and redacted transcription for any audio file uploaded to a Google Drive folder.

April 18, 2024

GDPR and PCI DSS compliance

AssemblyAI is now officially PCI Compliant. The Payment Card Industry Data Security Standard Requirements and Security Assessment Procedures (PCI DSS) certification is a rigorous assessment that ensures card holder data is being properly and securely handled and stored. You can read more about PCI DSS here.

Additionally, organizations which have signed an NDA can go to our Trust Portal in order to view our PCI attestation of compliance, as well as other security-related documents.

AssemblyAI is also GDPR Compliant. The General Data Protection Regulation (GDPR) is regulation regarding privacy and security for the European Union that applies to businesses that serve customers within the EU. You can read more about GDPR here.

Additionally, organizations which have signed an NDA can go to our Trust Portal in order to view our GDPR assessment on compliance, as well as other security-related documents.

April 18, 2024

Self-serve invoices; dual-channel improvement

Users of our API can now view and download their self-serve invoices in their dashboards under Billing > Your invoices.

We’ve made readability improvements to the formatting of utterances for dual-channel transcription by combining sequential utterances from the same channel.

We’ve added a patch to improve stability in turnaround times for our async transcription and LeMUR services.

We’ve fixed an issue in which timestamp accuracy would be degraded in certain edge cases when using our async transcription service.

April 11, 2024

Introducing Universal-1

Last week we released Universal-1, a state-of-the-art multimodal speech recognition model. Universal-1 is trained on 12.5M hours of multilingual audio data, yielding impressive performance across the four key languages for which it was trained - English, Spanish, German, and French.

Word Error Rate across four languages for several providers. Lower is better.

Universal-1 is now the default model for English and Spanish audio files sent to our v2/transcript endpoint for async processing, while German and French will be rolled out in the coming weeks.

You can read more about Universal-1 in our announcement blog or research blog, or you can try it out now on our Playground.

April 11, 2024

New Streaming STT features

We’ve added a new message type to our Streaming Speech-to-Text (STT) service. This new message type SessionInformation is sent immediately before the final SessionTerminated message when closing a Streaming session, and it contains a field called audio_duration_seconds which contains the total audio duration processed during the session. This feature allows customers to run end-user-specific billing calculations.

To enable this feature, set the enable_extra_session_information query parameter to true when connecting to a Streaming WebSocket.

endpoint_str = 'wss://api.assemblyai.com
/v2/realtime/ws?sample_rate=8000&enable_extra_session_information=true'

This feature will be rolled out in our SDKs soon.

We’ve added a new feature to our Streaming STT service, allowing users to disable Partial Transcripts in a Streaming session. Our Streaming API sends two types of transcripts - Partial Transcripts (unformatted and unpunctuated) that gradually build up the current utterance, and Final Transcripts which are sent when an utterance is complete, containing the entire utterance punctuated and formatted.

Users can now set the disable_partial_transcripts query parameter to true when connecting to a Streaming WebSocket to disable the sending of Partial Transcript messages.

endpoint_str = 'wss://api.assemblyai.com
/v2/realtime/ws?sample_rate=8000&disable_partial_transcripts=true'

This feature will be rolled out in our SDKs soon.

We have fixed a bug in our async transcription service, eliminating File does not appear to contain audio errors. Previously, this error would be surfaced in edge cases where our transcoding pipeline would not have enough resources to transcode a given file, thus failing due to resource starvation.

March 20, 2024

Dual channel transcription improvements

We’ve made improvements to how utterances are handled during dual-channel transcription. In particular, the transcription service now has elevated sensitivity when detecting utterances, leading to improved utterance insertions when there is overlapping speech on the two channels.

March 14, 2024

LeMUR concurrency fix

We’ve fixed a temporary issue in which users with low account balances would occasionally be rate-limited to a value less than 30 when using LeMUR.

March 6, 2024

Fewer "File does not appear to contain audio" errors

We’ve fixed an edge-case bug in our async API, leading to a significant reduction in errors that say File does not appear to contain audio. Users can expect to see an immediate reduction in this type of error. If this error does occur, users should retry their requests given that retries are generally successful.

We’ve made improvements to our transcription service autoscaling, leading to improved turnaround times for requests that use Word Boost when there is a spike in requests to our API.

February 27, 2024

New developer controls for real-time end-of-utterance

We have released developer controls for real-time end-of-utterance detection, providing developers control over when an utterance is considered complete. Developers can now either manually force the end of an utterance, or set a threshold for time of silence before an utterance is considered complete. 

We have made changes to our English async transcription service that improve sentence segmentation for our Sentiment Analysis, Topic Detection, and Content Moderation models. The improvements fix a bug in which these models would sometimes delineate sentences on titles that end in periods like Dr. and Mrs.

We have fixed an issue in which transcriptions of very long files (8h+) with disfluencies enabled would error out.

February 19, 2024

PII Redaction and Entity Detection available in 13 additional languages

We have launched PII Text Redaction and Entity Detection for 13 new languages:

  1. Spanish
  2. Finnish
  3. French
  4. German
  5. Hindi
  6. Italian
  7. Korean
  8. Polish
  9. Portuguese
  10. Russian
  11. Turkish
  12. Ukrainian
  13. Vietnamese

We have increased the memory of our transcoding service workers, leading to a significant reduction in errors that say File does not appear to contain audio.

February 6, 2024

Fewer LeMUR 500 errors

We’ve made improvements to our LeMUR service to reduce the number of 500 errors.

We’ve made improvements to our real-time service, which provides a small increase to the accuracy of timestamps in some edge cases.

January 18, 2024

Free tier limit increase; Real-time concurrency increase

We have increased the usage limit for our free tier to 100 hours. New users can now use our async API to transcribe up to 100 hours of audio, with a concurrency limit of 5, before needing to upgrade their accounts.

We have rolled out the concurrency limit increase for our real-time service. Users now have access to up to 100 concurrent streams by default when using our real-time service.

Higher concurrency is available upon request with no limit to what our API can support. If you need a higher concurrency limit, please either contact our Sales team or reach out to us at support@assemblyai.com. Note that our real-time service is only available for upgraded accounts.