Reliability improvements
We've made reliability improvements for Claude models in our LeMUR framework.
We've made adjustments to our infrastructure so that users should see fewer timeout errors when using our Nano tier with some languages.
We've released the AssemblyAI integration for the LiveKit Agents framework, allowing developers to use our Streaming Speech-to-Text model in their real-time LiveKit applications.
LiveKit is a powerful platform for building real-time audio and video applications. It abstracts away the complicated details of building real-time applications so developers can rapidly build and deploy applications for video conferencing, livestreaming, and more.
Check out our tutorial on How to build a LiveKit app with real-time Speech-to-Text to see how you can build a real-time transcription chat feature using the integration. You can browse all of our integrations on the Integrations page of our Docs.
We have renewed our SOC2 Type 2 certification, and expanded it to include Processing Integrity. Our SOC2 Type 2 certification now covers all five Trust Services Criteria (TSCs).
Our SOC2 Type 2 report is available in our Trust Center to organizations with an NDA.
We have obtained our inaugural ISO 27001:2022 certification. ISO 27001 is an internationally recognized standard for managing information security that provides a systematic framework for protecting sensitive information through risk management, policies, and procedures.
Our ISO 27001:2022 report is available in our Trust Center to organizations with an NDA.
We've improved our timestamp algorithm, yielding higher accuracy for long numerical strings like credit card numbers, phone numbers, etc.
We've released a fix for no-space languages like Japanese and Chinese. While transcripts for these languages correctly contain no spaces in responses from our API, the text attribute of the utterances key previously contained spaces. These extraneous spaces have been removed.
We've improved Universal-2's formatting for punctuation, lowering the likelihood of consecutive punctuation characters such as ?'.
We now offer multichannel transcription, allowing users to transcribe files with up to 32 separate audio channels, making speaker identification easier in situations like virtual meetings.
You can enable multichannel transcription via the `multichannel` parameter when making API requests. Here's how you can do it with our Python SDK:
import assemblyai as aai
aai.settings.api_key = "YOUR_API_KEY"
audio_file = "path/to/your/file.mp3"
config = aai.TranscriptionConfig(multichannel=True)
transcriber = aai.Transcriber(config=config)
transcript = transcriber.transcribe(audio_file)
print(transcript.json_response["audio_channels"])
print(transcript.utterances)
You can learn more about multichannel transcription in our Docs.
Last week we released Universal-2, our latest Speech-to-Text model. Universal-2 builds upon our previous model Universal-1 to make significant improvements in "last mile" challenges critical to real-world use cases - proper nouns, formatting, and alphanumerics.
Comparison of error rates for Universal-2 vs Universal-1 across overall performance (Standard ASR) and four last-mile areas, each measured by the appropriate metric
Universal-2 is now the default model for English files sent to our `v2/transcript` endpoint for async processing. You can read more about Universal-2 in our announcement blog or research blog, or you can try it out now on our Playground.
The following models were removed from LeMUR: anthropic/claude-instant-1-2 and basic (legacy, equivalent to anthropic/claude-instant-1-2), which will now return a 400 validation error if called.
These models were removed due to Anthropic sunsetting legacy models in favor of newer models which are more performant, faster, and cheaper. We recommend users who were using the removed models switch to Claude 3 Haiku (anthropic/claude-3-haiku).
We recently observed a degradation in accuracy when transcribing French files through our API. We have since pushed a bugfix to restore performance to prior levels.
We've improved error messaging for greater clarity for both our file download service and "Invalid LLM response" errors from LeMUR.
We've released a fix to ensure that rate limit headers are always returned from LeMUR requests, not just for 200 and 400 responses.
Check out our quarterly wrap-up for a summary of the new features and integrations we launched this quarter, as well as improvements we made to existing models and functionality.
Claude 3 in LeMUR
We added support for Claude 3 in LeMUR, allowing users to prompt the following LLMs in relation to their transcripts:
Check out our related blog post to learn more.
Automatic Language Detection
We made significant improvements to our Automatic Language Detection (ALD) model, supporting 10 new languages for a total of 17, with best-in-class accuracy in 15 of those 17 languages. We also added a customizable confidence threshold for ALD.
Learn more about these improvements in our announcement post.
We released the AssemblyAI Ruby SDK and the AssemblyAI C# SDK, allowing Ruby and C# developers to easily add SpeechAI to their applications with AssemblyAI. The SDKs let developers use our asynchronous Speech-to-Text and Audio Intelligence models, as well as LeMUR through a simple interface.
Learn more in our Ruby SDK announcement post and our C# SDK announcement post.
This quarter, we shipped two new integrations:
Activepieces 🤝 AssemblyAI
The AssemblyAI integration for Activepieces allows no-code and low-code builders to incorporate AssemblyAI's powerful SpeechAI in Activepieces automations. Learn how to use AssemblyAI in Activepieces in our Docs.
Langflow 🤝 AssemblyAI
We've released the AssemblyAI integration for Langflow, allowing users to build with AssemblyAI in Langflow - a popular open-source, low-code app builder for RAG and multi-agent AI applications. Check out the Langflow docs to learn how to use AssemblyAI in Langflow.
Assembly Required
This quarter we launched Assembly Required - a series of candid conversations with AI founders sharing insights, learnings, and the highs and lows of building a company.
Click here to check out the first conversation in the series, between Edo Liberty, founder and CEO of Pinecone, and Dylan Fox, founder and CEO of AssemblyAI.
We released the AssemblyAI API Postman Collection, which provides a convenient way for Postman users to try our API, featuring endpoints for Speech-to-Text, Audio Intelligence, LeMUR, and Streaming for you to use. Similar to our API reference, the Postman collection also provides example responses so you can quickly browse endpoint results.
Free offer improvements
This quarter, we improved our free offer with:
We released 36 new blogs this quarter, from tutorials to projects to technical deep dives. Here are some of the blogs we released this quarter:
We also released 10 new YouTube videos, demonstrating how to build SpeechAI applications and more, including:
We also made improvements to a range of other features, including:
And more!
We can't wait for you to see what we have in store to close out the year!
Recently, Anthropic announced that they will be deprecating legacy LLM models that are usable via LeMUR. We will therefore be sunsetting these models in advance of Anthropic's end-of-life for them:
You will receive API errors rejecting your LeMUR requests if you attempt to use either of the above models after the sunset dates. Users who have used these models recently have been alerted via email with notice to select an alternative model to use via LeMUR.
We have a number of newer models to choose from, which are not only more performant but also ~50% more cost-effective than the legacy models.
Check out our docs to learn how to select which model you use via LeMUR.
We've released the AssemblyAI integration for Langflow, allowing low-code builders to incorporate Speech AI into their workflows.
Langflow is a popular open-source, low-code app builder for RAG and multi-agent AI applications. Using Langflow, you can easily connect different components via drag and drop and build your AI flow. Check out the Langflow docs for AssemblyAI's integration here to learn more.
We've fixed an edge-case issue that would cause requests using Speaker Labels to fail for some files.
We've released the AssemblyAI integration for Activepieces, allowing no-code and low-code builders to incorporate Speech AI into their workflows.
Activepieces is an open-source, no-code automation platform that allows users to build workflows that connect various applications. Now, you can use AssemblyAI's powerful models to transcribe speech, analyze audio, and build generative features in Activepieces.
Read more about how you can use AssemblyAI in Activepieces in our Docs.
We've fixed an edge case which would sometimes occur due to language fallback when Automatic Language Detection (ALD) was used in conjunction with language_confidence_threshold, resulting in executed transcriptions that violated the user-set language_confidence_threshold. Now such transcriptions will not execute, and will instead return an error to the user.
We've made improvements to our Automatic Language Detection (ALD) model, yielding increased accuracy, expanded language support, and customizable confidence thresholds.
In particular, we have added support for 10 new languages, including Chinese, Finnish, and Hindi, bringing the total to 17 languages in our Best tier. Additionally, we've achieved best-in-class accuracy in 15 of those 17 languages when benchmarked against four leading providers.
Finally, we've added a customizable confidence threshold for ALD, allowing you to set a minimum confidence threshold for the detected language and be alerted if this threshold is not satisfied.
Read more about these recent improvements in our announcement post.
We've made a series of improvements to our Free Offer:
Learn more about our Free Offer on our Pricing page, and then check out our Quickstart in our Docs to get started.
We've made improvements to our Speaker Diarization model, especially to its robustness in distinguishing between speakers with similar voices.
We've fixed an error in which the last word in a transcript was always attributed to the same speaker as the second-to-last word.
We've made improvements to error handling for file uploads that fail. Now if there is an error, such as a file containing no audio, the following 422 error will be returned:
Upload failed, please try again. If you continue to have issues please reach out to support@assemblyai.com
We've made scaling improvements that reduce p90 latency for some non-English languages when using the Best tier.
We've made improvements to notifications for auto-refill failures. Now, users will be alerted more rapidly when their automatic payments are unsuccessful.
Last month, we announced support for Claude 3 in LeMUR. Today, we are adding support for two new endpoints - Question & Answer and Summary (in addition to the pre-existing Task endpoint) - for these newest models:
Here's how you can use Claude 3.5 Sonnet to summarize a virtual meeting with LeMUR:
import assemblyai as aai
aai.settings.api_key = "YOUR-KEY-HERE"
audio_url = "https://storage.googleapis.com/aai-web-samples/meeting.mp4"
transcript = aai.Transcriber().transcribe(audio_url)
result = transcript.lemur.summarize(
    final_model=aai.LemurModel.claude3_5_sonnet,
    context="A GitLab meeting to discuss logistics",
    answer_format="TLDR"
)
print(result.response)
Learn more about these specialized endpoints and how to use them in our Docs.
We've launched our Zapier integration v2.0, which makes it easy to use our API in a no-code way. The enhanced app is more flexible, supports more Speech AI features, and integrates more closely into the Zap editor.
The Transcribe event (formerly Get Transcript) now supports all of the options available in our transcript API, making all of our Speech Recognition and Audio Intelligence features available to Zapier users, including asynchronous transcription. In addition, we've added 5 new events to the AssemblyAI app for Zapier:
- Get Transcript: Retrieve a transcript that you have previously created.
- Get Transcript Subtitles: Generate SRT or VTT subtitles for the transcript.
- Get Transcript Paragraphs: Retrieve the transcript segmented into paragraphs.
- Get Transcript Sentences: Retrieve the transcript segmented into sentences.
- Get Transcript Redacted Audio Result: Retrieve the result of the PII audio redaction model. The result contains the status and the URL to the redacted audio file.
Read more about how to use the new app in our Docs, or check out our tutorial to see how you can generate subtitles with Zapier and AssemblyAI.
LeMUR can now be used from browsers, either via our JavaScript SDK or fetch.
Last week, we released Anthropic's Claude 3 model family into LeMUR, our LLM framework for speech.
You can now easily apply any of these models to your audio data. Learn more about how to get started in our docs or try out the new models in a no-code way through our playground.
For more information, check out our blog post about the release.
import assemblyai as aai
# Step 1: Transcribe an audio file
transcriber = aai.Transcriber()
transcript = transcriber.transcribe("./common_sports_injuries.mp3")
# Step 2: Define a prompt
prompt = "Provide a brief summary of the transcript."
# Step 3: Choose an LLM to use with LeMUR
result = transcript.lemur.task(
    prompt,
    final_model=aai.LemurModel.claude3_5_sonnet
)
print(result.response)
We've fixed an issue which was causing the JavaScript SDK to surface the following error when using the SDK in the browser:
Access to fetch at 'https://api.assemblyai.com/v2/transcript' from origin 'https://exampleurl.com' has been blocked by CORS policy: Request header field assemblyai-agent is not allowed by Access-Control-Allow-Headers in preflight response.
We've made significant improvements to the timestamp accuracy of our Speech-to-Text Best tier for English, Spanish, and German. 96% of timestamps are accurate within 200ms, and 86% of timestamps are now accurate within 100ms.
We've fixed a bug in which confidence scores of transcribed words for the Nano tier would sometimes be outside of the range [0, 1].
We've fixed a rare issue in which the speech for only one channel in a short dual-channel file would be transcribed when disfluencies was also enabled.
We've made model improvements that significantly improve the accuracy of timestamps when using our Streaming Speech-to-Text service. Most timestamps are now accurate within 100 ms.
Our Streaming Speech-to-Text service will now return a new error 'Audio too small to be transcoded' (code 4034) when a client submits an audio chunk that is too small to be transcoded (less than 10 ms).
We've deployed changes which now permit variable-bitrate video files to be submitted to our API.
We've fixed a recent bug in which audio files with a large amount of silence at the beginning of the file would fail to transcribe.
We have added two new keys to the LeMUR response, input_tokens and output_tokens, which can help users track usage.
We've implemented a new fallback system to further boost the reliability of LeMUR.
We have addressed an edge case issue affecting LeMUR and certain XML tags. In particular, when LeMUR responds with a <question> XML tag, it will now always close it with a </question> tag rather than the erroneous tags which would sometimes be returned (e.g. </answer>).
We've improved our PII Text Redaction and Entity Detection models, yielding more accurate detection and removal of PII and other entities from transcripts.
We've added 16 new entities, including vehicle_id and account_number, and updated 4 of our existing entities. Users may need to update to the latest version of our SDKs to use these new entities.
We've added PII Text Redaction and Entity Detection support in 4 new languages:
PII Text Redaction and Entity Detection now support a total of 47 languages between our Best and Nano tiers.
Users can now set up billing alerts in their user portals. Billing alerts notify you when your monthly spend or account balance reaches a threshold.
To set up a billing alert, go to the billing page of your portal, and click Set up a new alert under the Your alerts widget:
You can then set up an alert by specifying whether to alert on monthly spend or account balance, as well as the specific threshold at which to send an alert.
Universal-1, our most powerful and accurate multilingual Speech-to-Text model, is now available in German.
No special action is needed to utilize Universal-1 on German audio - all requests sent to our /v2/transcript endpoint with German audio files will now use Universal-1 by default. Learn more about how to integrate Universal-1 into your apps in our Getting Started guides.
Speaker Diarization is now available in five additional languages for both the Best and Nano tiers:
We've released a new version of the API Reference section of our docs for an improved developer experience. Here's what's new:
We've made improvements to Universal-1's timestamps for both the Best and Nano tiers, yielding improved timestamp accuracy and a reduced incidence of overlapping timestamps.
We've fixed an issue in which users could receive an `Unable to create transcription. Developers have been alerted` error that would be surfaced when using long files with Sentiment Analysis.
We've upgraded our transcoding library and now support the following new codecs:
- Bonk, APAC, Mi-SC4, 100i, VQC, FTR PHM, WBMP, XMD ADPCM, WADY DPCM, CBD2 DPCM
- HEVC, VP9, AV1 codec in enhanced flv format
Users can now delete their accounts by selecting the Delete account option on the Account page of their AssemblyAI Dashboards.
Users will now receive a 400 error when using an invalid tier and language code combination, with an error message such as "The selected language_code is supported by the following speech_models: best, conformer-2. See https://www.assemblyai.com/docs/concepts/supported-languages."
We've fixed an issue in which nested JSON responses from LeMUR would cause "Invalid LLM response, unable to fulfill request. Please try again." errors.
We've fixed a bug in which very long files would sometimes fail to transcribe, leading to timeout errors.
Make (formerly Integromat) is a no-code automation platform that makes it easy to build tasks and workflows that synthesize many different services.
We've released the AssemblyAI app for Make that allows Make users to incorporate AssemblyAI into their workflows, or scenarios. In other words, in Make you can now use our AI models to:
For example, in our tutorial on Redacting PII with Make, we demonstrate how to build a Make scenario that automatically creates a redacted audio file and redacted transcription for any audio file uploaded to a Google Drive folder.
AssemblyAI is now officially PCI Compliant. The Payment Card Industry Data Security Standard Requirements and Security Assessment Procedures (PCI DSS) certification is a rigorous assessment that ensures card holder data is being properly and securely handled and stored. You can read more about PCI DSS here.
Additionally, organizations which have signed an NDA can go to our Trust Portal in order to view our PCI attestation of compliance, as well as other security-related documents.
AssemblyAI is also GDPR Compliant. The General Data Protection Regulation (GDPR) is a European Union regulation on privacy and security that applies to businesses that serve customers within the EU. You can read more about GDPR here.
Additionally, organizations which have signed an NDA can go to our Trust Portal in order to view our GDPR assessment on compliance, as well as other security-related documents.
Users of our API can now view and download their self-serve invoices in their dashboards under Billing > Your invoices.
We've made readability improvements to the formatting of utterances for dual-channel transcription by combining sequential utterances from the same channel.
We've added a patch to improve stability in turnaround times for our async transcription and LeMUR services.
We've fixed an issue in which timestamp accuracy would be degraded in certain edge cases when using our async transcription service.
Last week we released Universal-1, a state-of-the-art multimodal speech recognition model. Universal-1 is trained on 12.5M hours of multilingual audio data, yielding impressive performance across the four key languages for which it was trained - English, Spanish, German, and French.
Universal-1 is now the default model for English and Spanish audio files sent to our v2/transcript endpoint for async processing, while German and French will be rolled out in the coming weeks.
You can read more about Universal-1 in our announcement blog or research blog, or you can try it out now on our Playground.
We've added a new message type to our Streaming Speech-to-Text (STT) service. This new message type, SessionInformation, is sent immediately before the final SessionTerminated message when closing a Streaming session, and it contains a field called audio_duration_seconds with the total audio duration processed during the session. This feature allows customers to run end-user-specific billing calculations.
To enable this feature, set the enable_extra_session_information query parameter to true when connecting to a Streaming WebSocket.
endpoint_str = 'wss://api.assemblyai.com/v2/realtime/ws?sample_rate=8000&enable_extra_session_information=true'
This feature will be rolled out in our SDKs soon.
We've added a new feature to our Streaming STT service, allowing users to disable Partial Transcripts in a Streaming session. Our Streaming API sends two types of transcripts - Partial Transcripts (unformatted and unpunctuated) that gradually build up the current utterance, and Final Transcripts which are sent when an utterance is complete, containing the entire utterance punctuated and formatted.
Users can now set the disable_partial_transcripts query parameter to true when connecting to a Streaming WebSocket to disable the sending of Partial Transcript messages.
endpoint_str = 'wss://api.assemblyai.com/v2/realtime/ws?sample_rate=8000&disable_partial_transcripts=true'
This feature will be rolled out in our SDKs soon.
We have fixed a bug in our async transcription service, eliminating "File does not appear to contain audio" errors. Previously, this error would be surfaced in edge cases where our transcoding pipeline would not have enough resources to transcode a given file, thus failing due to resource starvation.
We've made improvements to how utterances are handled during dual-channel transcription. In particular, the transcription service now has elevated sensitivity when detecting utterances, leading to improved utterance insertions when there is overlapping speech on the two channels.
We've fixed a temporary issue in which users with low account balances would occasionally be rate-limited to a value less than 30 when using LeMUR.
We've fixed an edge-case bug in our async API, leading to a significant reduction in errors that say "File does not appear to contain audio". Users can expect to see an immediate reduction in this type of error. If this error does occur, users should retry their requests given that retries are generally successful.
We've made improvements to our transcription service autoscaling, leading to improved turnaround times for requests that use Word Boost when there is a spike in requests to our API.
We have released developer controls for real-time end-of-utterance detection, providing developers control over when an utterance is considered complete. Developers can now either manually force the end of an utterance, or set a threshold for time of silence before an utterance is considered complete.
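For example, with our Python SDK this can look roughly like the sketch below (it assumes a recent SDK version that exposes the configure_end_utterance_silence_threshold and force_end_utterance helpers, plus your own callback setup):
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"

def on_data(transcript: aai.RealtimeTranscript):
    print(transcript.text)

def on_error(error: aai.RealtimeError):
    print(error)

transcriber = aai.RealtimeTranscriber(
    sample_rate=16_000,
    on_data=on_data,
    on_error=on_error,
)
transcriber.connect()

# Treat 500 ms of silence as the end of an utterance
transcriber.configure_end_utterance_silence_threshold(500)

# ...or force the current utterance to end immediately
transcriber.force_end_utterance()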
We have made changes to our English async transcription service that improve sentence segmentation for our Sentiment Analysis, Topic Detection, and Content Moderation models. The improvements fix a bug in which these models would sometimes delineate sentences on titles that end in periods, like Dr. and Mrs.
We have fixed an issue in which transcriptions of very long files (8h+) with disfluencies enabled would error out.
We have launched PII Text Redaction and Entity Detection for 13 new languages:
We have increased the memory of our transcoding service workers, leading to a significant reduction in errors that say "File does not appear to contain audio".
We've made improvements to our LeMUR service to reduce the number of 500 errors.
We've made improvements to our real-time service, which provides a small increase to the accuracy of timestamps in some edge cases.
We have increased the usage limit for our free tier to 100 hours. New users can now use our async API to transcribe up to 100 hours of audio, with a concurrency limit of 5, before needing to upgrade their accounts.
We have rolled out the concurrency limit increase for our real-time service. Users now have access to up to 100 concurrent streams by default when using our real-time service.
Higher concurrency is available upon request with no limit to what our API can support. If you need a higher concurrency limit, please either contact our Sales team or reach out to us at support@assemblyai.com. Note that our real-time service is only available for upgraded accounts.
We introduced major improvements to our API's inference latency, with the majority of audio files now completing in well under 45 seconds regardless of audio duration, with a Real-Time Factor (RTF) of up to .008.
To put an RTF of .008x into perspective, this means you can now convert a:
In addition to these latency improvements, we have reduced our Speech-to-Text pricing. You can now access our Speech AI models with the following pricing:
We've also reduced our pricing for the following Audio Intelligence models: Key Phrases, Sentiment Analysis, Summarization, PII Audio Redaction, PII Redaction, Auto Chapters, Entity Detection, Content Moderation, and Topic Detection. You can view the complete list of pricing updates on our Pricing page.
Finally, we've increased the default concurrency limits for both our async and real-time services. The increase is immediate for async, and will be rolled out soon for real-time. These new limits are now:
These new changes stem from the efficiencies that our incredible research and engineering teams drive at every level of our inference pipeline, including optimized model compilation, intelligent mini batching, hardware parallelization, and optimized serving infrastructure.
Learn more about these changes and our inference pipeline in our blog post.
Anthropic's Claude 2.1 is now generally available through LeMUR. Claude 2.1 is similar to our Default model and has reduced hallucinations, a larger context window, and performs better in citations.
Claude 2.1 can be used by setting the final_model parameter to anthropic/claude-2-1 in API requests to LeMUR. Here's an example of how to do this through our Python SDK:
import assemblyai as aai
transcriber = aai.Transcriber()
transcript = transcriber.transcribe("https://example.org/customer.mp3")
result = transcript.lemur.task(
    "Summarize the following transcript in three to five sentences.",
    final_model=aai.LemurModel.claude2_1,
)
print(result.response)
You can learn more about setting the model used with LeMUR in our docs.
Our real-time service now supports binary mode for sending audio segments. Users no longer need to encode audio segments as base64 sequences inside of JSON objects - the raw binary audio segment can now be directly sent to our API.
Moving forward, sending audio segments through websockets via the audio_data field is considered deprecated functionality, although it remains the default for now to avoid breaking changes. We plan to support the audio_data field until 2025.
If you are using our SDKs, no changes are required on your end.
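If you manage the WebSocket connection yourself (for example with the websocket-client package), the change looks roughly like the sketch below - the endpoint query string, auth header, and audio file are placeholders:
import websocket  # pip install websocket-client

ws = websocket.WebSocket()
ws.connect(
    "wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000",
    header={"Authorization": "YOUR_API_KEY"},
)

with open("audio.raw", "rb") as f:  # raw PCM chunks, purely illustrative
    chunk = f.read(3200)
    # Previously: base64-encode the chunk and wrap it in JSON, e.g.
    # ws.send(json.dumps({"audio_data": base64.b64encode(chunk).decode("utf-8")}))
    # Now: the raw bytes can be sent directly as a binary frame
    ws.send_binary(chunk)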
We have fixed a bug that would yield a degradation to timestamp accuracy at the end of very long files with many disfluencies.
We've released v4 of our Node JavaScript SDK. Previously, the SDK was developed specifically for Node, but the latest version now works in additional runtimes without any extra steps. The SDK can now be used in the browser, Deno, Bun, Cloudflare Workers, etc.
Check out the SDK's GitHub repository for additional information.
We've released new Punctuation and Truecasing models, achieving significant improvements for acronyms, mixed-case words, and more.
Below is a visual comparison between our previous Punctuation Restoration and Truecasing models (red) and the new models (green):
Going forward, the new Punctuation Restoration and Truecasing models will automatically be used for async and real-time transcriptions, with no need to upgrade for special access. Use the parameters punctuate and format_text, respectively, to enable or disable the models in a request (both are enabled by default).
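For example, a request that explicitly sets both parameters (placeholder audio URL):
import requests

endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://example.org/audio.mp3",  # placeholder file
    "punctuate": True,    # Punctuation Restoration (defaults to True)
    "format_text": True   # Truecasing / text formatting (defaults to True)
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())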
Read more about our new models here.
Our real-time transcription service now supports PCM Mu-law, an encoding used primarily in the telephony industry. This encoding is set by using the `encoding` parameter in requests to our API. You can read more about our PCM Mu-law support here.
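For example, the connection string for a real-time session could look like the line below - a sketch that assumes the documented encoding value is pcm_mulaw:
endpoint_str = 'wss://api.assemblyai.com/v2/realtime/ws?sample_rate=8000&encoding=pcm_mulaw'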
We have improved internal reporting for our transcription service, which will allow us to better monitor traffic.
Users can now directly pass in custom text inputs into LeMUR through the input_text parameter as an alternative to transcript IDs. This gives users the ability to use any information from the async API, formatted however they want, with LeMUR for maximum flexibility.
For example, users can assign action items per user by inputting speaker-labeled transcripts, or pull citations by inputting timestamped transcripts. Learn more about the new input_text parameter in our LeMUR API reference, or check out examples of how to use the input_text parameter in the AssemblyAI Cookbook.
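Here's a minimal sketch with our Python SDK (it assumes your SDK version supports the input_text keyword; the speaker-labeled text is a placeholder):
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"

# Any custom text works here - a speaker-labeled transcript is shown as an illustration
text = "Speaker A: Let's ship the beta on Friday.\nSpeaker B: I'll update the docs before then."

result = aai.Lemur().task(
    "List the action items for each speaker.",
    input_text=text,
)
print(result.response)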
We've made improvements that reduce hallucinations which sometimes occurred when transcribing hold music on phone calls. This improvement is effective immediately with no changes required by users.
We've fixed an issue that would sometimes yield an inability to fulfill a request when XML was returned by the LeMUR /task endpoint.
We've made improvements to our file downloading pipeline which reduce transcription latency. Latency has been reduced by at least 3 seconds for all audio files, with greater improvements for large audio files provided via external URLs.
We've improved error messaging for increased clarity in the case of internal server errors.
We have released the beta for our new usage dashboard. You can now see a usage summary broken down by async transcription, real-time transcription, Audio Intelligence, and LeMUR. Additionally, you can see charts of usage over time broken down by model.
We have added support for AWS marketplace on the dashboard/account management pages of our web application.
We have fixed an issue in which LeMUR would sometimes fail when handling extremely short transcripts.
We have added a new parameter to LeMUR that allows users to specify a temperature for LeMUR generation. Temperature refers to how stochastic the generated text is and can be a value from 0 to 1, inclusive, where 0 corresponds to low creativity and 1 corresponds to high creativity. Lower values are preferred for tasks like multiple choice, and higher values are preferred for tasks like coming up with creative summaries of clips for social media.
Here is an example of how to set the temperature parameter with our Python SDK (which is available in version 0.18.0 and up):
import assemblyai as aai
aai.settings.api_key = f"{API_TOKEN}"
transcriber = aai.Transcriber()
transcript = transcriber.transcribe("https://storage.googleapis.com/aai-web-samples/meeting.mp4")
result = transcript.lemur.summarize(
    temperature=0.25
)
print(result.response)
We have added a new endpoint that allows users to delete the data for a previously submitted LeMUR request. The response data as well as any context provided in the original request will be removed. Continuing the example from above, we can see how to delete LeMUR data using our Python SDK:
request_id = result.request_id
deletion_result = aai.Lemur.purge_request_data(request_id)
print(deletion_result)
We have improved the error messaging for our Word Search functionality. Each phrase used with Word Search must be 5 words or fewer, and the error message returned when a request contains a phrase that exceeds this limit is now clearer.
We have fixed an edge case error that would occur when both disfluencies and Auto Chapters were enabled for audio files that contained non-fluent English.
We have improved logging for our LeMUR service to allow for the surfacing of more detailed errors to users.
We have increased observability into our Speech API internally, allowing for finer grained metrics of usage.
We have fixed a minor bug that would sometimes lead to incorrect timestamps for zero-confidence words.
We have fixed an issue in which requests to LeMUR would occasionally hang during peak usage due to a memory leak issue.
We have recently launched Speaker Labels for 10 additional languages:
We have unbundled and lowered the price for our Audio Intelligence models. Previously, the bundled price for all Audio Intelligence models was $2.10/hr, regardless of the number of models used.
We have made each model accessible at a lower, unbundled, per-model rate:
We now support the following additional languages for asynchronous transcription through our /v2/transcript endpoint:
Additionally, we've made improvements in accuracy and quality to the following languages:
You can see a full list of supported languages and features here. You can see how to specify a language in your API request here. Note that not all languages support Automatic Language Detection.
We have decreased the price of Core Transcription from $0.90 per hour to $0.65 per hour, and decreased the price of Real-Time Transcription from $0.90 per hour to $0.75 per hour.
Both decreases were effective as of August 3rd.
We've implemented changes that yield a 43% to 200% increase in processing speed for our Summarization models, depending on which model is selected, with no measurable impact on the quality of results.
We have standardized the response from our API for automatically detected languages that do not support requested features. In particular, when Automatic Language Detection is used and the detected language does not support a feature requested in the transcript request, our API will return null in the response for that feature.
We've released LeMUR - our framework for applying LLMs to spoken data - for general availability. LeMUR is optimized for high accuracy on specific tasks:
Additionally, LeMUR can be applied to groups of transcripts in order to analyze a set of files at once, allowing users to, for example, summarize many podcast episodes or ask questions about a series of customer calls.
Our Python SDK allows users to work with LeMUR in just a few lines of code:
# version 0.15 or greater
import assemblyai as aai
# set your API key
aai.settings.api_key = f"{API_TOKEN}"
# transcribe the audio file (meeting recording)
transcriber = aai.Transcriber()
transcript = transcriber.transcribe("https://storage.googleapis.com/aai-web-samples/meeting.mp4")
# generate and print action items
result = transcript.lemur.action_items(
    context="A GitLab meeting to discuss logistics",
    answer_format="**<topic header>**\n<relevant action items>\n",
)
print(result.response)
Learn more about LeMUR in our blog post, or jump straight into the code in our associated Colab notebook.
We've released Conformer-2, our latest AI model for automatic speech recognition. Conformer-2 is trained on 1.1M hours of English audio data, extending Conformer-1 to provide improvements on proper nouns, alphanumerics, and robustness to noise.
Conformer-2 is now the default model for all English audio files sent to the v2/transcript endpoint for async processing and introduces no breaking changes.
We'll be releasing Conformer-2 for real-time English transcriptions within the next few weeks.
Read our full blog post about Conformer-2 here. You can also try it out in our Playground.
We've introduced a new, optional speech_threshold parameter, allowing users to only transcribe files that contain at least a specified percentage of spoken audio, represented as a ratio in the range [0, 1].
You can use the speech_threshold parameter with our Python SDK as below:
import assemblyai as aai
aai.settings.api_key = f"{ASSEMBLYAI_API_KEY}"
config = aai.TranscriptionConfig(speech_threshold=0.1)
file_url = "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"
transcriber = aai.Transcriber()
transcript = transcriber.transcribe(file_url, config)
print(transcript.text)
Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US. Skylines from ...
If the percentage of speech in the audio file does not meet or surpass the provided threshold, then the value of transcript.text will be None and you will receive an error:
if not transcript.text:
    print(transcript.error)
Audio speech threshold 0.9461 is below the requested speech threshold value 1.0
As usual, you can also include the speech_threshold parameter in the JSON of raw HTTP requests for any language.
We've fixed a bug in which timestamps could sometimes be incorrectly reported for our Topic Detection and Content Safety models.
We've made improvements to detect and remove a hallucination that would sometimes occur with specific audio patterns.
We've fixed an issue in which the last character in an alphanumeric sequence could fail to be transcribed. The fix is effective immediately and constitutes a 95% reduction in errors of this type.
We've fixed an issue in which consecutive identical numbers in a long number sequence could fail to be transcribed. This fix is effective immediately and constitutes a 66% reduction in errors of this type.
We've made improvements to the Speaker Labels model, adjusting the impact of the speakers_expected parameter to better allow the model to determine the correct number of unique speakers, especially in cases where one or more speakers talks substantially less than others.
We've expanded our caching system to include additional third-party resources to help further ensure our continued operations in the event of external resources being down.
We've made significant improvements to our transcoding pipeline, resulting in a 98% overall speedup in transcoding time and a 12% overall improvement in processing time for our asynchronous API.
We've implemented a caching system for some third-party resources to ensure our continued operations in the event of external resources being down.
We're introducing our new framework LeMUR, which makes it simple to apply Large Language Models (LLMs) to transcripts of audio files up to 10 hours in length.
LLMs unlock a range of impressive capabilities that allow teams to build powerful Generative AI features. However, building these features is difficult due to the limited context windows of modern LLMs, among other challenges that necessitate the development of complicated processing pipelines.
LeMUR circumvents this problem by making it easy to apply LLMs to transcribed speech, meaning that product teams can focus on building differentiating Generative AI features rather than focusing on building infrastructure. Learn more about what LeMUR can do and how it works in our announcement blog, or jump straight to trying LeMUR in our Playground.
We've upgraded to a new and more accurate PII Redaction model, which improves credit card detections in particular.
We've made stability improvements regarding the handling and caching of web requests. These improvements additionally fix a rare issue with punctuation detection.
We've fixed two edge cases in our async transcription pipeline that were producing non-deterministic results from multilingual and stereo audio.
We've improved word boundary detection in our Japanese automatic speech recognition model. These changes are effective immediately for all Japanese audio files submitted to AssemblyAI.
We've implemented a range of improvements to our English pipeline, leading to an average 38% improvement in overall latency for asynchronous English transcriptions.
We've made improvements to our password reset process, offering greater clarity to users attempting to reset their passwords while still ensuring security throughout the reset process.
We're excited to announce that our new Conformer-1 Speech Recognition model is now available for real-time English transcriptions, offering a 24.3% relative accuracy improvement.
Effective immediately, this state-of-the-art model will be the default model for all English audio data sent to the wss://api.assemblyai.com/v2/realtime/ws WebSocket API.
The Speaker Labels model now accepts a new optional parameter called speakers_expected. If you have high confidence in the number of speakers in an audio file, then you can specify it with speakers_expected in order to improve Speaker Labels performance, particularly for short utterances.
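For example, a request that provides the expected speaker count could look like this (placeholder audio URL and count):
import requests

endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://example.org/interview.mp3",  # placeholder file
    "speaker_labels": True,
    "speakers_expected": 2  # only set this if you're confident in the count
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())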
TLS 1.3 is now available for use with the AssemblyAI API. Using TLS 1.3 can decrease latency when establishing a connection to the API.
Our PII redaction scaling has been improved to increase stability, particularly when processing longer files.
We've improved the quality and accuracy of our Japanese model.
Short transcripts that are unable to be summarized will now return an empty summary and a successful transcript.
We've released our new Conformer-1 model for speech recognition. Conformer-1 was trained on 650K hours of audio data and is our most accurate model to date.
Conformer-1 is now the default model for all English audio files sent to the /v2/transcript endpoint for async processing.
We'll be releasing it for real-time English transcriptions within the next two weeks, and will add support for more languages soon.
Our Content Safety and Topic Detection models are now available for use with Italian audio files.
We've made improvements to our Japanese punctuation model, increasing relative accuracy by 11%. These changes are effective immediately for all Japanese audio files submitted to AssemblyAI.
We've made improvements to our Hindi punctuation model, increasing relative accuracy by 26%. These changes are effective immediately for all Hindi audio files submitted to AssemblyAI.
We've tuned our production infrastructure to reduce latency and improve overall consistency when using the Topic Detection and Content Moderation models.
We've released a new version of our PII Redaction model to improve PII detection accuracy, especially for credit card and phone number edge cases. Improvements are effective immediately for all API calls that include PII redaction.
We've released a new version of our Automatic Language Detection model that better targets speech-dense parts of audio files, yielding improved accuracy. Additionally, support for dual-channel and low-volume files has been improved. All changes are effective immediately.
Our Core Transcription API has been migrated from EC2 to ECS in order to ensure scalable, reliable service and preemptively protect against service interruptions.
Users can now reset their passwords from our web UI. From the Dashboard login, simply click "Forgot your password?" to initiate a password reset. Alternatively, users who are already logged in can change their passwords from the Account tab on the Dashboard.
The maximum phrase length for our Word Search feature has been increased from 2 to 5, effective immediately.
We've made updates to our Conversational Summarization model to support dual-channel files. Effective immediately, dual_channel may be set to True when summary_model is set to conversational.
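For example, a request combining the two could look like this (placeholder audio URL):
import requests

endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://example.org/phone-call.mp3",  # placeholder file
    "dual_channel": True,
    "summarization": True,
    "summary_model": "conversational",
    "summary_type": "bullets"
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())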
We've made significant improvements to timestamps for non-English audio. Timestamps are now typically accurate between 0 and 100 milliseconds. This improvement is effective immediately for all non-English audio files submitted to AssemblyAI for transcription.
We've made updates to our Core Transcription model to improve the transcription accuracy of phone numbers by 10%. This improvement is effective immediately for all audio files submitted to AssemblyAI for transcription.
We've improved scaling for our read-only database, resulting in improved performance for read-only requests.
We are happy to announce the release of our most accurate Speech Recognition model to date - version 9 (v9). This updated model delivers increased performance across many metrics on a wide range of audio types.
Word Error Rate, or WER, is the primary quantitative metric by which the performance of an automatic transcription model is measured. Our new v9 model shows significant improvements across a range of different audio types, as seen in the chart below, with a more than 11% improvement on average.
In addition to standard overall WER advancements, the new v9 model shows marked improvements with respect to proper nouns. In the chart below, we can see the relative performance increase of v9 over v8 for various types of audio, with a nearly 15% improvement on average.
The new v9 transcription model is currently live in production. This means that customers will see improved performance with no changes required on their end. The new model will automatically be used for all transcriptions created by our /v2/transcript endpoint going forward, with no need to upgrade for special access.
While our customers enjoy the elevated performance of the v9 model, our AI research team is already hard at work on our v10 model, which is slated to launch in early 2023. Building upon v9, the v10 model is expected to radically improve the state of the art in speech recognition.
Try our new v9 transcription model through your browser using the AssemblyAI Playground. Alternatively, sign up for a free API token to test it out through our API, or schedule a time with our AI experts to learn more.
We are excited to announce that new Summarization models are now available! Developers can now choose between multiple summary models that best fit their use case and customize the output based on the summary length.
The new models are:
Developers can use the summary_model parameter in their POST request to specify which of our summary models they would like to use. This new parameter can be used along with the existing summary_type parameter to allow the developer to customize the summary to their needs.
import requests
endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://bit.ly/3qDXLG8",
    "summarization": True,
    "summary_model": "informative",  # conversational | catchy
    "summary_type": "bullets"  # bullets_verbose | gist | headline | paragraph
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())
Check out our latest blog post to learn more about the new Summarization models or head to the AssemblyAI Playground to test Summarization in your browser!
We've made updates to our Core Transcription model to improve the transcription accuracy of the word COVID. This improvement is effective immediately for all audio files submitted to AssemblyAI for transcription.
Static IP support for webhooks is now generally available!
Outgoing webhook requests sent from AssemblyAI will now originate from a static IP address 44.238.19.20, rather than a dynamic IP address. This gives you the ability to easily validate that the source of the incoming request is coming from our server. Optionally, you can choose to whitelist this static IP address to add an additional layer of security to your system.
See our walkthrough on how to start receiving webhooks for your transcriptions.
import requests
endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://bit.ly/3qDXLG8",
    "summarization": True,
    "summary_type": "bullets"  # paragraph | headline | gist
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())
Starting today, you can now transcribe and summarize entire audio files with a single API call.
To enable our new Summarization models, include the following parameter: "summarization": true in your POST request to /v2/transcript. When the transcription finishes, you will see the summary key in the JSON response containing the summary of your transcribed audio or video file.
By default, summaries will be returned in the style of bullet points. You can customize the style of summary by including the optional summary_type parameter in your POST request along with one of the following values: paragraph, headline, or gist. Here is the full list of summary types we support.
// summary_type = "paragraph"
"summary": "Josh Seiden and Brian Donohue discuss the
topic of outcome versus output on Inside Intercom.
Josh Seiden is a product consultant and author who has
just released a book called Outcomes Over Output.
Brian is product management director and he's looking
forward to the chat."
// summary_type = "headline"
"summary": "Josh Seiden and Brian Donohue discuss the
topic of outcomes versus output."
// summary_type = "gist"
"summary": "Outcomes over output"
// summary_type = "bullets"
"summary": "Josh Seiden and Brian Donohue discuss
the topic of outcome versus output on Inside Intercom.
Josh Seiden is a product consultant and author who has
just released a book called Outcomes Over Output.
Brian is product management director and he's looking
forward to the chat.\n- ..."
Examples of use cases for Summarization include:
We're really excited to see what you build with our new Summarization models. To get started, try it out for free in our no-code playground or visit our documentation for more info on how to enable Summarization in your API requests.
We've improved our Automatic Casing model and fixed a minor bug that caused over-capitalization in English transcripts. The Automatic Casing model is enabled by default with our Core Transcription API to improve transcript readability for video captions (SRT/VTT). See our documentation for more info on Automatic Casing.
Our Core Transcription model has been fine-tuned to better detect short utterances in English transcripts. Examples of short utterances include one-word answers such as "No." and "Right." This update will take effect immediately for all customers.
Over the next few weeks, we will begin rolling out Static IP support for webhooks to customers in stages.
Outgoing webhook requests sent from AssemblyAI will now originate from a static IP address 44.238.19.20, rather than a dynamic IP address. This gives you the ability to easily validate that the source of the incoming request is coming from our server. Optionally, you can choose to whitelist this static IP address to add an additional layer of security to your system.
See our walkthrough on how to start receiving webhooks for your transcriptions.
We've made improvements to our Core Transcription model to better identify and transcribe numbers present in your audio files.
Accurate number transcription is critical for customers that need to redact Personally Identifiable Information (PII) that gets exchanged during phone calls. Examples of PII include credit card numbers, addresses, phone numbers, and social security numbers.
In order to help you handle sensitive user data at scale, our PII Redaction model automatically detects and removes sensitive info from transcriptions. For example, when PII redaction is enabled, a phone number like 412-412-4124 would become ###-###-####.
To learn more, check out our blog that covers all of our PII Redaction Policies or try our PII Redaction model in our Sandbox here!
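As a quick sketch, PII redaction is enabled per request with the redact_pii parameter, optionally narrowed to specific policies (placeholder audio URL; the policy names shown are examples):
import requests

endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://example.org/phone-call.mp3",  # placeholder file
    "redact_pii": True,
    "redact_pii_policies": ["phone_number", "credit_card_number"]
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())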
We've updated our Disfluency Detection model to improve the accuracy of timestamps for disfluency words.
By default, disfluencies such as "um" or "uh" and "hm" are automatically excluded from transcripts. However, we allow customers to include these filler words by simply setting the disfluencies parameter to true in their POST request to /v2/transcript, which enables our Disfluency Detection model.
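For example (placeholder audio URL):
import requests

endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://example.org/podcast.mp3",  # placeholder file
    "disfluencies": True  # keep filler words like "um" and "uh" in the transcript
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())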
More info and code examples can be found here.
We've improved the Speaker Labels model's ability to identify unique speakers for single-word or short utterances.
We've fixed a bug with the Historical Transcript endpoint that was causing null to appear as the value of the completed key.
Today, we're releasing our new Japanese transcription model to help you transcribe and analyze your Japanese audio and video files using our cutting-edge AI.
Now you can automatically convert any Japanese audio or video file to text by including "language_code": "ja" in your POST request to our /v2/transcript endpoint.
In conjunction with transcription, we've also added Japanese support for our AI models including Custom Vocabulary (Word Boost), Custom Spelling, Automatic Punctuation / Casing, Profanity Filtering, and more. This means you can boost transcription accuracy with more granularity based on your use case. See the full list of supported models available for Japanese transcriptions here.
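For example (placeholder audio URL):
import requests

endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://example.org/japanese-interview.mp3",  # placeholder file
    "language_code": "ja"
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())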
To get started, visit our walkthrough on Specifying a Language on our AssemblyAI documentation page or try it out now in our Sandbox!
We've released our new Hindi transcription model to help you transcribe and analyze your Hindi audio and video files.
Now you can automatically convert any Hindi audio or video file to text by including "language_code": "hi" in your POST request to our /v2/transcript endpoint.
We've also added Hindi support for our AI models including Custom Vocabulary (Word Boost), Custom Spelling, Automatic Punctuation / Casing, Profanity Filtering, and more. See the full list of supported models available for Hindi transcriptions here.
To get started with Hindi transcription, visit our walkthrough on Specifying a Language on our AssemblyAI documentation page.
Our Webhook service now supports the use of Custom Headers for authentication.
A Custom Header can be used for added security to authenticate webhook requests from AssemblyAI. This feature allows a developer to optionally provide a value to be used as an authorization header on the returning webhook from AssemblyAI, giving the ability to validate incoming webhook requests.
To use a Custom Header, you will include two additional parameters in your POST request to /v2/transcript: webhook_auth_header_name and webhook_auth_header_value. The webhook_auth_header_name parameter accepts a string containing the header's name which will be inserted into the webhook request. The webhook_auth_header_value parameter accepts a string with the value of the header that will be inserted into the webhook request. See our Using Webhooks documentation to learn more and view our code examples.
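A sketch of a request that sets both parameters (the audio URL, webhook URL, and header values are placeholders):
import requests

endpoint = "https://api.assemblyai.com/v2/transcript"
json = {
    "audio_url": "https://example.org/meeting.mp3",        # placeholder file
    "webhook_url": "https://example.org/assemblyai-hook",  # your endpoint
    "webhook_auth_header_name": "X-Webhook-Secret",        # header name of your choosing
    "webhook_auth_header_value": "a-long-random-string"
}
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json"
}
response = requests.post(endpoint, json=json, headers=headers)
print(response.json())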
- … NULL as the language_code value.
- … /v2/transcript endpoint. See our documentation for more information on specifying a language in your POST request.
- … credit_card_number, credit_card_expiration, and credit_card_cvv policies in our PII Redaction feature.
- … disfluencies was set to true.
- … POST request.
- … disfluencies was set to true.
- … drivers_license and banking_information.
- … POST request.
- … /v2/transcript endpoint. This feature can identify the dominant language that's spoken in an audio file and route the file to the appropriate model for the detected language.
- … "CS 50" to "CS50".
- … headline and gist generation and quote formatting in the summary key.
- … en_uk and en_au language codes.
- … CREDIT_CARD_CVV and LOCATION.
- … Server error, developers have been alerted message. This feature is enabled by default. To disable it, visit the Account tab in your Developer Dashboard.
- … summary key.
- … gist key in the Auto Chapters feature.
- … summary, headline, and gist keys now include better punctuation, casing, and text formatting.
- … POST requests from the API to webhook URLs will now accept any status code from 200 to 299 as a successful HTTP response. Previously only 200 status codes were accepted.
- … text key in our Entity Detection feature to return the proper noun rather than the possessive noun. For example, Andrew instead of Andrew's.
- … /v2/transcript endpoint, improving web address and email address formatting and fixing the occasional number formatting issue.
- … gist key to the Auto Chapters feature. This new key provides an ultra-short, usually 3 to 8 word summary of the content spoken during that chapter.
- … summary, headline, or gist that includes profanity.
- … FinalTranscript response for Real-time Transcriptions. The punctuated key is a Boolean value indicating if punctuation was successful. The text_formatted key is a Boolean value indicating if Inverse Text Normalization (ITN) was successful.
- … /v2/realtime and /v2/stream endpoints. ITN improves formatting of entities like numbers, dates, and proper nouns in the transcription text.
- … disfluencies was set to true and no words were identified in the audio file.
- … entities like person and company names, emails, addresses, dates, locations, events, and more.
- … entity_detection to true in your POST request to /v2/transcript.
- … entities key towards the bottom of the JSON response containing the entities detected, as shown here:
- … severity_score_summary nested inside of the content_safety_labels key, as shown below.
- … boost_param field to control the weight for Custom Vocabulary.
- … auto_chapters: true in your POST request to /v2/transcript.
- … "positive", "negative", or "neutral". Sentiment Analysis can be enabled by including the sentiment_analysis: true parameter in your POST request to /v2/transcript.
- … "um" and "uh" can now be included in the transcription text. Simply include disfluencies: true in your POST request to /v2/transcript.
- … language_code parameter when making requests to /v2/transcript.
- … en_us, en_uk, and en_au, which will ensure the correct English spelling is used - British English, Australian English, or US English (Default).
- … assemblyai_en_au or assemblyai_en_uk acoustic models, the language_code parameter is essentially redundant and doesn't need to be used.
- … "positive", "negative", or "neutral".
- … "um" and "uh".
- … "Rugby" without the mention of the sport directly, due to the mention of "Ed Robinson" (a Rugby coach).
- … confidence and label keys when using the Content Safety feature.
- … 1.0, while a small storm that breaks a mailbox will only be 0.1.
- … Client sent audio too fast error from the Real-Time Streaming WebSocket API.
- … /v2/stream endpoint, where large periods of silence would occasionally cause automatic punctuation to fail.
- … /v2/transcript endpoint.
- … /v2/transcript) and real-time (WebSocket API) transcripts.
This week, we released an entirely new dashboard for developers:
The new developer dashboard introduces:
- … dollar amount and date tokens were not being properly redacted.
Today we've released a major improvement to our ITN (Inverse Text Normalization) model. This results in better formatting for entities within the transcription, such as phone numbers, money amounts, and dates.
Today we've released an updated Automatic Punctuation and Casing Restoration model (Punctuation v2.5)! This update results in improved capitalization of proper nouns in transcripts, reduces over-capitalization issues where some words were being incorrectly capitalized, and improves some edge cases around words with commas around them.
We have released an updated Content Safety Model - v7! Performance has improved for 10 of the 19 Content Safety labels, with the biggest improvements in the Profanity and Natural Disasters labels.
Developers will now be able to use the `word_boost` parameter in requests to the real-time API, allowing you to introduce your own custom vocabulary to the model for that given session! This custom vocabulary will lead to improved accuracy for the provided words.
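As a rough sketch (the exact query-string encoding shown here is an assumption; see our real-time Docs for the definitive format), boosting a few words for a session might look like:

```python
import json
from urllib.parse import quote

# Hypothetical session settings; word_boost holds the custom vocabulary for this session.
sample_rate = 16000
word_boost = ["LeMUR", "Universal-2", "AssemblyAI"]

# Assumed format: the list is JSON-encoded and URL-escaped into the query string.
ws_url = (
    "wss://api.assemblyai.com/v2/realtime/ws"
    f"?sample_rate={sample_rate}"
    f"&word_boost={quote(json.dumps(word_boost))}"
)
print(ws_url)
```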
We will now limit each real-time session to one WebSocket connection, to ensure the integrity of a customer's transcription and prevent multiple users/clients from sharing the same WebSocket session.
Note: Developers can still have multiple real-time sessions open in parallel, up to the Concurrency Limit on the account. For example, if an account has a Concurrency Limit of 32, that account could have up to 32 concurrent real-time sessions open.
Today we have released v2 of our Topic Detection Model. This new model will predict multiple topics for each paragraph of text, whereas v1 was limited to predicting a single topic. For example, given the text:
"Elon Musk just released a new Tesla that drives itself!"
- `Automotive>AutoType>DriverlessCars: 1`
- `PopCulture: 0.84`
- `PopCulture>CelebrityStyle: 0.56`
This improvement will make the output look significantly better and contain more informative responses for developers!
In this minor improvement, we have increased the number of topics the model can return in the `summary` key of the JSON response from 10 to 20.
Oftentimes, developers need to expose their AssemblyAI API key in their client applications when establishing connections with our real-time streaming transcription API. Now, developers can create a temporary API token that expires after a customizable amount of time (similar to an AWS S3 temporary authorization URL) and can safely be exposed in client applications and front ends.
This will allow developers to create short-lived API tokens designed to be used securely in the browser, along with authorization within the query string!
For example, authenticating in the query parameters with a temporary token would look like so:
wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000&token={TEMP_TOKEN}
For more information, you can view our Docs!
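Putting it together, a minimal sketch might look like the following (the `/v2/realtime/token` endpoint name and `expires_in` field are assumptions based on our Docs; the API key and expiry value are placeholders):

```python
import requests

# Mint a temporary token on your backend so your real API key never reaches the browser.
response = requests.post(
    "https://api.assemblyai.com/v2/realtime/token",  # assumed temporary-token endpoint
    headers={"authorization": "YOUR_API_KEY"},       # placeholder API key, kept server-side
    json={"expires_in": 3600},                       # token lifetime in seconds (placeholder)
)
temp_token = response.json()["token"]

# The front end then connects with the token in the query string, as shown above.
ws_url = f"wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000&token={temp_token}"
print(ws_url)
```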
In this minor update, we improve the accuracy across all Content Safety labels and add two new labels for better content categorization. The two new labels are `sensitive_social_issues` and `marijuana`.
New label definitions:
- `sensitive_social_issues`: This category includes content that may be considered insensitive, irresponsible, or harmful to specific groups based on their beliefs, political affiliation, sexual orientation, or gender identity.
- `marijuana`: This category includes content that discusses marijuana or its usage.

We are pleased to announce the official release of our Real-Time Streaming Transcription API! This API uses WebSockets and a fast Conformer Neural Network architecture that allows for quick and accurate transcription in real time.
Today we have released two of our enterprise-level models, Content Safety Detection and Topic Detection, to all users!
Now any developer can make use of these cutting-edge models within their applications and products. Explore these new features in our Docs.
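As a hedged sketch, enabling both models on a transcription request might look like this with our Python SDK (the `content_safety` and `iab_categories` configuration options are assumptions drawn from our Docs; the API key and audio URL are placeholders):

```python
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # placeholder API key

# Assumed TranscriptionConfig options for Content Safety Detection and Topic Detection
config = aai.TranscriptionConfig(content_safety=True, iab_categories=True)

transcriber = aai.Transcriber(config=config)
transcript = transcriber.transcribe("https://example.com/podcast-episode.mp3")  # hypothetical file
print(transcript.status)
```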
With this minor update, our Redaction Model will better detect Social Security Numbers and Medical References for additional security and data protection!
Today we released a new punctuation model that is more extensive than its predecessor, and will drive improvements in punctuation and casing accuracy!
You can explore each feature further in our Docs.
We have released an update to our PII Redaction Model that will now support detecting and redacting additional classes!
- `blood_type`
- `medical_condition`
- `drug` (including vitamins/minerals)
- `injury`
- `medical_process`

New class definitions:

- `blood_type`: Blood type.
- `medical_condition`: A medical condition. Includes diseases, syndromes, deficits, disorders. E.g., chronic fatigue syndrome, arrhythmia, depression.
- `drug`: Medical drug, including vitamins and minerals. E.g., Advil, Acetaminophen, Panadol.
- `injury`: Human injury, e.g., I broke my arm, I have a sprained wrist. Includes mutations, miscarriages, and dislocations.
- `medical_process`: Medical process, including treatments, procedures, and tests. E.g., "heart surgery," "CT scan."
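As a hedged sketch, enabling redaction for some of these new classes might look like the following (the `redact_pii` and `redact_pii_policies` parameter names are assumptions drawn from our PII Redaction docs; check there for the exact request format, and note the API key and audio URL are placeholders):

```python
import requests

headers = {"authorization": "YOUR_API_KEY"}  # placeholder API key
json_body = {
    "audio_url": "https://example.com/clinic-call.mp3",  # hypothetical audio file
    "redact_pii": True,  # assumed flag to enable PII Redaction
    "redact_pii_policies": [  # assumed parameter listing the classes to redact
        "blood_type",
        "medical_condition",
        "drug",
        "injury",
        "medical_process",
    ],
}

response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers=headers,
    json=json_body,
)
print(response.json()["id"])
```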