Route to Nano Speech Model if Detected Language Confidence is Low
This guide shows how to use AssemblyAI’s API to resubmit a request to the Nano Speech Model when the Best Speech Model’s Automatic Language Detection does not meet the language_confidence_threshold. Because the Nano Speech Model supports 99 languages compared to 17 for the Best Speech Model, this workflow routes a file to Nano with the identified language code when that language is not among those Best supports. The following code uses the Python SDK.
Get Started
Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for an AssemblyAI account and get your API key from your dashboard.
Step-by-Step Instructions
Install the SDK with pip: pip install assemblyai
Import the assemblyai package and set the API key.
Define a Transcriber, an audio_url set to a link to the audio file, and a TranscriptionConfig with language_detection=True. For this Cookbook, to emphasize the Speech Model selected, we will also specify speech_model="best". Finally, we need to define our language_confidence_threshold. For the purposes of this example, we’ll set it to 0.8, representing 80% confidence.
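The import, API key, and configuration steps above can be sketched roughly as follows. This is a minimal sketch, not the guide's original code: the audio URL is a placeholder, the ASSEMBLYAI_API_KEY environment variable name and the transcribe_with_best helper are assumptions made for illustration, and the SDK import sits inside the function only so the settings can be inspected without the SDK installed.

```python
import os

# Placeholder audio link (hypothetical); replace with your own file URL.
AUDIO_URL = "https://example.com/audio.mp3"

# Settings named in the steps above.
BEST_CONFIG_KWARGS = {
    "speech_model": "best",                # make the model choice explicit
    "language_detection": True,            # enable Automatic Language Detection
    "language_confidence_threshold": 0.8,  # 80% confidence
}


def transcribe_with_best(audio_url: str = AUDIO_URL):
    """Submit the file to the Best model with language detection enabled."""
    # Imported here so the settings above can be read without the SDK installed.
    import assemblyai as aai

    # ASSUMPTION: the API key is stored in this environment variable.
    aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]

    config = aai.TranscriptionConfig(**BEST_CONFIG_KWARGS)
    transcriber = aai.Transcriber()
    return transcriber.transcribe(audio_url, config=config)
```

Calling transcribe_with_best() returns a transcript object whose error attribute can then be checked.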
If a transcript ends up with a language_confidence below this value, the transcript will error out, and we’ll return the identified language that had the highest confidence. This is useful when the identified language isn’t supported by our Best speech model: confidence will be very low, but if the language is supported by Nano, you can programmatically route the file there.
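One way to tell a threshold failure apart from other errors is to inspect the transcript's error text. The sketch below is illustrative only: the marker phrase and the sample message are assumptions about the error wording, and failed_on_language_confidence is a hypothetical helper, not part of the SDK. Stand-in objects are used so the check can be tried without calling the API.

```python
from types import SimpleNamespace

# ASSUMPTION: phrase expected to appear in a threshold-related error message.
THRESHOLD_ERROR_MARKER = "below the requested confidence threshold"


def failed_on_language_confidence(transcript) -> bool:
    """Return True when the transcript's error looks threshold-related."""
    error = getattr(transcript, "error", None)
    return bool(error) and THRESHOLD_ERROR_MARKER in error


# Stand-ins for transcript objects (illustrative, not real API responses).
succeeded = SimpleNamespace(error=None)
low_confidence = SimpleNamespace(
    error="detected language 'bg', confidence 0.2949, is below the "
          "requested confidence threshold value of '0.8'"
)

print(failed_on_language_confidence(succeeded))       # False
print(failed_on_language_confidence(low_confidence))  # True
```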
If your transcript errors out with a language_confidence-related error, you can parse our error message to resubmit the file to Nano with the recommended language code. Since the original request to Best failed with an error, there is no cost for that request, and Nano transcribes the re-routed file at a lower cost than Best.
For the purposes of this guide, we’ll include an example of what this error message looks like so you can choose to build your own parsing logic should you wish.
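A possible shape for the parsing and resubmission logic is sketched below. The sample message is an assumed example of the error wording, so adjust the pattern to the message you actually receive. extract_detected_language, nano_config_kwargs, and resubmit_to_nano are hypothetical helpers, and speech_model="nano" is assumed to mirror the speech_model="best" setting used earlier.

```python
import re
from typing import Optional

# ASSUMPTION: illustrative error text; the real wording may differ.
SAMPLE_ERROR = (
    "detected language 'bg', confidence 0.2949, is below the "
    "requested confidence threshold value of '0.8'"
)

# Captures the language code quoted after "detected language".
_DETECTED_LANGUAGE = re.compile(r"detected language '([^']+)'")


def extract_detected_language(error_message: str) -> Optional[str]:
    """Return the language code from the error message, or None if absent."""
    match = _DETECTED_LANGUAGE.search(error_message)
    return match.group(1) if match else None


def nano_config_kwargs(language_code: str) -> dict:
    """Settings for the retry: Nano with the detected language fixed."""
    return {"speech_model": "nano", "language_code": language_code}


def resubmit_to_nano(audio_url: str, error_message: str):
    """Re-run the file on Nano using the language named in the error."""
    # Imported here so the parsing helpers work without the SDK installed.
    # Assumes aai.settings.api_key has already been set.
    import assemblyai as aai

    language_code = extract_detected_language(error_message)
    if language_code is None:
        raise ValueError("No detected language found in the error message")

    config = aai.TranscriptionConfig(**nano_config_kwargs(language_code))
    return aai.Transcriber().transcribe(audio_url, config=config)


print(extract_detected_language(SAMPLE_ERROR))  # bg
```

Keeping the parsing separate from the API call makes it easy to test the pattern against real error messages before relying on it for routing.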
The resulting transcript will now be in the language we identified with our ALD model, and will have been routed to the correct speech_model with no human intervention needed and at a lower cost than having it incorrectly transcribed via Best without this form of error checking.