Route to Nano Speech Model if Detected Language Confidence is Low | AssemblyAI

This guide will show you how to use AssemblyAI’s API to resubmit a request to the Nano Speech Model if the Universal Speech Model Automatic Language Detection’s language_confidence_threshold isn’t met. As the Nano Speech Model supports 99 languages compared to 17 by the Universal Speech Model, this workflow will route transcripts to the identified language code if it does not fit within our Universal supported languages. The following code uses the Python SDK.

Get Started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for an AssemblyAI account and get your API key from your dashboard.

Step-by-Step Instructions

Install the SDK:

$ pip install assemblyai

Import the assemblyai package and set the API key.

1 import assemblyai as aai
2 
3 aai.settings.api_key = "YOUR_API_KEY"

Define a Transcriber, an audio_url set to a link to the audio file, and a TranscriptionConfig with language_detection=True. For this Cookbook, to emphasize the Speech Model selected, we will also specify speech_model="universal". Finally, we need to define our language_confidence_threshold. For the purposes of this example, we’ll set it to 0.8, representing 80% confidence.

If a transcript ends up with a language_confidence below this value, the transcript will error out, and we’ll return the identified language that had the highest confidence. This is useful for cases where the language identified isn’t supported by our Universal speech model, wherein confidence will be very low, but the language is supported via Nano, where you can programmatically route this file.

1 transcriber = aai.Transcriber()
2 audio_url = ("https://example.org/audio.mp3")
3 
4 config = aai.TranscriptionConfig(speech_model=aai.SpeechModel.universal, language_detection=True, language_confidence_threshold=0.8)
5 transcript = transcriber.transcribe(audio_url, config)

If your transcript errors out with a language_confidence-related error, you can parse our error message to resubmit the file to Nano with the recommended language code. Since the original request to Universal failed with an error, there is no cost for this process, and re-routing the file to Nano will transcribe the file at a lower cost than Universal as well.

For the purposes of this guide, we’ll include an example of what this error message looks like so you can choose to build your own parsing logic should you wish.

1 import re
2 
3 error = "detected language 'sv', confidence 0.22, is below the requested confidence threshold value of 0.8"  # Example message. Normally you'd access this error via transcript.error.
4 
5 # Check if this is an ALD confidence threshold error.
6 if "below the requested confidence threshold value" in error:
7     match = re.search(r"detected language '(\w+)'", error)
8     detected_language = match.group(1)
9 
10     new_config = aai.TranscriptionConfig(speech_model=aai.SpeechModel.nano, language_code=detected_language)
11 
12     transcript = transcriber.transcribe(audio_url, new_config)
13 
14     print(transcript.text)

The resulting transcript will now be in the language we identified with our ALD model, and will have been routed to the correct speech_model with no human intervention needed and at a lower cost than having it incorrectly transcribed via Universal without this form of error checking.