Route to Nano Speech Model if Detected Language Confidence is Low

This guide will show you how to use AssemblyAI’s API to resubmit a request to the Nano Speech Model if the Best Speech Model Automatic Language Detection’s language_confidence_threshold isn’t met. As the Nano Speech Model supports 99 languages compared to 17 by the Best Speech Model, this workflow will route transcripts to the identified language code if it does not fit within our Best supported languages. The following code uses the Python SDK.

Get Started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for an AssemblyAI account and get your API key from your dashboard.

Step-by-Step Instructions

Install the SDK:

$pip install assemblyai

Import the assemblyai package and set the API key.

1import assemblyai as aai
2
3aai.settings.api_key = "YOUR_API_KEY"

Define a Transcriber, an audio_url set to a link to the audio file, and a TranscriptionConfig with language_detection=True. For this Cookbook, to emphasize the Speech Model selected, we will also specify speech_model="best". Finally, we need to define our language_confidence_threshold. For the purposes of this example, we’ll set it to 0.8, representing 80% confidence.

If a transcript ends up with a language_confidence below this value, the transcript will error out, and we’ll return the identified language that had the highest confidence. This is useful for cases where the language identified isn’t supported by our Best speech model, wherein confidence will be very low, but the language is supported via Nano, where you can programmatically route this file.

1transcriber = aai.Transcriber()
2audio_url = ("https://example.org/audio.mp3")
3
4config = aai.TranscriptionConfig(speech_model=aai.SpeechModel.best, language_detection=True, language_confidence_threshold=0.8)
5transcript = transcriber.transcribe(audio_url, config)

If your transcript errors out with a language_confidence-related error, you can parse our error message to resubmit the file to Nano with the recommended language code. Since the original request to Best failed with an error, there is no cost for this process, and re-routing the file to Nano will transcribe the file at a lower cost than Best as well.

For the purposes of this guide, we’ll include an example of what this error message looks like so you can choose to build your own parsing logic should you wish.

1import re
2
3error = "detected language 'sv', confidence 0.22, is below the requested confidence threshold value of 0.8" # Example message. Normally you'd access this error via transcript.error.
4
5# Check if this is an ALD confidence threshold error.
6if "below the requested confidence threshold value" in error:
7 match = re.search(r"detected language '(\w+)'", error)
8 detected_language = match.group(1)
9
10 new_config = aai.TranscriptionConfig(speech_model=aai.SpeechModel.nano, language_code=detected_language)
11
12 transcript = transcriber.transcribe(audio_url, new_config)
13
14 print(transcript.text)

The resulting transcript will now be in the language we identified with our ALD model, and will have been routed to the correct speech_model with no human intervention needed and at a lower cost than having it incorrectly transcribed via Best without this form of error checking.