Route to Default Language if Language Confidence is Low | AssemblyAI

This guide shows how to use AssemblyAI’s API to resubmit a transcription request with a default language when the language_confidence reported by Automatic Language Detection falls below a chosen threshold.

Getting started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for an AssemblyAI account and get your API key from your dashboard.

Step-by-step instructions

Install the SDK:

npm install assemblyai

Import the assemblyai package and set the API key.

import { AssemblyAI } from "assemblyai";

const client = new AssemblyAI({
  apiKey: "YOUR_API_KEY",
});
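If you would rather not hard-code the key, one option is to read it from the environment. The sketch below is only an illustration: getApiKey is a hypothetical helper, and ASSEMBLYAI_API_KEY is an assumed variable name chosen for this example, not something the SDK requires.

```javascript
// Hypothetical helper: reads the API key from an environment-like object.
// ASSEMBLYAI_API_KEY is an assumed variable name used only in this sketch.
function getApiKey(env = process.env) {
  const key = env.ASSEMBLYAI_API_KEY;
  if (!key || key.trim() === "") {
    throw new Error("Set the ASSEMBLYAI_API_KEY environment variable first.");
  }
  return key.trim();
}

// Usage with the client created above:
// const client = new AssemblyAI({ apiKey: getApiKey() });
```

Failing early with a clear message tends to be easier to debug than an authentication error returned later by the API.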

Define default_language and set it to the language code to use when rerunning the transcript after language detection reports a low language_confidence.

const default_language = "LANGUAGE_CODE";

Define an audioUrl that points to the audio file. Then define params with audio: audioUrl and language_detection: true. We also need to set language_confidence_threshold. For the purposes of this example, we’ll set it to 0.8, representing 80% confidence.

If a transcript ends up with a language_confidence below this value, the request fails with an error instead of returning a transcript. We then rerun it using the default_language.

const audioUrl = "https://example.org/audio.mp3";

const params = {
  audio: audioUrl,
  language_detection: true,
  language_confidence_threshold: 0.8,
  // Add any other params
};

You can handle the error safely by checking the error message and rerunning the transcript with the language_code set to the default_language.

The error handling flow works as follows:

If there is no error, the transcript ID and text are printed
If there is an error:
- Check if it’s related to language_confidence being below threshold
- If so:
  - Print message about rerunning with default language
  - Create new transcript with default_language as the language_code
  - Print new transcript ID and text
- If not:
  - Print the error message
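The branching above can be sketched as a small pure function, which makes the three outcomes easy to test without calling the API. classifyResult and its return labels are hypothetical names introduced for this illustration; the matched substring is the one this guide checks for in transcript.error.

```javascript
// Substring that this guide matches against transcript.error.
const LOW_CONFIDENCE_MARKER = "below the requested confidence threshold value";

// Hypothetical helper: maps a transcript result to one of three outcomes.
function classifyResult(transcript) {
  // No error: the transcript can be printed as-is.
  if (transcript.status !== "error") return "ok";
  // Guard against a missing error message before searching it.
  const message = transcript.error ?? "";
  return message.includes(LOW_CONFIDENCE_MARKER)
    ? "retry-with-default" // low language_confidence: rerun with default_language
    : "fail"; // any other error: just report it
}
```

Because the check relies on matching text in the error message, it may need updating if the wording of that message changes.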

When rerunning with the default language, the configuration is updated to:

Turn off language_detection
Remove language_confidence_threshold
Set language_code to the default_language
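As a rough sketch, these three changes could be isolated in a helper that returns a new params object and leaves the original untouched. buildFallbackParams is a hypothetical name, and this variant omits language_confidence_threshold entirely, whereas the guide's own code sets it to null.

```javascript
// Hypothetical helper: builds the params used for the rerun.
function buildFallbackParams(params, defaultLanguage) {
  // Pull the threshold out so it is not sent with the rerun.
  const { language_confidence_threshold, ...rest } = params;
  return {
    ...rest, // keep audio and any other params
    language_detection: false, // turn off automatic detection
    language_code: defaultLanguage, // force the default language
  };
}
```

Returning a copy rather than mutating params keeps the original request available, for example for logging what was first attempted.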

You will not be charged for the first transcript if there is an error. You will only be charged for the transcript that processes successfully.

const run = async (params) => {
  const transcript = await client.transcripts.transcribe(params);

  if (transcript.status === "error") {
    // Low language_confidence: rerun with the default language.
    if (
      transcript.error.includes(
        "below the requested confidence threshold value"
      )
    ) {
      console.log(
        `${transcript.error}. Running transcript again with language set to '${default_language}'.`
      );
      params = {
        ...params,
        language_detection: false,
        language_confidence_threshold: null,
        language_code: default_language,
      };
      // Await the rerun so run() resolves only after it finishes
      // and any failure propagates to the caller.
      await run(params);
      return;
    }

    // Any other error: print it and stop.
    console.log(transcript.error);
    return;
  }

  console.log(`Transcript ID: ${transcript.id}`);
  console.log(transcript.text);
};

run(params);
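If you need the result in code rather than printed to the console, a variant along the following lines may be useful. It is a sketch under a few assumptions: transcribeWithFallback is a hypothetical name, the client is passed in as an argument (anything exposing transcripts.transcribe, which also makes it easy to stub in tests), and the rerun happens at most once instead of recursing.

```javascript
const LOW_CONFIDENCE = "below the requested confidence threshold value";

// Hypothetical variant of run(): returns the final transcript instead of
// printing it, and reruns at most once.
async function transcribeWithFallback(client, params, defaultLanguage) {
  const first = await client.transcripts.transcribe(params);
  const lowConfidence =
    first.status === "error" && (first.error ?? "").includes(LOW_CONFIDENCE);

  // Success, or an unrelated error: hand the result back unchanged.
  if (!lowConfidence) return first;

  // Rerun with detection off, no threshold, and the default language.
  const { language_confidence_threshold, ...rest } = params;
  return client.transcripts.transcribe({
    ...rest,
    language_detection: false,
    language_code: defaultLanguage,
  });
}

// Usage with the client and params defined earlier:
// const transcript = await transcribeWithFallback(client, params, default_language);
// if (transcript.status === "error") console.log(transcript.error);
```

The caller still has to check the returned status, since the rerun itself can fail for reasons unrelated to language detection.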