Automatic Language Detection
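Supported languages
en, en_au, en_uk, en_us, es, fr, de, it, pt, nl, hi, ja, zh, fi, ko, pl, ru, tr, uk, vi, af, sq, am, ar, hy, as, az, ba, eu, be, bn, bs, br, bg, my, ca, hr, cs, da, et, fo, gl, ka, el, gu, ht, ha, haw, he, hu, is, id, jw, kn, kk, km, lo, la, lv, ln, lt, lb, mk, mg, ms, ml, mt, mi, mr, mn, ne, no, nn, oc, pa, ps, fa, ro, sa, sr, sn, sd, si, sk, sl, so, su, sw, sv, tl, tg, ta, tt, te, th, bo, tk, ur, uz, cy, yi, yo
Supported models
universal
Supported regions
US & EU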
Identify the dominant language spoken in an audio file and use it during the transcription. Enable it to detect any of the supported languages.
To reliably identify the dominant language, a file must contain at least 15 seconds of spoken audio. Results will be better if the file contains 15 to 90 seconds of spoken audio.
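As a rough illustration, a transcription request with detection turned on might be assembled as shown here. This is only a sketch: the field names `audio_url` and `language_detection` and the helper `build_request` are assumptions that this page does not confirm, and the URL is a placeholder.

```python
import json


def build_request(audio_url: str) -> dict:
    """Return a request body with language detection enabled.

    Assumption: the API accepts an `audio_url` string and a boolean
    `language_detection` flag. Neither name is stated on this page.
    """
    return {
        "audio_url": audio_url,
        "language_detection": True,
    }


if __name__ == "__main__":
    # Placeholder URL, used only to show the shape of the body.
    body = build_request("https://example.com/audio.mp3")
    print(json.dumps(body, indent=2))
```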
Set a list of expected languages
If you’re confident the audio is in one of a few languages, provide that list via language_detection_options.expected_languages. Detection is restricted to these candidates and the model will choose the language with the highest confidence from this list. This can eliminate scenarios where Automatic Language Detection selects an unexpected language for transcription.
- Use our language codes (e.g., "en", "es", "fr").
- If expected_languages is not specified, it is set to ["all"] by default.
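The rules above can be sketched as a small helper that builds the language_detection_options object. The helper name `detection_options` is hypothetical, and the non-empty check is an illustrative client-side guard rather than documented API behavior.

```python
from typing import List, Optional


def detection_options(expected_languages: Optional[List[str]] = None) -> dict:
    """Build a language_detection_options object.

    When no list is given, this mirrors the documented default of ["all"].
    """
    if expected_languages is None:
        expected_languages = ["all"]
    codes = [code.strip() for code in expected_languages]
    # Illustrative guard (an assumption): reject empty lists and blank codes.
    if not codes or any(not code for code in codes):
        raise ValueError("expected_languages must contain non-empty language codes")
    return {"expected_languages": codes}
```

For example, `detection_options(["en", "es", "fr"])` restricts detection to English, Spanish, and French, while `detection_options()` leaves every supported language as a candidate.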
Choose a fallback language
Control what language transcription should fall back to when detection cannot confidently select a language from the expected_languages list.
- Set language_detection_options.fallback_language to a specific language code (e.g., "en").
- fallback_language must be one of the language codes in expected_languages or "auto".
- When fallback_language is unspecified, it is set to "auto" by default. This tells our model to choose the fallback language from expected_languages with the highest confidence score.
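A client can check the fallback constraint before sending a request. The sketch below assumes a hypothetical helper named `options_with_fallback`; only the option names expected_languages and fallback_language come from this page.

```python
from typing import List


def options_with_fallback(
    expected_languages: List[str], fallback_language: str = "auto"
) -> dict:
    """Build language_detection_options with a validated fallback.

    Enforces the documented rule: fallback_language must be "auto" or
    one of the codes listed in expected_languages.
    """
    if fallback_language != "auto" and fallback_language not in expected_languages:
        raise ValueError(
            f"fallback_language {fallback_language!r} must be 'auto' or one of "
            f"{expected_languages!r}"
        )
    return {
        "expected_languages": list(expected_languages),
        "fallback_language": fallback_language,
    }
```

Calling `options_with_fallback(["en", "es"], "en")` succeeds, whereas `options_with_fallback(["en", "es"], "fr")` raises a `ValueError` locally instead of sending a request that breaks the rule.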
Confidence score
If language detection is enabled, the API returns a confidence score for the detected language. The score ranges from 0.0 (low confidence) to 1.0 (high confidence).
Set a language confidence threshold
You can set the confidence threshold that must be reached if language detection is enabled. An error will be returned if the language confidence is below this threshold. Valid values are in the range [0,1] inclusive.
If the language_confidence_threshold you specify is not met, you will receive an error message like: detected language 'bg', confidence 0.2949, is below the requested confidence threshold value of '0.4'.
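One way to work with the threshold on the client side is to validate the value before sending it and to pull the details out of the error text afterwards. This is a sketch under assumptions: the helper names are hypothetical, and the regular expression is modeled only on the sample message shown above, so the real error format may differ.

```python
import re
from typing import Optional

# Pattern modeled on the single sample error message in this section.
THRESHOLD_ERROR = re.compile(
    r"detected language '(?P<language>[^']+)', "
    r"confidence (?P<confidence>[0-9.]+), "
    r"is below the requested confidence threshold value of "
    r"'(?P<threshold>[0-9.]+)'"
)


def validate_threshold(value: float) -> float:
    """Check that the threshold lies in the documented range [0, 1]."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("language_confidence_threshold must be within [0, 1]")
    return value


def parse_threshold_error(message: str) -> Optional[dict]:
    """Extract language, confidence, and threshold from an error message.

    Returns None when the message does not match the sample format.
    """
    match = THRESHOLD_ERROR.search(message)
    if match is None:
        return None
    return {
        "language": match.group("language"),
        "confidence": float(match.group("confidence")),
        "threshold": float(match.group("threshold")),
    }
```

With the parsed values, a caller could decide whether to retry with a lower threshold or with a narrower expected_languages list.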