Use Automatic Language Detection as a Separate Step From Transcription

In this guide, we’ll show you how to perform automatic language detection as a separate step before transcription. The file is then routed to either our Universal-3 Pro or Universal-2 model class for transcription, depending on which model supports the detected language.

This workflow is designed to be cost-effective: it slices off the first 60 seconds of audio and runs that clip through Universal-2 automatic language detection (ALD), which detects 99 languages. The language detection step costs $0.002 per transcript, not including the cost of the full transcription.
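As a rough sketch of the arithmetic, the detection cost scales linearly with the number of files. The helper name `detection_cost` is hypothetical, and the price is the $0.002 per-transcript figure quoted above. The transcription itself is billed separately and is not modeled here.

```python
from decimal import Decimal

# Per-transcript price of the language detection step, as quoted in this guide.
DETECTION_COST_PER_FILE = Decimal("0.002")


def detection_cost(num_files: int) -> Decimal:
    """Return the total language detection cost in USD for num_files files."""
    if num_files < 0:
        raise ValueError("num_files must be non-negative")
    return DETECTION_COST_PER_FILE * num_files


print(detection_cost(1000))  # prints 2.000
```

`Decimal` is used instead of `float` so that small per-file prices add up without rounding drift.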

Get started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard.

Step-by-step instructions

Install the SDK:

$ pip install assemblyai

Import the assemblyai package and set your API key:

import assemblyai as aai
aai.settings.api_key = "YOUR_API_KEY"
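If you would rather not hard-code the key in your script, one option is to read it from an environment variable. The sketch below is a minimal example under that assumption. The helper `load_api_key` is hypothetical, and the variable name `ASSEMBLYAI_API_KEY` is our choice for this example, not something the steps above require.

```python
import os


def load_api_key(env_var: str = "ASSEMBLYAI_API_KEY") -> str:
    """Return the API key stored in env_var, or raise if it is missing."""
    # env_var is an assumed name; use whatever variable your deployment sets.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

You could then write `aai.settings.api_key = load_api_key()` in place of the string literal.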

Create a set with the language codes that should be routed to Universal-3 Pro (with Universal-2 as a fallback). You can find the supported languages in our documentation.

supported_languages_for_universal = {
    "en",
    "en_au",
    "en_uk",
    "en_us",
    "es",
    "fr",
    "de",
    "it",
    "pt",
    "nl",
    "hi",
    "ja",
    "zh",
    "fi",
    "ko",
    "pl",
    "ru",
    "tr",
    "uk",
    "vi",
}

Define a Transcriber. Note that here we don’t pass in a global TranscriptionConfig, but later apply different ones during the transcribe() call.

transcriber = aai.Transcriber()

Define two helper functions:

  • detect_language() performs language detection on the first 60 seconds of the audio and returns the language code.
  • transcribe_file() performs the transcription. It applies the identified language and requests Universal-3 Pro (with Universal-2 as a fallback) if the language is in the supported set, or Universal-2 alone otherwise.

def detect_language(audio_url):
    # Only the first minute is processed, which keeps detection cheap.
    config = aai.TranscriptionConfig(
        audio_end_at=60000,  # first 60 seconds (in milliseconds)
        language_detection=True,
        speech_models=["universal-2"],
    )
    transcript = transcriber.transcribe(audio_url, config=config)
    return transcript.json_response["language_code"]

def transcribe_file(audio_url, language_code):
    # Pick the model list based on the detected language.
    config = aai.TranscriptionConfig(
        language_code=language_code,
        speech_models=(
            ["universal-3-pro", "universal-2"]
            if language_code in supported_languages_for_universal
            else ["universal-2"]
        ),
    )
    transcript = transcriber.transcribe(audio_url, config=config)
    return transcript
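The routing decision inside transcribe_file() can also be pulled out into a small pure function, which makes it easy to check without calling the API. This is only a sketch. The name `select_speech_models` is hypothetical, and `sample_supported` is a shortened stand-in for the full set defined earlier.

```python
def select_speech_models(language_code, supported_languages):
    """Return the speech model list to request for a detected language code."""
    if language_code in supported_languages:
        # Supported language: list Universal-3 Pro first, then Universal-2.
        return ["universal-3-pro", "universal-2"]
    # Any other code (including None) goes to Universal-2 only.
    return ["universal-2"]


# Shortened example set; in the guide this would be supported_languages_for_universal.
sample_supported = {"en", "es", "pt"}

models_for_spanish = select_speech_models("es", sample_supported)
models_for_slovenian = select_speech_models("sl", sample_supported)
```

Here `models_for_spanish` holds both model names, while `models_for_slovenian` holds only `"universal-2"`.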

Test the code with different audio files. For each file, we apply both helper functions sequentially to first identify the language and then transcribe the file.

audio_urls = [
    "https://storage.googleapis.com/aai-web-samples/public_benchmarking_portugese.mp3",
    "https://storage.googleapis.com/aai-web-samples/public_benchmarking_spanish.mp3",
    "https://storage.googleapis.com/aai-web-samples/slovenian_luka_doncic_interview.mp3",
    "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3",
]

for audio_url in audio_urls:
    language_code = detect_language(audio_url)
    print("Identified language:", language_code)

    transcript = transcribe_file(audio_url, language_code)
    print("Transcript:", transcript.text[:100], "...")
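When you process a larger batch, you may not want one failing file to stop the whole loop. The sketch below shows one way to structure that, assuming the two steps are passed in as plain callables. `process_files` is a hypothetical helper, not part of the SDK. It only catches raised exceptions, so a transcript that comes back with an error status would still need its own check.

```python
def process_files(audio_urls, detect, transcribe):
    """Run detection, then transcription, for each URL and collect the outcomes.

    detect(url) should return a language code, and transcribe(url, code)
    should return a transcript. Failures are recorded rather than raised.
    """
    results = {}
    for url in audio_urls:
        try:
            language_code = detect(url)
            transcript = transcribe(url, language_code)
            results[url] = {
                "language_code": language_code,
                "transcript": transcript,
                "error": None,
            }
        except Exception as exc:
            # Record the failure and continue with the next file.
            results[url] = {
                "language_code": None,
                "transcript": None,
                "error": str(exc),
            }
    return results
```

With the helpers from this guide, the call would be `process_files(audio_urls, detect_language, transcribe_file)`.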