
How does Automatic Language Detection work?

Our Automatic Language Detection (ALD) model analyzes samples of the audio to determine the language spoken. It randomly selects up to 3 clips of 30 seconds each from the middle 50% of the audio duration (between 25% and 75% of the total length).
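The sampling step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual ALD implementation: the function name `sample_clip_windows`, the non-overlapping placement of clips, and the handling of short files (using the whole middle section when it is shorter than one clip) are all hypothetical choices made for the example.

```python
import random


def sample_clip_windows(duration_s, clip_len_s=30.0, max_clips=3, rng=None):
    """Pick up to `max_clips` random (start, end) windows, in seconds,
    from the middle 50% of an audio file (25% to 75% of its length)."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    rng = rng or random.Random()

    lo, hi = 0.25 * duration_s, 0.75 * duration_s
    span = hi - lo

    # Assumption: if the middle section cannot hold a full clip,
    # fall back to a single window covering the whole middle section.
    if span <= clip_len_s:
        return [(lo, hi)]

    # "Up to 3" clips: fewer when the middle section is too short for three.
    n = min(max_clips, int(span // clip_len_s))

    # Assumption: clips do not overlap. Spread the leftover time (slack)
    # randomly between the clips by drawing sorted random offsets.
    slack = span - n * clip_len_s
    offsets = sorted(rng.uniform(0, slack) for _ in range(n))

    windows = []
    for i, off in enumerate(offsets):
        start = lo + off + i * clip_len_s
        windows.append((start, start + clip_len_s))
    return windows
```

For a 10-minute file (`duration_s=600`), every window returned falls between 150 and 450 seconds, so greetings at the start and sign-offs at the end are never sampled.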

Each selected clip is passed through our ALD model, which predicts a probability for each language. The probabilities are then averaged across the clips, and the languages are ranked by their average probability.
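The averaging and ranking step can be sketched as below. The function name `rank_languages`, the dictionary layout, and the probability values are made up for illustration, and treating a language that is missing from a clip as probability 0 is an assumption rather than documented behavior.

```python
from collections import defaultdict


def rank_languages(clip_probs):
    """Average per-clip language probabilities and rank languages.

    `clip_probs` is a list with one dict per clip, mapping a language
    code to the probability predicted for that clip.
    """
    if not clip_probs:
        raise ValueError("at least one clip is required")

    totals = defaultdict(float)
    for probs in clip_probs:
        for lang, p in probs.items():
            totals[lang] += p

    # Assumption: a language absent from a clip contributes 0 for that clip,
    # so every total is divided by the full number of clips.
    n = len(clip_probs)
    averaged = {lang: total / n for lang, total in totals.items()}

    # Highest average probability first.
    return sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical per-clip predictions for three clips.
clips = [
    {"en": 0.9, "es": 0.1},
    {"en": 0.6, "es": 0.4},
    {"en": 0.3, "es": 0.7},
]
ranking = rank_languages(clips)
```

In this example `en` ranks first with an average of about 0.6, even though the third clip on its own favors `es`. Averaging across clips keeps a single unrepresentative clip from deciding the result.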

This approach helps ensure that the language detection is based on a representative sample of the audio, rather than just the beginning or end portions which may contain greetings, silence, or other non-representative speech.

If you are seeing low confidence scores for transcriptions in a particular language, the cause may be background noise, accents, or poor audio quality.
