Whisper streaming
Whisper streaming allows you to transcribe audio streams in 99 languages using the WhisperLiveKit model.

Supported languages
Whisper streaming supports 99 languages:

Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Cantonese, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Latin, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Myanmar, Nepali, Norwegian, Nynorsk, Occitan, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Yiddish, Yoruba
Configuration
To use Whisper streaming, include speech_model=whisper-rt as a query parameter in the WebSocket URL.
The whisper-rt model does not support the language parameter. The model automatically detects the language being spoken. Do not include a language parameter when using this model.
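As a sketch, the parameter can be appended when constructing the connection URL. The base endpoint below is a placeholder, not a real address:

```python
from urllib.parse import urlencode

# Placeholder endpoint -- substitute your service's actual streaming URL.
BASE_URL = "wss://streaming.example.com/ws"

params = {
    "speech_model": "whisper-rt",  # select the Whisper streaming model
    # No "language" parameter: whisper-rt detects the language automatically.
}
ws_url = f"{BASE_URL}?{urlencode(params)}"
print(ws_url)  # wss://streaming.example.com/ws?speech_model=whisper-rt
```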
Language detection
The Whisper streaming model supports automatic language detection, allowing you to identify which language is being spoken in real-time. When enabled, the model returns the detected language code and confidence score with each complete utterance and final turn.
Configuration
To enable language detection, include language_detection=true as a query parameter in the WebSocket URL:
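A minimal Python sketch of building such a URL (the endpoint is a placeholder):

```python
from urllib.parse import urlencode

# Placeholder endpoint -- substitute your service's actual streaming URL.
BASE_URL = "wss://streaming.example.com/ws"

params = {
    "speech_model": "whisper-rt",
    "language_detection": "true",  # return language_code / language_confidence
}
ws_url = f"{BASE_URL}?{urlencode(params)}"
print(ws_url)
```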
Output format
When language detection is enabled, each Turn message (with either a complete utterance or end_of_turn: true) will include two additional fields:
- language_code: The language code of the detected language (e.g., "es" for Spanish, "fr" for French)
- language_confidence: A confidence score between 0 and 1 indicating how confident the model is in the language detection
The language_code and language_confidence fields only appear when either:
- The utterance field is non-empty and contains a complete utterance
- The end_of_turn field is true
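The rules above can be sketched in Python; the handler below assumes each incoming message is a JSON-encoded Turn object with the fields described in this section:

```python
import json

def detected_language(message: str):
    """Return (language_code, language_confidence) for a Turn message,
    or None when the turn carries no language information."""
    turn = json.loads(message)
    # The fields appear only with a complete utterance or a final turn.
    if not (turn.get("utterance") or turn.get("end_of_turn")):
        return None
    if "language_code" not in turn:
        return None
    return turn["language_code"], turn["language_confidence"]

msg = ('{"utterance": "Hola", "end_of_turn": true, '
       '"language_code": "es", "language_confidence": 0.846999}')
print(detected_language(msg))  # ('es', 0.846999)
```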
Example response
Here’s an example Turn message with language detection enabled, showing Spanish being detected:
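The message below is illustrative; it shows only the fields discussed in this section, and real Turn messages may include additional fields:

```json
{
  "utterance": "Hola, ¿cómo estás?",
  "transcript": "hola como estas",
  "end_of_turn": true,
  "language_code": "es",
  "language_confidence": 0.846999
}
```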
In this example, the model detected Spanish ("es") with a confidence of 0.846999.
Non-speech tags
The Whisper streaming model can detect and transcribe non-speech audio events. These are returned as bracketed tags in the utterance field. Common non-speech tags include:
- [Silence] - Periods of silence or no speech
- [Música] / [Music] - Background music detected
- Other audio events may appear in similar bracketed format
Example response with non-speech
Here’s an example Turn message showing silence detection:
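An illustrative message showing only the fields discussed here; real messages may include additional fields:

```json
{
  "utterance": "[Silence]",
  "transcript": "",
  "end_of_turn": true
}
```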
Non-speech tags appear in the utterance field with brackets. The transcript field contains the raw text without formatting. You can filter out non-speech turns by checking if the utterance contains bracketed tags like [Silence] or [Music].
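A small helper sketching that filter; it treats a turn as non-speech when the entire utterance is a single bracketed tag (an assumption -- adjust the pattern if tags can be mixed with speech):

```python
import re

# Matches utterances that consist solely of one bracketed tag, e.g. "[Silence]".
_NON_SPEECH = re.compile(r"\s*\[[^\]]+\]\s*")

def is_non_speech(utterance: str) -> bool:
    return bool(_NON_SPEECH.fullmatch(utterance))

print(is_non_speech("[Silence]"))  # True
print(is_non_speech("[Música]"))   # True
print(is_non_speech("Hola"))       # False
```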
Understanding formatting
By default, the Whisper streaming model returns unformatted transcripts. To receive formatted transcripts with proper punctuation and capitalization, you must set format_turns=true as a query parameter.
Enabling format_turns adds additional latency to the transcription. We recommend keeping it off for voice agents where low latency is critical, and on for notetaking applications where formatted output is more important than speed.
Configuration
To enable formatted transcripts, include format_turns=true in the WebSocket URL:
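As with the other options, the parameter can be added when building the URL (the endpoint is a placeholder):

```python
from urllib.parse import urlencode

# Placeholder endpoint -- substitute your service's actual streaming URL.
BASE_URL = "wss://streaming.example.com/ws"

params = {
    "speech_model": "whisper-rt",
    "format_turns": "true",  # formatted transcripts (adds some latency)
}
ws_url = f"{BASE_URL}?{urlencode(params)}"
print(ws_url)
```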
Example comparison
Here’s how the same Spanish phrase appears unformatted (format_turns=false, the default) and formatted (format_turns=true):
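For illustration, a short Spanish phrase (a hypothetical sample, not captured from the API) might look like:

```text
Unformatted (format_turns=false): hola como estas bien y tu
Formatted (format_turns=true):    Hola, ¿cómo estás? Bien, ¿y tú?
```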
When formatting is enabled, the transcript includes proper capitalization and punctuation.
Quickstart
First, install the required dependencies.
The Python example uses the websockets library. If you’re using websockets version 13.0 or later, use the additional_headers parameter. For older versions (< 13.0), use extra_headers instead.
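The version check can be sketched without importing the library itself; the helper below is a hypothetical convenience, and the 13.0 cutoff comes from the note above:

```python
def header_kwarg(version: str) -> str:
    """Return the websockets.connect() keyword used to pass HTTP headers."""
    major = int(version.split(".")[0])
    return "additional_headers" if major >= 13 else "extra_headers"

# Hypothetical usage with the real library (after `pip install websockets`):
#   import websockets
#   kwarg = header_kwarg(websockets.__version__)
#   async with websockets.connect(ws_url, **{kwarg: headers}):
#       ...
print(header_kwarg("13.1"))  # additional_headers
print(header_kwarg("11.0"))  # extra_headers
```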