Multilingual streaming | AssemblyAI

Multilingual streaming allows you to transcribe audio streams in multiple languages.

Configuration

To use multilingual streaming, include speech_model=universal-streaming-multilingual as a query parameter in the WebSocket URL.
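As a rough illustration, the URL can be assembled with Python's standard library. The endpoint below is the one shown in the example URL later in this guide; the helper name build_streaming_url and its default arguments are hypothetical and not part of any SDK.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL in this guide.
STREAMING_ENDPOINT = "wss://streaming.assemblyai.com/v3/ws"


def build_streaming_url(sample_rate: int = 16000, language_detection: bool = False) -> str:
    """Return a WebSocket URL that selects the multilingual model (hypothetical helper)."""
    params = {
        "sample_rate": sample_rate,
        "speech_model": "universal-streaming-multilingual",
    }
    if language_detection:
        # Query strings carry the lowercase literal "true", not Python's True.
        params["language_detection"] = "true"
    return f"{STREAMING_ENDPOINT}?{urlencode(params)}"


print(build_streaming_url())
```

Building the query string with urlencode rather than by hand keeps the parameter syntax (key=value pairs joined by &) correct if more parameters are added.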

Supported languages

Multilingual currently supports: English, Spanish, French, German, Italian, and Portuguese.

Quickstart

First, install the required dependencies.

$ pip install assemblyai

import logging
from typing import Type

import assemblyai as aai
from assemblyai.streaming.v3 import (
    BeginEvent,
    StreamingClient,
    StreamingClientOptions,
    StreamingError,
    StreamingEvents,
    StreamingParameters,
    TerminationEvent,
    TurnEvent,
)

api_key = "<YOUR_API_KEY>"

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def on_begin(self: Type[StreamingClient], event: BeginEvent):
    # Called once the session has been established.
    print(f"Session started: {event.id}")


def on_turn(self: Type[StreamingClient], event: TurnEvent):
    # Partial transcript: the turn is still in progress.
    if not event.end_of_turn and event.transcript:
        print(f"[PARTIAL TURN TRANSCRIPT]: {event.transcript}")
    # A complete utterance within the turn.
    if event.utterance:
        print(f"[PARTIAL TURN UTTERANCE]: {event.utterance}")
        # Display language detection info if available
        if event.language_code:
            print(f"[UTTERANCE LANGUAGE DETECTION]: {event.language_code} - {event.language_confidence:.2%}")
    # Final transcript for the turn.
    if event.end_of_turn:
        print(f"[FULL TURN TRANSCRIPT]: {event.transcript}")
        # Display language detection info if available
        if event.language_code:
            print(f"[END OF TURN LANGUAGE DETECTION]: {event.language_code} - {event.language_confidence:.2%}")


def on_terminated(self: Type[StreamingClient], event: TerminationEvent):
    print(
        f"Session terminated: {event.audio_duration_seconds} seconds of audio processed"
    )


def on_error(self: Type[StreamingClient], error: StreamingError):
    print(f"Error occurred: {error}")


def main():
    client = StreamingClient(
        StreamingClientOptions(
            api_key=api_key,
            api_host="streaming.assemblyai.com",
        )
    )

    # Register a handler for each event type before connecting.
    client.on(StreamingEvents.Begin, on_begin)
    client.on(StreamingEvents.Turn, on_turn)
    client.on(StreamingEvents.Termination, on_terminated)
    client.on(StreamingEvents.Error, on_error)

    # Select the multilingual model and enable language detection.
    client.connect(
        StreamingParameters(
            sample_rate=48000,
            speech_model="universal-streaming-multilingual",
            language_detection=True,
        )
    )

    try:
        # Stream microphone audio at the same sample rate passed to connect().
        client.stream(
            aai.extras.MicrophoneStream(sample_rate=48000)
        )
    finally:
        client.disconnect(terminate=True)


if __name__ == "__main__":
    main()

Language detection

The multilingual streaming model supports automatic language detection, which identifies the language being spoken in real time. When enabled, the model returns the detected language code and a confidence score with each complete utterance and with the final turn.

Configuration

To enable language detection, include language_detection=true as a query parameter in the WebSocket URL:

wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&speech_model=universal-streaming-multilingual&language_detection=true

Output format

When language detection is enabled, each Turn message that contains a complete utterance or has end_of_turn: true includes two additional fields:

language_code: The language code of the detected language (e.g., "es" for Spanish, "fr" for French)
language_confidence: A confidence score between 0 and 1 indicating how confident the model is in the language detection

The language_code and language_confidence fields only appear when either:

The utterance field is non-empty and contains a complete utterance
The end_of_turn field is true
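The two conditions above can be sketched as a small check on a Turn message that has already been parsed into a dictionary. This is a minimal sketch, not SDK code: the function name detected_language is hypothetical, and the field names are the ones listed in this section.

```python
from typing import Optional, Tuple


def detected_language(message: dict) -> Optional[Tuple[str, float]]:
    """Return (language_code, language_confidence) if the Turn message carries them."""
    if message.get("type") != "Turn":
        return None
    # Language fields accompany a complete utterance or the end of a turn.
    if not (message.get("utterance") or message.get("end_of_turn")):
        return None
    code = message.get("language_code")
    confidence = message.get("language_confidence")
    if code is None or confidence is None:
        return None
    return code, float(confidence)


# Illustrative messages; only the fields relevant to the check are included.
partial = {"type": "Turn", "end_of_turn": False, "utterance": "", "transcript": "Buenos"}
complete = {
    "type": "Turn",
    "end_of_turn": False,
    "utterance": "Buenos días.",
    "language_code": "es",
    "language_confidence": 0.999997,
}

print(detected_language(partial))   # None
print(detected_language(complete))  # ('es', 0.999997)
```

Returning None when the fields are absent lets callers treat partial messages uniformly instead of checking for missing keys at every call site.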

Example response

Here’s an example Turn message with language detection enabled, showing Spanish being detected:

{
  "turn_order": 1,
  "turn_is_formatted": false,
  "end_of_turn": false,
  "transcript": "Buenos",
  "end_of_turn_confidence": 0.991195,
  "words": [
    {
      "start": 29920,
      "end": 30080,
      "text": "Buenos",
      "confidence": 0.979445,
      "word_is_final": true
    },
    {
      "start": 30320,
      "end": 30400,
      "text": "días",
      "confidence": 0.774696,
      "word_is_final": false
    }
  ],
  "utterance": "Buenos días.",
  "language_code": "es",
  "language_confidence": 0.999997,
  "type": "Turn"
}

In this example, the model detected Spanish ("es") with a confidence of 0.999997.

Understanding formatting

The multilingual model produces transcripts with punctuation and capitalization already built into the model outputs. This means you’ll receive properly formatted text without requiring any additional post-processing.

While the API still returns the turn_is_formatted parameter to maintain interface consistency with other streaming models, the multilingual model doesn’t perform additional formatting operations. All transcripts from the multilingual model are already formatted as they’re generated.
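One practical consequence is that client code written for several streaming models may want to skip any check of turn_is_formatted when the multilingual model is in use. The sketch below is one hedged way to express that. The function name is_display_ready and the multilingual flag are hypothetical, and the branch for other models assumes they deliver formatted text in a turn marked turn_is_formatted: true, which this guide does not confirm.

```python
def is_display_ready(message: dict, multilingual: bool) -> bool:
    """Decide whether a Turn message holds end-of-turn text ready for display."""
    if not message.get("end_of_turn"):
        return False
    if multilingual:
        # Multilingual transcripts are formatted as they are generated,
        # so turn_is_formatted is not consulted.
        return True
    # Assumption: other models signal formatted text through turn_is_formatted.
    return bool(message.get("turn_is_formatted"))


final_turn = {"type": "Turn", "end_of_turn": True, "turn_is_formatted": False}
print(is_display_ready(final_turn, multilingual=True))   # True
print(is_display_ready(final_turn, multilingual=False))  # False
```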

In the future, this built-in formatting capability will be extended to our English-only streaming model as well.