Streaming Migration Guide: Universal Streaming to Universal-3 Pro Streaming

This guide walks through the process of upgrading from Universal Streaming to Universal-3 Pro Streaming for real-time audio transcription.

Get Started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard.

Quick upgrade

If you’re already using Universal Streaming, you can quickly test Universal-3 Pro Streaming by switching the speech_model parameter to "u3-rt-pro" and removing format_turns (formatting is always on in U3 Pro). Just update the connection params and start streaming.

```python
# Before (Universal Streaming)
CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "format_turns": True,
}

# After (Universal-3 Pro Streaming)
CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
}
```

That’s it for a quick test. But there are important behavioral differences in turn detection, partials, and formatting that may require updates to your message handling logic. Read on for the full migration details.

Why upgrade

Universal-3 Pro Streaming delivers:

  • Exceptional entity accuracy — credit card numbers, phone numbers, email addresses, physical addresses, and names captured correctly at streaming speed
  • Promptable model — custom transcription instructions via prompt, plus domain-term boosting via keyterms_prompt (up to 100 terms)
  • Better turn detection — punctuation-based system that waits when speakers pause mid-thought and responds when they’re done
  • Native multilingual code-switching — English, Spanish, German, French, Portuguese, Italian in a single model
  • Sub-300ms latency — fast time to complete transcript
  • Mid-stream configuration — update keyterms, prompts, and silence parameters without dropping the connection

For full details, see Universal-3 Pro Streaming.

What changes

This table covers the key parameter, behavior, and response field differences. Use it as a migration checklist.

| What | Universal Streaming | Universal-3 Pro Streaming | Action Required |
| --- | --- | --- | --- |
| speech_model | Not required (defaults to English) | "u3-rt-pro" | Add speech_model: "u3-rt-pro" to connection params |
| format_turns | false by default; set true for formatted transcripts | Always on (not a parameter) | Remove format_turns from connection params |
| Turn detection | Confidence-based (end_of_turn_confidence_threshold, default 0.4; officially deprecated) | Punctuation-based (min_turn_silence + terminal punctuation) | Remove end_of_turn_confidence_threshold (deprecated); tune min_turn_silence / max_turn_silence instead |
| min_turn_silence | 400 ms (minimum silence before checking confidence) | 100 ms (silence before a speculative end-of-turn check) | Review and adjust if you tuned this value |
| max_turn_silence | 1280 ms | 1000 ms | Review and adjust if you tuned this value |
| end_of_turn / turn_is_formatted | Can differ on English model — turn_is_formatted arrives as a separate message after end_of_turn (multilingual model has formatting built in, so they match) | Always the same value — one end-of-turn transcript per turn, always formatted | Simplify: just check end_of_turn: true for the final formatted transcript |
| Partials | Emitted frequently during speech (unformatted on English model, formatted on multilingual model) | Emitted only during silence periods (at most one partial per silence period) | Expect fewer but more complete partials |
| prompt | Not supported | Supported — custom transcription instructions | New capability (optional) |
| keyterms_prompt | Supported (connection-time only; not updatable mid-stream) | Supported; can be used together with prompt; updatable mid-stream | No change needed; new: can combine with prompt and update via UpdateConfiguration |
| UpdateConfiguration | Turn detection params only (end_of_turn_confidence_threshold, min_turn_silence, max_turn_silence) | prompt, keyterms_prompt, min_turn_silence, max_turn_silence | Update any mid-stream config logic to use new fields |
| ForceEndpoint | Supported | Supported | No change needed |
| language | "en" or "multi" (officially deprecated) | Not a parameter (native code-switching) | Remove language param; use prompt to guide language if needed |
| vad_threshold | 0.4 (default) | 0.3 (default) | Review and adjust if you tuned this value — lower default means higher noise sensitivity |
| language_detection | Supported (true/false, default false) with multilingual model | Supported — automatic with code-switching | Remove if set; U3 Pro detects language automatically |
| Languages | English default; multilingual requires speech_model: "universal-streaming-multilingual" | Native multilingual code-switching (6 languages) in a single model | Remove multilingual model switching; optionally prepend language to prompt |

Sources: U3 Pro docs, Universal docs, Turn detection docs, API Reference

Side-by-side code

A full working Python example for Universal Streaming using the raw websocket-client library. To migrate it, apply the parameter and message-handling changes from the table above.

```python
import pyaudio
import websocket
import json
import threading
import time
from urllib.parse import urlencode

YOUR_API_KEY = "<YOUR_API_KEY>"

CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "format_turns": True,
}
API_ENDPOINT_BASE_URL = "wss://streaming.assemblyai.com/v3/ws"
API_ENDPOINT = f"{API_ENDPOINT_BASE_URL}?{urlencode(CONNECTION_PARAMS)}"

FRAMES_PER_BUFFER = 800
SAMPLE_RATE = CONNECTION_PARAMS["sample_rate"]
CHANNELS = 1
FORMAT = pyaudio.paInt16

audio = None
stream = None
ws_app = None
audio_thread = None
stop_event = threading.Event()

def on_open(ws):
    print("WebSocket connection opened.")
    def stream_audio():
        global stream
        while not stop_event.is_set():
            try:
                audio_data = stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
                ws.send(audio_data, websocket.ABNF.OPCODE_BINARY)
            except Exception as e:
                print(f"Error streaming audio: {e}")
                break

    global audio_thread
    audio_thread = threading.Thread(target=stream_audio)
    audio_thread.daemon = True
    audio_thread.start()

def on_message(ws, message):
    try:
        data = json.loads(message)
        msg_type = data.get("type")

        if msg_type == "Begin":
            print(f"Session began: ID={data.get('id')}")
        elif msg_type == "Turn":
            transcript = data.get("transcript", "")
            turn_is_formatted = data.get("turn_is_formatted", False)
            if turn_is_formatted:
                print(f"\r{' ' * 80}\r{transcript}")
            else:
                print(f"\r{transcript}", end="")
        elif msg_type == "Termination":
            print(f"\nSession terminated: {data.get('audio_duration_seconds', 0)}s of audio")
    except Exception as e:
        print(f"Error handling message: {e}")

def on_error(ws, error):
    print(f"\nWebSocket Error: {error}")
    stop_event.set()

def on_close(ws, close_status_code, close_msg):
    print(f"\nWebSocket Disconnected: Status={close_status_code}")
    global stream, audio
    stop_event.set()
    if stream:
        if stream.is_active():
            stream.stop_stream()
        stream.close()
    if audio:
        audio.terminate()

def run():
    global audio, stream, ws_app

    audio = pyaudio.PyAudio()
    stream = audio.open(
        input=True,
        frames_per_buffer=FRAMES_PER_BUFFER,
        channels=CHANNELS,
        format=FORMAT,
        rate=SAMPLE_RATE,
    )
    print("Speak into your microphone. Press Ctrl+C to stop.")

    ws_app = websocket.WebSocketApp(
        API_ENDPOINT,
        header={"Authorization": YOUR_API_KEY},
        on_open=on_open,
        on_message=on_message,
        on_error=on_error,
        on_close=on_close,
    )

    ws_thread = threading.Thread(target=ws_app.run_forever)
    ws_thread.daemon = True
    ws_thread.start()

    try:
        while ws_thread.is_alive():
            time.sleep(0.1)
    except KeyboardInterrupt:
        print("\nStopping...")
        stop_event.set()
        if ws_app and ws_app.sock and ws_app.sock.connected:
            ws_app.send(json.dumps({"type": "Terminate"}))
            time.sleep(2)
        if ws_app:
            ws_app.close()
        ws_thread.join(timeout=2.0)

if __name__ == "__main__":
    run()
```
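To migrate the example above to Universal-3 Pro, only two pieces change: the connection params, and the Turn handling (since every end-of-turn transcript is already formatted, you key off end_of_turn instead of turn_is_formatted). A minimal sketch of those changes; render_turn is a helper introduced here for illustration, not part of the API, and the "..." suffix for partials is just a display choice:

```python
import json

# Migrated connection params (per the table above): add speech_model,
# drop format_turns (formatting is always on in U3 Pro).
CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
}

def render_turn(data):
    """Return the display line for a Turn message.

    On U3 Pro, end_of_turn: true marks the one final, formatted transcript
    per turn; anything else is a partial emitted during a silence period.
    """
    transcript = data.get("transcript", "")
    if data.get("end_of_turn", False):
        return transcript           # final, formatted transcript
    return transcript + " ..."      # in-progress partial

def on_message(ws, message):
    # Drop-in replacement for the on_message callback above.
    data = json.loads(message)
    if data.get("type") == "Turn":
        print(render_turn(data))
```

The rest of the example (audio capture, threading, shutdown) carries over unchanged.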

Turn detection

This is the most significant behavioral difference between the two models.

Universal Streaming uses a confidence-based system combining semantic and acoustic detection (source):

| Parameter | Default | Description |
| --- | --- | --- |
| end_of_turn_confidence_threshold | 0.4 | Confidence threshold (0.0-1.0) to trigger end of turn (officially deprecated) |
| min_turn_silence | 400 ms | Minimum silence before checking confidence |
| max_turn_silence | 1280 ms | Maximum silence before forcing end of turn |

The model evaluates end_of_turn_confidence during silence. If the score exceeds end_of_turn_confidence_threshold after min_turn_silence, the turn ends. Otherwise, the turn is forced to end after max_turn_silence.
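The confidence-based decision described above can be sketched as a small function, using the defaults from the table. This is illustrative only; the real evaluation happens server-side inside the model:

```python
def universal_turn_ends(silence_ms, end_of_turn_confidence,
                        threshold=0.4, min_turn_silence=400, max_turn_silence=1280):
    """Sketch of Universal Streaming's confidence-based end-of-turn logic."""
    if silence_ms >= max_turn_silence:
        return True   # forced end of turn after max silence
    if silence_ms >= min_turn_silence and end_of_turn_confidence > threshold:
        return True   # confidence exceeds threshold after minimum silence
    return False      # turn continues
```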

Universal-3 Pro uses a punctuation-based system (source):

| Parameter | Default | Description |
| --- | --- | --- |
| min_turn_silence | 100 ms | Silence before a speculative end-of-turn check fires |
| max_turn_silence | 1000 ms | Maximum silence before a turn is forced to end |

When silence reaches min_turn_silence, the model transcribes the audio and checks for terminal punctuation (. ? !):

  • Terminal punctuation found — the turn ends (end_of_turn: true)
  • No terminal punctuation — a partial is emitted (end_of_turn: false) and the turn continues
  • Silence reaches max_turn_silence — the turn is forced to end regardless of punctuation
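The three rules above can be sketched as a decision function, using the U3 Pro defaults. Again illustrative; the actual check runs server-side:

```python
def u3_pro_turn_ends(silence_ms, transcript,
                     min_turn_silence=100, max_turn_silence=1000):
    """Sketch of Universal-3 Pro's punctuation-based end-of-turn logic.

    Returns (end_of_turn, emit_partial).
    """
    if silence_ms >= max_turn_silence:
        return True, False               # forced end of turn, regardless of punctuation
    if silence_ms >= min_turn_silence:
        if transcript.rstrip().endswith((".", "?", "!")):
            return True, False           # terminal punctuation found: turn ends
        return False, True               # no terminal punctuation: emit a partial
    return False, False                  # still speaking; nothing emitted yet
```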

end_of_turn_confidence_threshold does not exist on Universal-3 Pro; it was never part of the U3 Pro API. On Universal Streaming it is officially deprecated. Remove this parameter and configure min_turn_silence and max_turn_silence instead. For configuration guidance, see Configuring Turn Detection.

New capabilities

These features are new or enhanced in Universal-3 Pro. For full details, see Universal-3 Pro Streaming.

Prompting

Universal-3 Pro supports a prompt parameter for custom transcription instructions. When omitted, a default prompt optimized for turn detection (88% accuracy) is applied automatically. See the Prompting Guide for details.

```python
CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
    "prompt": "Transcribe verbatim. Rules:\n1) Always include punctuation in output.\n2) Use period/question mark ONLY for complete sentences.\n3) Use comma for mid-sentence pauses.\n4) Use no punctuation for incomplete trailing speech.\n5) Filler words (um, uh, so, like) indicate speaker will continue.",
}
```

Start with no prompt. The default prompt delivers 88% turn detection accuracy. Only customize if you have specific requirements, and build off the default prompt rather than starting from scratch.

Keyterms prompting

Boost recognition of specific names, brands, or domain terms. Maximum 100 keyterms, each 50 characters or less. See Keyterms Prompting for details.

```python
import json

CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
    "keyterms_prompt": json.dumps(["Keanu Reeves", "AssemblyAI", "Universal-3"]),
}
```

prompt and keyterms_prompt can be used together. When you use keyterms_prompt, your boosted words are appended to the default prompt (or your custom prompt if provided) automatically.
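Since the API enforces the limits above (at most 100 keyterms, each 50 characters or less), it can be useful to validate client-side before connecting. A hypothetical helper, not part of the AssemblyAI API:

```python
def validate_keyterms(keyterms):
    """Check the documented keyterms_prompt limits: max 100 terms, each <= 50 chars."""
    if len(keyterms) > 100:
        raise ValueError(f"Too many keyterms: {len(keyterms)} (max 100)")
    for term in keyterms:
        if len(term) > 50:
            raise ValueError(f"Keyterm too long ({len(term)} chars, max 50): {term!r}")
    return keyterms
```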

Mid-stream configuration updates

Update prompt, keyterms_prompt, min_turn_silence, and max_turn_silence during an active session without reconnecting. See Updating configuration mid-stream for details.

```python
ws.send(json.dumps({
    "type": "UpdateConfiguration",
    "keyterms_prompt": ["cardiology", "echocardiogram", "Dr. Patel"],
    "max_turn_silence": 5000
}))
```

Force turn end

ForceEndpoint is supported on both Universal Streaming and Universal-3 Pro — no migration changes needed. Force the current turn to end immediately based on external signals. See Forcing a turn endpoint for details.

```python
ws.send(json.dumps({"type": "ForceEndpoint"}))
```

Language support

Universal Streaming transcribes English by default. For multilingual support, use speech_model: "universal-streaming-multilingual". (Source)

Universal-3 Pro natively code-switches between 6 languages in a single model — no separate multilingual model needed: English, Spanish, German, French, Portuguese, Italian. It also supports automatic language detection, returning language_code and language_confidence fields in Turn messages. To guide toward a specific language, prepend Transcribe <language>. to the default prompt. See Supported languages for the full list.

Language Detection: Universal Streaming supports the language_detection connection parameter (true/false, default false) with the multilingual model. When enabled, Turn messages include language_code and language_confidence fields. Universal-3 Pro also supports language detection with code-switching — see Supported languages for details.
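Reading those fields from a Turn message is straightforward. A small sketch; the field names language_code and language_confidence come from the docs above, while the helper and its output format are illustrative:

```python
def describe_language(turn):
    """Summarize the language detection fields in a parsed Turn message."""
    code = turn.get("language_code")
    confidence = turn.get("language_confidence")
    if code is None or confidence is None:
        return "language unknown"
    return f"{code} ({confidence:.0%} confidence)"
```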

Need more than 6 languages? Use the Whisper Streaming model (speech_model: "whisper-rt") for 99+ languages with automatic language detection. See Whisper Streaming for details.

Resources