Streaming Migration Guide: Universal Streaming to Universal-3 Pro Streaming

This guide walks through the process of upgrading from Universal Streaming to Universal-3 Pro Streaming for real-time audio transcription.

Get Started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard.

Quick upgrade

If you’re already using Universal Streaming, you can quickly test Universal-3 Pro Streaming by switching the speech_model parameter to "u3-rt-pro" and removing format_turns (formatting is always on in U3 Pro). Just update the connection params and start streaming.

```python
# Before (Universal Streaming)
CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "format_turns": True,
}

# After (Universal-3 Pro Streaming)
CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
}
```

That’s it for a quick test. But there are important behavioral differences in turn detection, partials, and formatting that may require updates to your message handling logic. Read on for the full migration details.

Why upgrade

Universal-3 Pro Streaming delivers:

  • Exceptional entity accuracy — credit card numbers, phone numbers, email addresses, physical addresses, and names captured correctly at streaming speed
  • Promptable model — custom transcription instructions via prompt, plus domain-term boosting via keyterms_prompt (up to 100 terms)
  • Better turn detection — punctuation-based system that waits when speakers pause mid-thought and responds when they’re done
  • Native multilingual code-switching — English, Spanish, German, French, Portuguese, Italian in a single model
  • Sub-300ms latency — fast time to complete transcript
  • Mid-stream configuration — update keyterms, prompts, and silence parameters without dropping the connection

For full details, see Universal-3 Pro Streaming.

What changes

This table covers the key parameter, behavior, and response field differences. Use it as a migration checklist.

| What | Universal Streaming | Universal-3 Pro Streaming | Action Required |
| --- | --- | --- | --- |
| speech_model | Not required (defaults to English) | "u3-rt-pro" | Add speech_model: "u3-rt-pro" to connection params |
| format_turns | false by default; set true for formatted transcripts | Always on (not a parameter) | Remove format_turns from connection params |
| Turn detection | Confidence-based (end_of_turn_confidence_threshold, default 0.4; officially deprecated) | Punctuation-based (min_turn_silence + terminal punctuation) | Remove end_of_turn_confidence_threshold (deprecated); tune min_turn_silence / max_turn_silence instead |
| min_turn_silence | 400 ms (minimum silence before checking confidence) | 100 ms (silence before a speculative end-of-turn check) | Review and adjust if you tuned this value |
| max_turn_silence | 1280 ms | 1000 ms | Review and adjust if you tuned this value |
| end_of_turn / turn_is_formatted | Can differ on English model — turn_is_formatted arrives as a separate message after end_of_turn (multilingual model has formatting built in, so they match) | Always the same value — one end-of-turn transcript per turn, always formatted | Simplify: just check end_of_turn: true for the final formatted transcript |
| Partials | Emitted frequently during speech (unformatted on English model, formatted on multilingual model) | Emitted only during silence periods (at most one partial per silence period) | Expect fewer but more complete partials |
| prompt | Not supported | Supported — custom transcription instructions | New capability (optional) |
| keyterms_prompt | Supported (connection-time only; not updatable mid-stream) | Supported; can be used together with prompt; updatable mid-stream | No change needed; new: can combine with prompt and update via UpdateConfiguration |
| UpdateConfiguration | Turn detection params only (end_of_turn_confidence_threshold, min_turn_silence, max_turn_silence) | prompt, keyterms_prompt, min_turn_silence, max_turn_silence | Update any mid-stream config logic to use new fields |
| ForceEndpoint | Supported | Supported | No change needed |
| language | "en" or "multi" (officially deprecated) | Not a parameter (native code-switching) | Remove language param; use prompt to guide language if needed |
| vad_threshold | 0.4 (default) | 0.3 (default) | Review and adjust if you tuned this value — lower default means higher noise sensitivity |
| language_detection | Supported (true/false, default false) with multilingual model | Supported — automatic with code-switching | Remove if set; U3 Pro detects language automatically |
| Languages | English default; multilingual requires speech_model: "universal-streaming-multilingual" | Native multilingual code-switching (6 languages) in a single model | Remove multilingual model switching; optionally prepend language to prompt |

Sources: U3 Pro docs, Universal docs, Turn detection docs, API Reference

Side-by-side code

A full working Python example for Universal Streaming using the raw websocket-client library. To migrate it, apply the parameter and message-handling changes from the table above.

```python
import pyaudio
import websocket
import json
import threading
import time
from urllib.parse import urlencode

YOUR_API_KEY = "<YOUR_API_KEY>"

CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "format_turns": True,
}
API_ENDPOINT_BASE_URL = "wss://streaming.assemblyai.com/v3/ws"
API_ENDPOINT = f"{API_ENDPOINT_BASE_URL}?{urlencode(CONNECTION_PARAMS)}"

FRAMES_PER_BUFFER = 800
SAMPLE_RATE = CONNECTION_PARAMS["sample_rate"]
CHANNELS = 1
FORMAT = pyaudio.paInt16

audio = None
stream = None
ws_app = None
audio_thread = None
stop_event = threading.Event()

def on_open(ws):
    print("WebSocket connection opened.")
    def stream_audio():
        global stream
        while not stop_event.is_set():
            try:
                audio_data = stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
                ws.send(audio_data, websocket.ABNF.OPCODE_BINARY)
            except Exception as e:
                print(f"Error streaming audio: {e}")
                break

    global audio_thread
    audio_thread = threading.Thread(target=stream_audio)
    audio_thread.daemon = True
    audio_thread.start()

def on_message(ws, message):
    try:
        data = json.loads(message)
        msg_type = data.get("type")

        if msg_type == "Begin":
            print(f"Session began: ID={data.get('id')}")
        elif msg_type == "Turn":
            transcript = data.get("transcript", "")
            turn_is_formatted = data.get("turn_is_formatted", False)
            if turn_is_formatted:
                print(f"\r{' ' * 80}\r{transcript}")
            else:
                print(f"\r{transcript}", end="")
        elif msg_type == "Termination":
            print(f"\nSession terminated: {data.get('audio_duration_seconds', 0)}s of audio")
    except Exception as e:
        print(f"Error handling message: {e}")

def on_error(ws, error):
    print(f"\nWebSocket Error: {error}")
    stop_event.set()

def on_close(ws, close_status_code, close_msg):
    print(f"\nWebSocket Disconnected: Status={close_status_code}")
    global stream, audio
    stop_event.set()
    if stream:
        if stream.is_active():
            stream.stop_stream()
        stream.close()
    if audio:
        audio.terminate()

def run():
    global audio, stream, ws_app

    audio = pyaudio.PyAudio()
    stream = audio.open(
        input=True,
        frames_per_buffer=FRAMES_PER_BUFFER,
        channels=CHANNELS,
        format=FORMAT,
        rate=SAMPLE_RATE,
    )
    print("Speak into your microphone. Press Ctrl+C to stop.")

    ws_app = websocket.WebSocketApp(
        API_ENDPOINT,
        header={"Authorization": YOUR_API_KEY},
        on_open=on_open,
        on_message=on_message,
        on_error=on_error,
        on_close=on_close,
    )

    ws_thread = threading.Thread(target=ws_app.run_forever)
    ws_thread.daemon = True
    ws_thread.start()

    try:
        while ws_thread.is_alive():
            time.sleep(0.1)
    except KeyboardInterrupt:
        print("\nStopping...")
        stop_event.set()
        if ws_app and ws_app.sock and ws_app.sock.connected:
            ws_app.send(json.dumps({"type": "Terminate"}))
            time.sleep(2)
        if ws_app:
            ws_app.close()
        ws_thread.join(timeout=2.0)

if __name__ == "__main__":
    run()
```
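To migrate the example above to Universal-3 Pro, only two pieces change: the connection params, and the Turn handling (since every end-of-turn transcript is already formatted, you key off end_of_turn instead of turn_is_formatted). A minimal sketch of those changes; render_turn is a helper introduced here for illustration, not part of the API, and the "..." suffix for partials is just a display choice:

```python
import json

# Migrated connection params (per the table above): add speech_model,
# drop format_turns (formatting is always on in U3 Pro).
CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
}

def render_turn(data):
    """Return the display line for a Turn message.

    On U3 Pro, end_of_turn: true marks the one final, formatted transcript
    per turn; anything else is a partial emitted during a silence period.
    """
    transcript = data.get("transcript", "")
    if data.get("end_of_turn", False):
        return transcript           # final, formatted transcript
    return transcript + " ..."      # in-progress partial

def on_message(ws, message):
    # Drop-in replacement for the on_message callback above.
    data = json.loads(message)
    if data.get("type") == "Turn":
        print(render_turn(data))
```

The rest of the example (audio capture, threading, shutdown) carries over unchanged.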

Turn detection

This is the most significant behavioral difference between the two models.

Universal Streaming uses a confidence-based system combining semantic and acoustic detection (source):

| Parameter | Default | Description |
| --- | --- | --- |
| end_of_turn_confidence_threshold | 0.4 | Confidence threshold (0.0-1.0) to trigger end of turn (officially deprecated) |
| min_turn_silence | 400 ms | Minimum silence before checking confidence |
| max_turn_silence | 1280 ms | Maximum silence before forcing end of turn |

The model evaluates end_of_turn_confidence during silence. If the score exceeds end_of_turn_confidence_threshold after min_turn_silence, the turn ends. Otherwise, the turn is forced to end after max_turn_silence.
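The confidence-based decision described above can be sketched as a small function, using the defaults from the table. This is illustrative only; the real evaluation happens server-side inside the model:

```python
def universal_turn_ends(silence_ms, end_of_turn_confidence,
                        threshold=0.4, min_turn_silence=400, max_turn_silence=1280):
    """Sketch of Universal Streaming's confidence-based end-of-turn logic."""
    if silence_ms >= max_turn_silence:
        return True   # forced end of turn after max silence
    if silence_ms >= min_turn_silence and end_of_turn_confidence > threshold:
        return True   # confidence exceeds threshold after minimum silence
    return False      # turn continues
```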

Universal-3 Pro uses a punctuation-based system (source):

| Parameter | Default | Description |
| --- | --- | --- |
| min_turn_silence | 100 ms | Silence before a speculative end-of-turn check fires |
| max_turn_silence | 1000 ms | Maximum silence before a turn is forced to end |

When silence reaches min_turn_silence, the model transcribes the audio and checks for terminal punctuation (. ? !):

  • Terminal punctuation found — the turn ends (end_of_turn: true)
  • No terminal punctuation — a partial is emitted (end_of_turn: false) and the turn continues
  • Silence reaches max_turn_silence — the turn is forced to end regardless of punctuation
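The three rules above can be sketched as a decision function, using the U3 Pro defaults. Again illustrative; the actual check runs server-side:

```python
def u3_pro_turn_ends(silence_ms, transcript,
                     min_turn_silence=100, max_turn_silence=1000):
    """Sketch of Universal-3 Pro's punctuation-based end-of-turn logic.

    Returns (end_of_turn, emit_partial).
    """
    if silence_ms >= max_turn_silence:
        return True, False               # forced end of turn, regardless of punctuation
    if silence_ms >= min_turn_silence:
        if transcript.rstrip().endswith((".", "?", "!")):
            return True, False           # terminal punctuation found: turn ends
        return False, True               # no terminal punctuation: emit a partial
    return False, False                  # still speaking; nothing emitted yet
```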

end_of_turn_confidence_threshold does not exist on Universal-3 Pro; it was never part of the U3 Pro API. On Universal Streaming it is officially deprecated. Remove this parameter and configure min_turn_silence and max_turn_silence instead. For configuration guidance, see Configuring Turn Detection.

New capabilities

These features are new or enhanced in Universal-3 Pro. For full details, see Universal-3 Pro Streaming.

Prompting

Universal-3 Pro supports a prompt parameter for custom transcription instructions. When omitted, a default prompt optimized for turn detection (88% accuracy) is applied automatically. See the Prompting Guide for details.

```python
CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
    "prompt": "Transcribe verbatim. Rules:\n1) Always include punctuation in output.\n2) Use period/question mark ONLY for complete sentences.\n3) Use comma for mid-sentence pauses.\n4) Use no punctuation for incomplete trailing speech.\n5) Filler words (um, uh, so, like) indicate speaker will continue.",
}
```

Start with no prompt. The default prompt delivers 88% turn detection accuracy. Only customize if you have specific requirements, and build off the default prompt rather than starting from scratch.

Keyterms prompting

Boost recognition of specific names, brands, or domain terms. Maximum 100 keyterms, each 50 characters or less. See Keyterms Prompting for details.

```python
import json

CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
    "keyterms_prompt": json.dumps(["Keanu Reeves", "AssemblyAI", "Universal-3"]),
}
```

prompt and keyterms_prompt can be used together. When you use keyterms_prompt, your boosted words are appended to the default prompt (or your custom prompt if provided) automatically.
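Since the API enforces the limits above (at most 100 keyterms, each 50 characters or less), it can be useful to validate client-side before connecting. A hypothetical helper, not part of the AssemblyAI API:

```python
def validate_keyterms(keyterms):
    """Check the documented keyterms_prompt limits: max 100 terms, each <= 50 chars."""
    if len(keyterms) > 100:
        raise ValueError(f"Too many keyterms: {len(keyterms)} (max 100)")
    for term in keyterms:
        if len(term) > 50:
            raise ValueError(f"Keyterm too long ({len(term)} chars, max 50): {term!r}")
    return keyterms
```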

Mid-stream configuration updates

Update prompt, keyterms_prompt, min_turn_silence, and max_turn_silence during an active session without reconnecting. See Updating configuration mid-stream for details.

```python
ws.send(json.dumps({
    "type": "UpdateConfiguration",
    "keyterms_prompt": ["cardiology", "echocardiogram", "Dr. Patel"],
    "max_turn_silence": 5000
}))
```

Force turn end

ForceEndpoint is supported on both Universal Streaming and Universal-3 Pro — no migration changes needed. Force the current turn to end immediately based on external signals. See Forcing a turn endpoint for details.

```python
ws.send(json.dumps({"type": "ForceEndpoint"}))
```

Language support

Universal Streaming transcribes English by default. For multilingual support, use speech_model: "universal-streaming-multilingual". (Source)

Universal-3 Pro natively code-switches between 6 languages in a single model — no separate multilingual model needed: English, Spanish, German, French, Portuguese, Italian. It also supports automatic language detection, returning language_code and language_confidence fields in Turn messages. To guide toward a specific language, prepend Transcribe <language>. to the default prompt. See Supported languages for the full list.

Language Detection: Universal Streaming supports the language_detection connection parameter (true/false, default false) with the multilingual model. When enabled, Turn messages include language_code and language_confidence fields. Universal-3 Pro also supports language detection with code-switching — see Supported languages for details.
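Reading those fields from a Turn message is straightforward. A small sketch; the field names language_code and language_confidence come from the docs above, while the helper and its output format are illustrative:

```python
def describe_language(turn):
    """Summarize the language detection fields in a parsed Turn message."""
    code = turn.get("language_code")
    confidence = turn.get("language_confidence")
    if code is None or confidence is None:
        return "language unknown"
    return f"{code} ({confidence:.0%} confidence)"
```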

Need more than 6 languages? Use the Whisper Streaming model (speech_model: "whisper-rt") for 99+ languages with automatic language detection. See Whisper Streaming for details.

Resources