Mode
Universal-3.5 Pro Streaming workloads sit on a spectrum between two competing goals: returning transcripts as fast as possible, and returning the most accurate transcripts possible. To make this tradeoff explicit, Universal-3.5 Pro supports a mode connection parameter you can set when opening a streaming session.

| Mode | Value | When to use |
|---|---|---|
| Min latency | min_latency | Lowest possible time-to-text. Best when responsiveness matters more than catching every word. |
| Balanced (default) | balanced | A middle ground between latency and accuracy. Best for voice agents and other interactive applications. |
| Max accuracy | max_accuracy | Highest transcription accuracy. Best for note-taking, scribes, and post-call analysis where a small added delay is acceptable. |

Set the mode connection parameter when you open the WebSocket.
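As a minimal sketch of passing mode as a connection parameter, the snippet below builds a WebSocket URL with the mode in the query string. The endpoint is a placeholder, and the helper name build_streaming_url and the extra sample_rate parameter are assumptions for illustration; only the three mode values come from the table above.

```python
from urllib.parse import urlencode

# Assumption: placeholder endpoint. Substitute the real streaming URL.
STREAMING_ENDPOINT = "wss://streaming.example.com/v3/ws"

# The three mode values listed in the table above.
VALID_MODES = {"min_latency", "balanced", "max_accuracy"}


def build_streaming_url(mode="balanced", **extra_params):
    """Return a connection URL carrying mode as a query parameter.

    Hypothetical helper: extra_params are passed through unchanged.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(VALID_MODES)}")
    params = {"mode": mode, **extra_params}
    return f"{STREAMING_ENDPOINT}?{urlencode(params)}"


# sample_rate is an illustrative extra parameter, not taken from this page.
url = build_streaming_url(mode="max_accuracy", sample_rate=16000)
print(url)
```

Validating the value before connecting surfaces a typo in the mode name locally instead of as a rejected connection.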
Language Selection
By default, Universal-3.5 Pro Streaming runs in multilingual mode. Pass a language_code connection parameter to bias the model toward a single language. This is useful when you know the session is monolingual and want to improve accuracy for that language.
| Model | Languages |
|---|---|
| Universal-3 Pro Streaming | en, es, fr, de, it, pt |
| Universal-3.5 Pro Streaming | en, es, fr, de, it, pt, tr, nl, sv, no, da, fi, hi, vi, ar, he, ja, zh |

Set the language_code connection parameter when you open the WebSocket. Omit language_code to keep multilingual code-switching behavior.
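The sketch below shows one way to add language_code only when a single language is known, and to leave it out otherwise so the session stays multilingual. The helper name language_params is hypothetical; the language set is copied from the Universal-3.5 Pro Streaming row of the table above.

```python
from urllib.parse import urlencode

# Language codes from the Universal-3.5 Pro Streaming row of the table above.
SUPPORTED_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "tr", "nl", "sv",
    "no", "da", "fi", "hi", "vi", "ar", "he", "ja", "zh",
}


def language_params(language_code=None):
    """Return connection parameters for language selection.

    Hypothetical helper: returns an empty dict when no language is given,
    which keeps the default multilingual behavior.
    """
    if language_code is None:
        return {}
    if language_code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language_code: {language_code!r}")
    return {"language_code": language_code}


# Monolingual Spanish session: language_code is included in the query string.
spanish_query = urlencode({"mode": "balanced", **language_params("es")})

# Multilingual session: language_code is omitted entirely.
multilingual_query = urlencode({"mode": "balanced", **language_params()})

print(spanish_query)
print(multilingual_query)
```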
Advanced: Tuning turn detection parameters
Beyond the mode parameter, you can adjust individual turn detection parameters to fine-tune partial cadence and turn endpointing for your use case. The parameters differ by model.
Universal-3.5 Pro Streaming
Universal-3.5 Pro Streaming uses punctuation-based turn detection. Turns end when terminal punctuation (. ? !) is detected; if no punctuation is detected within max_turn_silence, the turn ends anyway.

Each mode ships with its own set of defaults for these parameters. Override any of them on the connection to fine-tune further.

| Parameter | Default | Description |
|---|---|---|
| min_turn_silence | min_latency: 96, balanced: 224, max_accuracy: 800 | Silence (ms) before a speculative end-of-turn check fires. Lower = faster turn endings; higher = fewer entity splits on numbers and proper nouns. |
| max_turn_silence | min_latency: 416, balanced: 1536, max_accuracy: 1536 | Maximum silence (ms) before forcing a turn to end, regardless of punctuation. Raise it when you expect a longer pause (caller reading a credit card, address). |
| interruption_delay | min_latency: 0, balanced: 500, max_accuracy: 500 | Time to first partial (ms). Lower = faster TTFT for barge-in detection; higher = more confident first partials. The server adds ~300ms minimum on top. |
| continuous_partials | min_latency: true, balanced: true, max_accuracy: true | When true, emit a partial every ~3s during continuous speech. Useful for long utterances where silence-based partials don’t fire often enough. |
| vad_threshold | min_latency: 0.3, balanced: 0.2, max_accuracy: 0.2 | Confidence threshold (0–1) for classifying audio frames as speech. Increase for noisy environments to reduce false speech detection. |

Tuning recipe: long utterance prep

When your voice agent prompts the user for a long utterance (credit card, phone number, address), raise min_turn_silence mid-stream so brief pauses don’t fragment the turn. After the response, restore the default. See Updating configuration mid-stream for the full list of mid-stream parameters.

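The long utterance recipe can be sketched as a pair of JSON messages: one that raises min_turn_silence before the prompt and one that restores the mode default afterwards. The message shape, including the "UpdateConfiguration" type name, is an assumption for illustration and should be checked against the mid-stream configuration reference; the default values are the min_turn_silence defaults from the table above.

```python
import json

# min_turn_silence defaults (ms) per mode, copied from the table above.
MIN_TURN_SILENCE_DEFAULTS_MS = {
    "min_latency": 96,
    "balanced": 224,
    "max_accuracy": 800,
}


def min_turn_silence_update(value_ms):
    """Serialize a mid-stream update for min_turn_silence.

    Assumption: the "type" value and flat message layout are hypothetical.
    """
    return json.dumps({"type": "UpdateConfiguration", "min_turn_silence": value_ms})


def long_utterance_messages(mode, raised_ms):
    """Return (raise_message, restore_message) for a long-utterance prompt."""
    default_ms = MIN_TURN_SILENCE_DEFAULTS_MS[mode]
    return min_turn_silence_update(raised_ms), min_turn_silence_update(default_ms)


# 1000 ms is an illustrative raised value, not a recommendation from this page.
raise_msg, restore_msg = long_utterance_messages("balanced", raised_ms=1000)
print(raise_msg)     # send before asking for the card number
print(restore_msg)   # send after the user has answered
```

Restoring the mode default assumes the connection did not already override min_turn_silence; if it did, restore that connection-level value instead.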
Universal Streaming
Universal Streaming uses confidence-based turn detection. The model predicts when speech naturally ends; if confidence exceeds end_of_turn_confidence_threshold and min_turn_silence has passed, the turn ends. Acoustic (silence-based) detection kicks in as a fallback after max_turn_silence.

| Parameter | Default | Description |
|---|---|---|
| end_of_turn_confidence_threshold | 0.4 | Confidence threshold for semantic end-of-turn. Higher = more confident before ending; lower = ends faster. |
| min_turn_silence | 400 ms | Silence required before a semantic end-of-turn fires. |
| max_turn_silence | 1280 ms | Maximum silence before forcing a turn to end via acoustic detection. |
| vad_threshold | - | Confidence threshold (0–1) for classifying audio frames as speech. Increase for noisy environments to reduce false speech detection. |

Quick-start configurations

- Aggressive: short, rapid back-and-forth (e.g., IVR replacements, order confirmations).
- Balanced: most conversational voice agents (e.g., customer support).
- Conservative: reflective or complex speech (e.g., healthcare, sales, legal).

Disabling turn detection

If you’re using your own VAD or turn detection model, send a ForceEndpoint event to force a turn boundary. Or set end_of_turn_confidence_threshold to 1 (acoustic-only fallback) or 0 (silence-only). Setting it to 0 is not recommended unless you have a custom turn detection model running on top, because it forces a turn at every min_turn_silence-length pause and fragments mid-sentence thinking pauses.
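To make the interaction between the three Universal Streaming parameters concrete, here is a small local model of the decision rule as described above. It is an illustrative sketch, not the server's implementation: the function name should_end_turn is hypothetical, and treating "has passed" as silence greater than or equal to the limit is an assumption.

```python
def should_end_turn(
    confidence,
    silence_ms,
    end_of_turn_confidence_threshold=0.4,
    min_turn_silence=400,
    max_turn_silence=1280,
):
    """Return True when the described rule would end the turn.

    Semantic path: confidence exceeds the threshold and min_turn_silence has passed.
    Acoustic fallback: silence has reached max_turn_silence.
    """
    semantic = (
        confidence > end_of_turn_confidence_threshold
        and silence_ms >= min_turn_silence
    )
    acoustic = silence_ms >= max_turn_silence
    return semantic or acoustic


# Defaults: a confident prediction ends the turn once 400 ms of silence has passed.
print(should_end_turn(0.5, 450))
# Low confidence: the turn only ends via the acoustic fallback at 1280 ms.
print(should_end_turn(0.2, 1300))
# Threshold 1: confidence can never exceed it, so only the fallback applies.
print(should_end_turn(0.99, 500, end_of_turn_confidence_threshold=1.0))
# Threshold 0: any nonzero confidence ends the turn at every 400 ms pause.
print(should_end_turn(0.01, 400, end_of_turn_confidence_threshold=0.0))
```

The last two calls show why a threshold of 1 leaves only acoustic detection, while a threshold of 0 makes every min_turn_silence-length pause a turn boundary.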