Prompting Guide (Streaming)
Learn how to use prompts to customize Universal-3 Pro Streaming transcription.
Universal-3 Pro Streaming supports a prompt parameter that lets you customize transcription output for streaming use cases. You can guide the model’s behavior for punctuation, disfluencies, formatting, and domain-specific terminology.
We strongly recommend testing with no prompt first. When you omit the prompt parameter, Universal-3 Pro automatically applies a built-in default prompt optimized for turn detection and streaming accuracy, which delivers 88% turn detection accuracy out of the box.
Universal-3 Pro also supports keyterms_prompt for boosting domain-specific terminology. You can use it together with prompt in the same request. See Keyterms prompting below.
If you do build a prompt, start from the default prompt and tweak it for your use case rather than writing one from scratch.
Remember that prompts are primarily instructional: adding a large amount of context may not noticeably improve accuracy and can make the model follow your instructions less consistently. Feel free to layer in additional instructions from this guide.
We recommend testing with no prompt first. When you omit the prompt parameter, Universal-3 Pro automatically applies a built-in default prompt that is already optimized for turn detection and streaming accuracy.
The following is the current built-in system prompt used by Universal-3 Pro when no prompt parameter is provided:
This default prompt delivers 88% turn detection accuracy out of the box. If you need to customize, build off of this prompt rather than starting from scratch. Append your additional instructions after the default prompt to preserve the turn detection behavior while adding your own requirements.
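One way to follow this advice in code is to keep the default prompt intact and append your own instructions after it. This is a minimal sketch, not an official helper: DEFAULT_PROMPT is a placeholder that you must replace with the current built-in prompt (verbatim), and build_custom_prompt is a hypothetical function name.

```python
# Placeholder: replace with the current built-in default prompt, verbatim.
DEFAULT_PROMPT = "<paste the default prompt here>"


def build_custom_prompt(default_prompt: str, extra_instructions: list[str]) -> str:
    """Return the default prompt followed by your extra instructions.

    The default prompt stays first and unchanged so its turn-detection
    behavior is preserved; non-empty instructions are appended in order,
    separated by single spaces.
    """
    parts = [default_prompt.strip()]
    for instruction in extra_instructions:
        text = instruction.strip()
        if text:  # skip blank entries
            parts.append(text)
    return " ".join(parts)


custom_prompt = build_custom_prompt(
    DEFAULT_PROMPT,
    ["Include all filler words such as um and uh."],
)
print(custom_prompt)
```

Appending rather than editing the default text keeps your customizations easy to diff against future versions of the built-in prompt.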
Turn detection relies on terminal punctuation (. ? !), so custom prompts
that reduce or remove punctuation from the transcription output may negatively
impact turn detection. The default prompt is optimized for this, so we
recommend trying it first before any customization.
The previous built-in system prompt was:
Universal-3 Pro prompting is more instructional than contextual. The model responds best to explicit formatting rules and behavioral instructions — for example, “use periods only for complete sentences” or “include all filler words.” This applies to both streaming and pre-recorded (async) use cases.
Providing topic context alone (e.g., “this is a cardiology appointment”) can help, but is most effective when paired with specific instructions telling the model how to transcribe. For domain-specific term boosting, use the keyterms_prompt parameter to explicitly list the terms you want the model to recognize.
We are actively working to make Universal-3 Pro more contextual:
These improvements are under active development. In the meantime, use keyterms_prompt for domain-specific term boosting and instructional prompts for formatting control.
Universal-3 Pro Streaming does not support the language_code connection
parameter — it is silently ignored. The language_detection parameter only
controls whether language metadata (such as language_code and
language_confidence) is returned on Turn events; it does not affect which
language the model transcribes. To specify the transcription language, use the
prompt parameter as described below.
Providing language information ahead of time in the prompt helps the model with transcription tasks. For example, the same sound might be transcribed as “si” when the model is told to transcribe Spanish, but as “C” when it is told to transcribe English.
Although prompting is a beta feature, we’ve found good results when building off of the default prompt, and that is the approach used here: add language information by prepending Transcribe <language>. to the default prompt.
Our team is running evaluations to determine the best method for attaching this context to the prompt, and we will update this section with the best methods. So far, we have seen that prepending language information with Transcribe <language>. to the default prompt improves the output:
If you have multiple languages, list all of them in the prefix, for example: Transcribe multilingual conversation in English, Spanish, and German.
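The prefix rule above can be sketched as a small helper. This is an illustration under assumptions: build_language_prompt is a hypothetical name, DEFAULT_PROMPT is a placeholder for the built-in prompt, and the two-language wording ("English and Spanish") is extrapolated from the three-language example rather than taken from an evaluated recommendation.

```python
# Placeholder: replace with the current built-in default prompt, verbatim.
DEFAULT_PROMPT = "<paste the default prompt here>"


def build_language_prompt(default_prompt: str, languages: list[str]) -> str:
    """Prepend a 'Transcribe <language>.' instruction to the default prompt."""
    if not languages:
        # No language hint: use the default prompt unchanged.
        return default_prompt
    if len(languages) == 1:
        prefix = f"Transcribe {languages[0]}."
    else:
        # Join as "A and B" or "A, B, and C".
        if len(languages) == 2:
            joined = f"{languages[0]} and {languages[1]}"
        else:
            joined = ", ".join(languages[:-1]) + f", and {languages[-1]}"
        prefix = f"Transcribe multilingual conversation in {joined}."
    return f"{prefix} {default_prompt}"


print(build_language_prompt(DEFAULT_PROMPT, ["Spanish"]))
print(build_language_prompt(DEFAULT_PROMPT, ["English", "Spanish", "German"]))
```

Pass the resulting string as the prompt parameter when you open the connection.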
You can update prompting parameters during an active streaming session using UpdateConfiguration. The recommended approach is to dynamically update keyterms_prompt based on the current stage of your voice agent flow. If you know what answers or terminology to expect at a given point in the conversation, add those terms with keyterms_prompt so the model is primed to recognize them accurately.
For example, if your voice agent is currently asking for the caller’s name and date of birth, send the expected terms for that stage:
Then, when the conversation moves to a medical intake stage, update keyterms_prompt with the relevant domain terms:
Dynamically update keyterms_prompt for each stage of your voice agent
flow. If you expect certain answers at a specific stage — names, addresses,
account numbers, medical terms — proactively add those as keyterms so the
model recognizes them accurately when the caller speaks them. This is the most
effective way to improve recognition accuracy mid-stream.
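A minimal sketch of stage-based keyterm updates, assuming the UpdateConfiguration message is a JSON object with a "type" field and a keyterms_prompt array, sent over an already-open streaming WebSocket. Verify the exact message shape against the streaming API reference; the stage names, the terms, and keyterms_update_message are all hypothetical.

```python
import json

# Hypothetical mapping from voice-agent stage to the terms expected there.
STAGE_KEYTERMS: dict[str, list[str]] = {
    "identity": ["Nguyen", "Okafor", "date of birth"],
    "medical_intake": ["hypertension", "metformin", "lisinopril"],
}


def keyterms_update_message(stage: str) -> str:
    """Return a JSON UpdateConfiguration message carrying the stage's keyterms.

    Assumed shape: {"type": "UpdateConfiguration", "keyterms_prompt": [...]}.
    Raises KeyError if the stage has no entry in STAGE_KEYTERMS.
    """
    terms = STAGE_KEYTERMS[stage]
    return json.dumps({"type": "UpdateConfiguration", "keyterms_prompt": terms})


# When the agent moves to a new stage, send the matching update over the
# already-open WebSocket (`ws` is not defined in this sketch):
#     ws.send(keyterms_update_message("medical_intake"))
print(keyterms_update_message("identity"))
```

Keeping the stage-to-terms mapping in one place makes it easy to send the update at the same moment your agent changes its question.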
See Keyterms prompting for more details on how keyterms_prompt works.
You can also update the prompt parameter mid-stream to adjust formatting or behavioral instructions:
Updating prompt mid-stream is useful for passing updated behavioral instructions into the STT stream. For domain-specific term recognition, prefer keyterms_prompt as described above.
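A similar sketch for a mid-stream prompt update, again assuming an UpdateConfiguration JSON message, this time with a prompt field. The message shape is an assumption to check against the API reference, prompt_update_message is a hypothetical helper, and DEFAULT_PROMPT is a placeholder for the built-in prompt.

```python
import json

# Placeholder: replace with the current built-in default prompt, verbatim.
DEFAULT_PROMPT = "<paste the default prompt here>"


def prompt_update_message(new_prompt: str) -> str:
    """Return a JSON UpdateConfiguration message that sets a new prompt.

    Assumed shape: {"type": "UpdateConfiguration", "prompt": "..."}.
    """
    if not new_prompt.strip():
        raise ValueError("new_prompt must not be empty")
    return json.dumps({"type": "UpdateConfiguration", "prompt": new_prompt})


# Keep the default prompt first and append the new behavioral instruction.
message = prompt_update_message(
    f"{DEFAULT_PROMPT} Include all filler words such as um and uh."
)
# ws.send(message)  # `ws` is an already-open streaming WebSocket (not shown)
print(message)
```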
Use the keyterms_prompt parameter to boost recognition of specific names, brands, or domain terms. Behind the scenes, keyterms_prompt appends your boosted words to the active prompt, which is the default prompt unless you supply your own. Pass an array of terms you want the model to prioritize:
You can set keyterms_prompt at connection time or update it mid-stream as the conversation progresses. For full details, see Keyterms prompting.
You can use prompt and keyterms_prompt together in the same streaming
request. When you use keyterms_prompt, your boosted words are automatically
appended to the default prompt, or to your custom prompt if you provide one.
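A sketch of setting both parameters at connection time. It assumes the v3 streaming endpoint wss://streaming.assemblyai.com/v3/ws and that keyterms_prompt is passed in the query string as a JSON-encoded array. Authentication and model-selection parameters are omitted, and build_streaming_url is a hypothetical helper, so check parameter names against the streaming API reference before relying on them.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed Universal-Streaming v3 WebSocket endpoint.
STREAMING_URL = "wss://streaming.assemblyai.com/v3/ws"


def build_streaming_url(
    sample_rate: int,
    prompt: Optional[str] = None,
    keyterms: Optional[list[str]] = None,
) -> str:
    """Build a connection URL carrying prompt and/or keyterms_prompt.

    Authentication and model-selection parameters are intentionally omitted.
    """
    params: dict[str, object] = {"sample_rate": sample_rate}
    if prompt:
        # Leave prompt out entirely to get the built-in default prompt.
        params["prompt"] = prompt
    if keyterms:
        # Assumption: keyterms_prompt travels as a JSON-encoded array.
        params["keyterms_prompt"] = json.dumps(keyterms)
    return f"{STREAMING_URL}?{urlencode(params)}"


url = build_streaming_url(
    16000,
    prompt="Transcribe Spanish. <paste the default prompt here>",
    keyterms=["metformin", "Nguyen"],
)
print(url)
```

Because the prompt argument is optional, the same helper covers the recommended starting point (no prompt, keyterms only) as well as a customized prompt.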
Many of the prompt capabilities available for async (pre-recorded) transcription are designed for longer audio files and do not reliably work with streaming’s shorter audio segments (typically under 10 seconds). For streaming, focus your prompts on punctuation rules, verbatim transcription, and formatting instructions. Use keyterms_prompt for domain-specific term boosting.
Prompting is not an alternative method for speaker separation. To identify
individual speakers in a streaming session, enable Streaming Diarization by
setting speaker_labels: true in your connection parameters.
Prompting behavior differs between streaming and async (pre-recorded) use cases. This guide covers prompting for streaming (real-time audio). If you’re working with pre-recorded audio, see the Prompting Guide (Async).
Because Universal-3 Pro’s turn detection is punctuation-based, the way the model punctuates its output directly affects when turns end. Prompting gives you a lever to influence that punctuation behavior and improve turn detection accuracy.
Internal testing on 100 audio samples shows that well-crafted punctuation rules significantly reduce false positives (the model incorrectly signaling a turn end) while maintaining near-perfect recall (catching real turn ends):
Compared with running the model with no prompt at all, the default prompt (which is applied automatically when you omit the prompt parameter) reduced the false positive rate from 34% to 24%. That translates to fewer false interruptions in production voice agent conversations.
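Put in absolute and relative terms, the two reported rates work out as follows (derived only from the 34% and 24% figures above):

```latex
\Delta_{\text{abs}} = 34\% - 24\% = 10 \text{ percentage points}, \qquad
\Delta_{\text{rel}} = \frac{34 - 24}{34} \approx 0.294 \approx 29\%
```

In other words, roughly three in ten of the false turn ends seen without a prompt no longer occur with the default prompt.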
Key takeaways for prompt design: