Prompting Guide (Streaming)

Learn how to use prompts to customize Universal-3-Pro streaming transcription.

Universal-3-Pro: Public beta

Universal-3-Pro for streaming is currently in public beta. We are actively scaling infrastructure and refining the model. You can start building and testing with it today, but be aware that behavior may change as we continue to improve the experience.

Universal-3-Pro supports a prompt parameter that lets you customize transcription output for streaming use cases. You can guide the model’s behavior for punctuation, disfluencies, formatting, and domain-specific terminology.

Start with no prompt

We strongly recommend testing with no prompt first. When you omit the prompt parameter, Universal-3-Pro automatically applies a built-in default prompt optimized for turn detection and streaming accuracy — delivering 88% turn detection accuracy out of the box.

Universal-3-Pro also supports keyterms_prompt for boosting specific terms, and we suggest using it with no prompt. The prompt and keyterms_prompt parameters cannot be used in the same API request; choose one or the other. If you need both behaviors, you can recreate keyterms boosting manually by appending your terms to your own prompt. See Keyterms prompting below.

If you do build a custom prompt, don’t start from scratch: take the default prompt and tweak it for your use case.

Remember, prompts are primarily instructional, so adding a large amount of context may not make a significant impact on accuracy and could reduce instruction-following coherence. Feel free to layer in additional instructions from this guide.

Start with the default prompt

When you omit the prompt parameter, Universal-3-Pro automatically applies the following built-in default prompt, already optimized for turn detection and streaming accuracy:

Transcribe verbatim. Rules:
1) Always include punctuation in output.
2) Use period/question mark ONLY for complete sentences.
3) Use comma for mid-sentence pauses.
4) Use no punctuation for incomplete trailing speech.
5) Filler words (um, uh, so, like) indicate speaker will continue.

This default prompt delivers 88% turn detection accuracy out of the box. If you need to customize, build off of this prompt rather than starting from scratch. Append your additional instructions after the default rules to preserve the turn detection behavior while adding your own requirements.
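To make the append pattern concrete, here is a minimal sketch of composing a custom prompt on top of the default rules. The default prompt text is quoted from this guide; the build_prompt helper and the appended instruction are illustrative, not part of the API.

```python
# Sketch: build a custom prompt by appending instructions after the
# default rules, preserving the turn detection behavior. build_prompt
# is a hypothetical helper, not an AssemblyAI API.

DEFAULT_PROMPT = (
    "Transcribe verbatim. Rules: "
    "1) Always include punctuation in output. "
    "2) Use period/question mark ONLY for complete sentences. "
    "3) Use comma for mid-sentence pauses. "
    "4) Use no punctuation for incomplete trailing speech. "
    "5) Filler words (um, uh, so, like) indicate speaker will continue."
)

def build_prompt(*extra_instructions: str) -> str:
    """Append custom instructions after the default rules."""
    if not extra_instructions:
        return DEFAULT_PROMPT
    return DEFAULT_PROMPT + " Additional: " + " ".join(extra_instructions)

custom = build_prompt("This is a cardiology appointment.")
```

Pass the resulting string as the prompt parameter when opening the stream.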

Turn detection relies on terminal punctuation (. ? !), so custom prompts that reduce or remove punctuation from the transcription output may negatively impact turn detection. The default prompt is optimized for this, so we recommend trying it first before any customization.
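To illustrate why punctuation matters here, the sketch below models a punctuation-based turn-end check like the one described above. This is a simplified illustration of the idea, not AssemblyAI’s actual turn detection implementation.

```python
# Hypothetical illustration of punctuation-based turn detection:
# a turn is treated as complete only when the transcript ends with
# terminal punctuation and does not trail off on a filler word.

FILLERS = {"um", "uh", "so", "like"}
TERMINAL = (".", "?", "!")

def is_turn_end(transcript: str) -> bool:
    text = transcript.strip()
    if not text:
        return False
    # Filler words signal the speaker will continue (rule 5 of the default prompt).
    last_word = text.rstrip(".?!,").split()[-1].lower()
    if last_word in FILLERS:
        return False
    # Terminal punctuation marks a complete sentence (rule 2).
    return text.endswith(TERMINAL)

is_turn_end("I'd like to book an appointment.")  # complete sentence: turn ends
is_turn_end("I was thinking, um")                # filler word: speaker will continue
```

A prompt that strips punctuation would make the terminal-punctuation check impossible, which is why such prompts degrade turn detection.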

How prompting works

Universal-3-Pro prompting is more instructional than contextual. The model responds best to explicit formatting rules and behavioral instructions — for example, “use periods only for complete sentences” or “include all filler words.” This applies to both streaming and pre-recorded (async) use cases.

Providing topic context alone (e.g., “this is a cardiology appointment”) can help, but is most effective when paired with specific instructions telling the model how to transcribe. For domain-specific term boosting, use the keyterms_prompt parameter to explicitly list the terms you want the model to recognize.

Coming soon: enhanced contextual prompting

We are actively working to make Universal-3-Pro more contextual:

  • Contextual prompting — Providing topic context (e.g., “this is a cardiology appointment”) will automatically boost recognition of related terminology, improving accuracy for domain-specific use cases across both streaming and async.
  • Conversational context for streaming — For voice agent workflows, you will be able to pass previous utterances from the conversation (e.g., the agent’s last response) as context, allowing the model to use the conversation history to improve transcription accuracy of the user’s next utterance.

These improvements are under active development. In the meantime, use keyterms_prompt for domain-specific term boosting and instructional prompts for formatting control.

Updating configuration mid-stream

You can update prompting parameters during an active streaming session using UpdateConfiguration. The recommended approach is to dynamically update keyterms_prompt based on the current stage of your voice agent flow. If you know what answers or terminology to expect at a given point in the conversation, add those terms with keyterms_prompt so the model is primed to recognize them accurately.

For example, if your voice agent is currently asking for the caller’s name and date of birth, send the expected terms for that stage:

{
  "type": "UpdateConfiguration",
  "keyterms_prompt": ["Kelly Byrne-Donoghue", "date of birth", "January", "February"]
}

Then, when the conversation moves to a medical intake stage, update keyterms_prompt with the relevant domain terms:

{
  "type": "UpdateConfiguration",
  "keyterms_prompt": ["cardiology", "echocardiogram", "Dr. Patel", "metoprolol"]
}

Dynamically update keyterms_prompt for each stage of your voice agent flow. If you expect certain answers at a specific stage — names, addresses, account numbers, medical terms — proactively add those as keyterms so the model recognizes them accurately when the caller speaks them. This is the most effective way to improve recognition accuracy mid-stream.
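One way to organize this is to map each stage of your agent flow to its expected terms and generate the UpdateConfiguration message when the stage changes. The stage names and helper below are illustrative; the message shape matches the examples above.

```python
import json

# Sketch: per-stage keyterms for a hypothetical voice agent flow.
# STAGE_KEYTERMS and update_keyterms_message are illustrative names,
# not part of the AssemblyAI API.

STAGE_KEYTERMS = {
    "identity": ["Kelly Byrne-Donoghue", "date of birth", "January", "February"],
    "medical_intake": ["cardiology", "echocardiogram", "Dr. Patel", "metoprolol"],
}

def update_keyterms_message(stage: str) -> str:
    """Serialize the UpdateConfiguration message for the given stage."""
    return json.dumps({
        "type": "UpdateConfiguration",
        "keyterms_prompt": STAGE_KEYTERMS[stage],
    })

msg = update_keyterms_message("medical_intake")
# Send `msg` as a text frame over your open streaming WebSocket connection.
```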

See Keyterms prompting for more details on how keyterms_prompt works.

You can also update the prompt parameter mid-stream to adjust formatting or behavioral instructions:

{
  "type": "UpdateConfiguration",
  "prompt": "Transcribe verbatim. Rules: 1) Always include punctuation in output. 2) Use period/question mark ONLY for complete sentences. 3) Use comma for mid-sentence pauses. 4) Use no punctuation for incomplete trailing speech. 5) Filler words (um, uh, so, like) indicate speaker will continue. Additional: This is a cardiology appointment."
}

Updating prompt mid-stream is useful for passing updated behavioral instructions into the STT stream. For domain-specific term recognition, prefer keyterms_prompt as described above.

Keyterms prompting

Use the keyterms_prompt parameter to boost recognition of specific names, brands, or domain terms. Behind the scenes, keyterms_prompt relies on the default prompt and appends your boosted words to it. Pass an array of terms you want the model to prioritize:

keyterms_prompt=["Keanu Reeves", "AssemblyAI", "Universal-2"]

You can set keyterms_prompt at connection time or update it mid-stream as the conversation progresses. For full details, see Keyterms prompting.

Prompt and Keyterms Prompt

The prompt and keyterms_prompt parameters cannot be used in the same request. Please choose either one or the other based on your use case. When you use keyterms_prompt, your boosted words are appended to the default prompt automatically.
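Because the two parameters are mutually exclusive, a custom prompt that also needs term boosting has to carry the terms itself. The sketch below approximates what keyterms_prompt does behind the scenes by appending terms to a prompt; the exact wording the API uses internally is not documented, so the phrasing here is an assumption.

```python
# Sketch: manually recreate keyterms-style boosting inside a custom
# prompt. The "Pay attention to these terms" phrasing is an assumed
# approximation of the internal append behavior, not the documented one.

def prompt_with_keyterms(prompt: str, keyterms: list[str]) -> str:
    return prompt + " Pay attention to these terms: " + ", ".join(keyterms) + "."

combined = prompt_with_keyterms(
    "Transcribe verbatim. Always include punctuation in output.",
    ["Keanu Reeves", "AssemblyAI", "Universal-2"],
)
```

Prefer the real keyterms_prompt parameter whenever you don’t need a custom prompt.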

Streaming vs. async prompting capabilities

Many of the prompt capabilities available for async (pre-recorded) transcription — such as audio event tagging, speaker attribution, and labeling crosstalk — are designed for longer audio files and do not reliably work with streaming’s shorter audio segments (typically under 10 seconds). For streaming, focus your prompts on punctuation rules, verbatim transcription, and formatting instructions. Use keyterms_prompt for domain-specific term boosting.

Looking for async prompting?

Prompting behavior differs between streaming and async (pre-recorded) use cases. This guide covers prompting for streaming (real-time audio). If you’re working with pre-recorded audio, see the Prompting Guide (Async).

Optimizing prompts for turn detection

Because Universal-3-Pro’s turn detection is punctuation-based, the way the model punctuates its output directly affects when turns end. Prompting gives you a lever to influence that punctuation behavior and improve turn detection accuracy.

Internal testing on 100 audio samples shows that well-crafted punctuation rules significantly reduce false positives (the model incorrectly signaling a turn end) while maintaining near-perfect recall (catching real turn ends):

Prompt           Accuracy   Recall   False positive rate
Default          88%        100%     24%
Prompt omitted   83%        100%     34%

The default prompt—used when no prompt is provided—dropped the false positive rate from 34% to 24% compared to no prompt at all. That translates to fewer false interruptions in production voice agent conversations.

Key takeaways for prompt design:

  1. The default prompt already optimizes for turn detection. Even with no prompt text at all, Universal-3-Pro achieves 83% accuracy from its built-in punctuation behavior; the built-in default prompt raises this to 88%. Try the default before customizing.
  2. Concrete punctuation rules work best. The model responds better to specific formatting instructions (like “use period only for complete sentences”) than to abstract conversational framing. Keep custom prompts focused on punctuation behavior.
  3. Explicitly mark filler words as continuation signals. Words like “um”, “uh”, “so”, and “like” commonly appear at the end of incomplete utterances. Instructing the model that these indicate the speaker will continue is a high-signal rule that reduces false turn ends.
  4. Recall is near-perfect across all approaches. The punctuation-based system rarely misses a real turn end. The main optimization lever is reducing false positives—cases where the model signals a turn end on an incomplete utterance.