Prompting Guide (Async)
Start with no prompt
The default prompt outperforms most custom prompts. Omit the prompt parameter first — Universal-3 Pro automatically applies a built-in default that is already optimized for accuracy across a wide range of audio.
If the default isn’t a fit, start from one of the recommended prompts, then test against a representative set of your own audio (we suggest at least 25 files — see Evaluating your prompts). Layer in one additional instruction at a time. Do not start from scratch.
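As a minimal sketch of "omit the prompt parameter first", the helper below builds a request body that leaves the `prompt` key out entirely unless you pass one. The `prompt` name comes from this guide; the `audio_url` field, the example URL, and the function name are assumptions for illustration, so check them against the API reference before use.

```python
from typing import Optional


def build_transcript_request(audio_url: str, prompt: Optional[str] = None) -> dict:
    """Build a request body, leaving out 'prompt' unless one is supplied."""
    body = {"audio_url": audio_url}  # assumed field name, not confirmed by this guide
    if prompt:
        # Only send a prompt when you have one to test; otherwise the
        # built-in default applies.
        body["prompt"] = prompt
    return body


# Baseline request: no prompt key at all.
baseline = build_transcript_request("https://example.com/audio.mp3")

# Later experiment: one explicit instruction quoted from this guide.
experiment = build_transcript_request(
    "https://example.com/audio.mp3",
    prompt="Include all filler words.",
)
```

Leaving the key out, rather than sending an empty string, keeps the baseline run unambiguous.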
How prompting works
Universal-3 Pro is a Speech-augmented Large Language Model (SpeechLLM): a multi-modal LLM with an audio encoder and LLM decoder that processes speech, audio, and text inputs in the same workflow.
Think of SpeechLLM prompting as selecting modes and knobs, not open-ended instruction following. The model is trained primarily to transcribe, then fine-tuned to respond to common transcription instructions for style, speakers, and speech events. It responds best to explicit formatting rules and behavioral instructions (e.g., “include all filler words”, “use periods only for complete sentences”). Domain context like “this is a cardiology appointment” only helps when paired with specific instructions on how to transcribe.
If you know your terms, use keyterms — not the prompt
If you already know the specific names, brands, drug names, acronyms, or jargon that will appear in your audio, use keyterms prompting instead of a free-form prompt. The keyterms_prompt parameter is optimized for term boosting and produces more reliable results than describing the same terms in plain language. Reach for free-form prompts when you want to control style or behavior — not when you want to boost specific words.
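The routing above can be sketched as two small builders: known vocabulary goes to `keyterms_prompt`, and style or behavior instructions go to `prompt`. Both parameter names come from this guide. The list-of-strings shape for `keyterms_prompt`, the `audio_url` field, and the function names are assumptions. This guide does not say whether the two parameters can be combined in one request, so the sketch keeps them separate.

```python
from typing import Iterable


def boost_terms_request(audio_url: str, terms: Iterable[str]) -> dict:
    """Request body for boosting specific, already-known terms."""
    cleaned = [t.strip() for t in terms if t.strip()]  # drop blanks
    return {
        "audio_url": audio_url,        # assumed field name
        "keyterms_prompt": cleaned,    # assumed to accept a list of strings
    }


def style_request(audio_url: str, instruction: str) -> dict:
    """Request body for controlling style or behavior with a free-form prompt."""
    return {"audio_url": audio_url, "prompt": instruction}


# Known drug name (taken from the entity example in this guide): use keyterms.
terms_body = boost_terms_request("https://example.com/visit.mp3", ["Anktiva", " "])

# Formatting behavior: use a free-form prompt.
style_body = style_request(
    "https://example.com/visit.mp3",
    "Use periods only for complete sentences.",
)
```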
What prompts can do
Recommended prompts
These three prompts are battle-tested and the strongest starting points. Use one as your base and tweak from there — don’t start from scratch.
Best all around (default)
This is also the current built-in default prompt — when you omit the prompt parameter, this is what Universal-3 Pro uses. You don’t need to set it explicitly; it’s shown here so you can build off it.
Verbatim with multilingual support
This prompt maximizes speech pattern capture, preserves code-switching, and tells the model to always attempt transcription even on difficult audio. The trade-off is that the model may occasionally hallucinate disfluencies or language switches that don’t exist in the audio.
Handling unclear audio with [unclear]
This prompt flags uncertain segments rather than forcing the model to guess. It is one of the strongest tools for avoiding hallucinations on unclear audio.
Result:
- Hallucinations are materially reduced — the model doesn’t force incorrect guesses on uncertain audio.
- Uncertain sections are explicitly flagged as [unclear], surfacing exactly where audio quality is insufficient.
- Clearly audible speech is still preserved.
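Because uncertain segments come back as a literal [unclear] tag, you can count them to see how much of a file the model declined to guess on. The sketch below assumes the transcript is available as plain text; the function name and the sample sentence are made up for illustration.

```python
import re

# Matches the literal tag, ignoring case.
UNCLEAR_TAG = re.compile(r"\[unclear\]", re.IGNORECASE)


def unclear_report(transcript: str) -> dict:
    """Count [unclear] flags and their share of whitespace-separated tokens."""
    tokens = transcript.split()
    flags = len(UNCLEAR_TAG.findall(transcript))
    share = flags / len(tokens) if tokens else 0.0
    return {"flags": flags, "tokens": len(tokens), "share": share}


report = unclear_report("I think [unclear] the meeting is at [unclear] tomorrow.")
```

A file with a high share is a candidate for better audio capture rather than more prompt tuning.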
Capabilities reference
Each capability is a “knob” you can turn. Each section below shows one audio demo with before/after output and one recommended prompt. Layer capabilities in one at a time so you can measure the impact of each — conflicting instructions degrade output, so keep your prompt focused.
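One way to layer capabilities one at a time is to generate cumulative prompt variants, so each variant differs from the previous one by exactly one instruction. This is a sketch only: the function name is hypothetical, and the instructions are the examples quoted earlier in this guide, not a recommended prompt.

```python
from typing import List


def layered_prompts(base: str, instructions: List[str]) -> List[str]:
    """Return [base, base + 1st instruction, base + 1st + 2nd, ...]."""
    variants = [base]
    for instruction in instructions:
        # Each new variant adds exactly one instruction to the previous one.
        variants.append(f"{variants[-1]} {instruction}".strip())
    return variants


variants = layered_prompts(
    "Include all filler words.",
    ["Use periods only for complete sentences."],
)
for number, prompt in enumerate(variants):
    print(number, prompt)
```

Measuring each variant against the previous one makes it clear which instruction helped and which one hurt.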
Verbatim transcription and disfluencies
Preserves natural speech patterns including filler words, false starts, repetitions, and self-corrections. Reliability: High.
Without prompt:
With prompt, the model captures filler words like “uh” and false starts like “we, we, we’re friends”:
Native code switching
Handles audio where speakers switch between languages. Reliability: High.
Universal-3 Pro is natively multilingual for English, Spanish, French, German, Italian, and Portuguese. For audio in other languages, set language_detection: true so files are routed to the right model. Without this, unsupported languages may be marked [FOREIGN LANGUAGE].
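A small sketch of that rule: switch on `language_detection` whenever the audio may contain a language outside the six native ones. The parameter name comes from this guide. The two-letter language codes, the function name, and the choice to enable detection when the languages are unknown are assumptions.

```python
from typing import Set

# English, Spanish, French, German, Italian, Portuguese (codes are assumed).
NATIVE_LANGUAGES = {"en", "es", "fr", "de", "it", "pt"}


def with_language_routing(body: dict, expected_languages: Set[str]) -> dict:
    """Return a copy of body with language_detection set when it may be needed."""
    routed = dict(body)  # do not mutate the caller's dict
    unknown = not expected_languages
    outside_native = not expected_languages <= NATIVE_LANGUAGES
    if unknown or outside_native:
        routed["language_detection"] = True
    return routed


native_only = with_language_routing({"audio_url": "a.mp3"}, {"en", "es"})
mixed = with_language_routing({"audio_url": "b.mp3"}, {"en", "ja"})
```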
Output style and formatting
Controls punctuation, capitalization, and readability without changing words. Reliability: High.
Without prompt:
With prompt, the model uses punctuation to reflect the speaker’s emotional state:
Context aware clues
Helps with jargon, names, and domain expectations from the audio file. Reliability: Medium.
Without prompt:
With prompt, adding “clinical history evaluation” as a context clue corrects spelling of “Glicoside” to “Glycoside”:
Context alone does not tell the model how to transcribe. Pair domain context with a specific instruction. “This is a doctor-patient visit” is context; “prioritize accurately transcribing medications and diseases” is the actionable instruction.
Entity accuracy and spelling
Improves accuracy for proper nouns, brands, technical terms, and domain vocabulary. Reliability: Medium. If you already know the exact terms you want boosted, use keyterms prompting instead of describing them in your prompt.
Without prompt:
With prompt, the model corrects the misrecognition of “Anktiva” (transcribed as “Entiva” without context):
Describe the pattern of entities you want corrected, not the specific errors — listing specific spellings often causes the model to hallucinate them. See What to avoid.
What works / what to avoid
What works
What to avoid
Evaluating your prompts
A prompt’s effect depends on your audio, and universal best practices don’t transfer reliably across use cases. Before settling on a prompt, run it against a representative dataset.
The workflow:
- Build an evaluation set of at least 25 audio files that reflect the speakers, accents, audio quality, and vocabulary you expect in production. See Evaluate model accuracy for the full methodology.
- Transcribe each file with no prompt to establish a baseline.
- Try the Best all around and [unclear] recommended prompts and compare.
- Layer in one capability instruction at a time and re-measure.
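The workflow above can be sketched as a loop that scores every candidate prompt on the same evaluation set. The word error rate here is a plain word-level edit distance, not the full methodology from Evaluate model accuracy. The `transcribe` callable is a hypothetical stand-in for however you call the API, and the demo data is fabricated purely to make the sketch runnable.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / max(len(ref), 1)


def compare_prompts(
    eval_set: List[Tuple[str, str]],             # (audio id, reference text)
    prompts: Dict[str, Optional[str]],           # name -> prompt, None = no prompt
    transcribe: Callable[[str, Optional[str]], str],
) -> Dict[str, float]:
    """Mean WER per candidate prompt over the whole evaluation set."""
    return {
        name: mean(wer(ref, transcribe(audio, prompt)) for audio, ref in eval_set)
        for name, prompt in prompts.items()
    }


# Fabricated demo: a fake transcriber with canned outputs.
def fake_transcribe(audio: str, prompt: Optional[str]) -> str:
    return "we are friends" if prompt is None else "uh we we are friends"


scores = compare_prompts(
    eval_set=[("clip-1", "uh we we are friends")],
    prompts={"baseline": None, "verbatim": "Include all filler words."},
    transcribe=fake_transcribe,
)
```

Keeping the baseline (no prompt) in the same dictionary as the candidates means every comparison uses identical files and scoring.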
Watch out for misleading WER
Universal-3 Pro frequently outperforms human transcribers. If your word error rate (WER) shows unexpected insertions, listen to the audio at those timestamps before assuming the model is wrong — many “errors” are the model catching audio a human missed. Similarly, substitutions like “offsite” vs. “off site” or “alright” vs. “all right” inflate WER without representing real errors.
Tips:
- Use the [unclear] tag in your evaluation prompt so the model doesn’t guess where a human transcriber would also miss. This improves WER alignment.
- Review insertions manually by listening at flagged timestamps.
- Consider Semantic WER over normalized WER — it won’t penalize formatting-level differences that aren’t real errors.
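To illustrate the tips above, the sketch below normalizes formatting-level variants before comparing, then lists the words the model inserted relative to the human reference so you can review them by ear. It is a rough stand-in, not Semantic WER. The equivalence table holds only the two examples from this guide, and the function names are hypothetical.

```python
import difflib
import re
from typing import List

# Spelling variants that should not count as errors (examples from this guide).
EQUIVALENTS = {"off site": "offsite", "all right": "alright"}


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse known spelling variants."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    text = " ".join(text.split())
    for variant, canonical in EQUIVALENTS.items():
        text = re.sub(rf"\b{re.escape(variant)}\b", canonical, text)
    return text


def inserted_words(reference: str, hypothesis: str) -> List[List[str]]:
    """Runs of words present in the hypothesis but absent from the reference."""
    ref, hyp = normalize(reference).split(), normalize(hypothesis).split()
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    return [
        hyp[j1:j2]
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
        if tag == "insert"
    ]


to_review = inserted_words(
    "All right, we met off site.",
    "Alright, um, we met offsite.",
)
```

Each returned run is a place to listen to the audio before deciding whether the model or the reference is wrong.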
Generate a starting prompt with AI
If the recommended prompts above aren’t a fit for your audio, use the generator below to produce a starting prompt. It opens your preferred AI assistant with a pre-loaded brief built from this guide — the capability knobs, the keyterms-vs-prompt routing, the positive-language rule, and the “start with fewer instructions, add one at a time” framing. The output is a starting point, not a final prompt. Test it against your evaluation set using the workflow above before settling on it.
System prompt history
The current default prompt is shown above under Best all around (default). Prior defaults are kept here for changelog transparency.
Prior system prompt (April 15, 2026 – April 21, 2026)
Prior system prompt (February 25, 2026 – April 15, 2026)
Prior system prompt (February 20, 2026 – February 25, 2026)
Prior system prompt (before February 20, 2026)
Need help?
Prompting Universal-3 Pro is instructional, not open-ended — use the knobs above and test against your own data. If you’d like help building or optimizing a prompt for your audio, our team can help: open a live chat or email us via the widget in the bottom-right corner (contact info).