Use contextual prompting and keyterms prompting to improve streaming transcription accuracy.
Streaming transcription supports two complementary ways to give the model information about your audio:
  • Contextual prompting (prompt) — a natural-language description of what the audio is about: the domain, the scenario, or the full details of the conversation.
  • Keyterms prompting (keyterms_prompt) — an explicit list of terms you want the model to recognize accurately.
Both improve recognition accuracy for your use case. Neither changes the output format.

How prompting works

Universal-3 Pro is trained to use context about the audio (its domain, topic, or scenario) to better recognize the vocabulary that context makes likely. A call described as a cardiology consultation primes the model for medical terminology; a call described as an order-status check primes it for order IDs and product names.
The transcription instruction itself is built in and managed by AssemblyAI. You don’t need to tell the model how to transcribe: verbatim behavior, punctuation, and formatting are already optimized for streaming and turn detection. The prompt parameter carries context about your audio, not instructions: formatting or behavioral commands in the prompt (such as punctuation rules) are not supported.
The model is trained to stay grounded in the audio: context that turns out to be irrelevant or only partially applicable does not cause the model to insert words that weren’t spoken. This means you can safely send the same context with every session of a longer interaction (for example, the same call description for every segment of a conversation), even though only some of it applies at any given moment.
Start with no prompt. We strongly recommend testing with no prompt first. Universal-3 Pro is optimized out of the box, and context helps most when your audio contains domain-specific vocabulary the model is getting wrong. Add context when you see those errors, starting with the broadest level that describes your use case.

Contextual prompting

Contextual prompts work at three levels of specificity. Use the least specific level that covers your use case, and add detail when your audio contains uncommon names or terms the model can’t otherwise know.
| Level | Length | What it contains | Example |
| --- | --- | --- | --- |
| Domain | 2–5 words | The domain only | Medical consultation call. |
| Scenario | 5–15 words | What the conversation is about | Cardiology consultation about chest pain symptoms. |
| Detailed | 20–50 words | Full description, including names, products, or identifiers | Cardiology consultation between Dr. Smith and an elderly patient regarding recurring chest pain, ECG results, and medication adjustment for hypertension. |
Domain context tells the model what field the audio belongs to: medical, legal, technical support, food ordering. This is the safest starting point and is often enough to fix vocabulary errors.
Scenario context describes the specific situation. This is the right level for most applications that know what kind of call is taking place: appointment booking, billing inquiry, delivery complaint.
Detailed context describes everything your application already knows about the conversation: participant names, account or order identifiers, products, locations. This level is the most powerful when the audio contains proper nouns and identifiers. For example, a voice agent platform that already knows who is calling and why can pass that information so the model spells names and IDs correctly.
Guidelines for writing contextual prompts:
  • Write plain, complete sentences that describe the audio — you are describing a recording, not commanding the model.
  • Keep it to one short block of text. Don’t pack lists of keywords into the contextual prompt — that’s what keyterms_prompt is for.
  • Specificity follows knowledge: only include details you actually know about the call. Wrong details won’t corrupt the transcript, but they don’t help either.
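As a rough illustration of the three levels and the guidelines above, the sketch below assembles a one-sentence prompt from whatever the application happens to know. The helper name build_prompt and its arguments (call_type, topic, participants) are hypothetical and not part of any AssemblyAI SDK; it only produces the string you would pass as prompt.

```python
from typing import Optional


def build_prompt(
    call_type: str,
    topic: Optional[str] = None,
    participants: Optional[str] = None,
) -> str:
    """Compose a plain descriptive sentence from the details that are known.

    Hypothetical helper: unknown details are simply left out, so the prompt
    is only as specific as the application's knowledge of the call.
    """
    sentence = call_type.strip()
    if participants:
        sentence += f" between {participants.strip()}"
    if topic:
        # "regarding" reads more naturally once participants are named.
        connector = "regarding" if participants else "about"
        sentence += f" {connector} {topic.strip()}"
    # Ensure exactly one trailing period.
    return sentence.rstrip(".") + "."


domain_prompt = build_prompt("Medical consultation call")
scenario_prompt = build_prompt("Cardiology consultation", topic="chest pain symptoms")
detailed_prompt = build_prompt(
    "Cardiology consultation",
    topic="recurring chest pain, ECG results, and medication adjustment for hypertension",
    participants="Dr. Smith and an elderly patient",
)
```

The three calls reproduce the example prompts from the table of levels. Keeping the builder this small makes it hard to slip keyword lists or instructions into the prompt by accident.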

Specifying the language

Prefer the language_code parameter. Language selection is best supported through the language_code connection parameter; see Language selection. You can also specify the language in natural language as part of your contextual prompt, as described below.
State the language of the audio as part of your context. Knowing the language helps the model disambiguate what was said: the same sound could be transcribed as “si” in a Spanish call but as “C” in an English one. For example:
Spanish customer support call about a billing inquiry.
For multilingual audio, name all expected languages:
Multilingual conversation in English, Spanish, and German.

Keyterms prompting

Use the keyterms_prompt parameter to boost recognition of specific names, brands, or domain terms. Pass an array of terms you want the model to prioritize:
keyterms_prompt=["Keanu Reeves", "AssemblyAI", "Universal-2"]
Keyterms prompting is the right tool when you have an explicit vocabulary list — contact names, product catalogs, medical terms — rather than a description of the conversation. Like contextual prompting, it is trained to be robust to terms that don’t end up being spoken, so you can pass your full list to every session.
Start with no keyterms. We strongly recommend starting with no keyterms_prompt, then adding terms as needed when you consistently see the model struggle with words that matter for your use case. Including a large number of terms, or common terms that are well represented in the training data, could lead to overcorrections and hallucinations.
Limits:
  • You can include a maximum of 100 keyterms per session.
  • Each individual keyterm string must be 50 characters or fewer.
  • Keyterms longer than 50 characters are ignored; requests with more than 100 keyterms return an error.
Best practices:
  • Specify unique terminology. Include proper names, company names, technical terms, or vocabulary specific to your domain that might not be commonly recognized.
  • Exact spelling and capitalization. Provide keyterms with the precise spelling and capitalization you expect to see in the output transcript.
  • Avoid common words. Do not include single, common English words (e.g., “information”) as keyterms. The system is generally proficient with such words, and adding them as keyterms can be redundant.
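The limits above can also be checked on the client before a session is opened. The following is a minimal sketch under that assumption: prepare_keyterms is a hypothetical helper, not an SDK function, and it only mirrors the documented behavior (over-length terms are ignored, more than 100 terms is an error) so that problems surface early.

```python
from typing import Iterable, List, Tuple

MAX_KEYTERMS = 100        # documented per-session maximum
MAX_KEYTERM_CHARS = 50    # documented per-term maximum length


def prepare_keyterms(terms: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split terms into (kept, dropped) according to the documented limits.

    Hypothetical client-side check. Terms are not modified, because spelling
    and capitalization are expected to match the desired transcript output.
    """
    kept: List[str] = []
    dropped: List[str] = []
    for term in terms:
        if len(term) > MAX_KEYTERM_CHARS:
            dropped.append(term)  # would be ignored by the service anyway
        else:
            kept.append(term)
    if len(kept) > MAX_KEYTERMS:
        raise ValueError(
            f"{len(kept)} keyterms provided; the limit is {MAX_KEYTERMS} per session"
        )
    return kept, dropped


kept, dropped = prepare_keyterms(["Dr. Smith", "ECG", "x" * 51])
```

Returning the dropped terms instead of discarding them silently lets the application log which entries in its vocabulary list need shortening.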

Quickstart

Set prompt and keyterms_prompt as connection parameters when you open the WebSocket. The example below uses both together, with prompt describing the conversation and keyterms_prompt enumerating specific terms.
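import json

# Connection parameters sent when opening the WebSocket
CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
    # Contextual prompt: a short description of the audio
    "prompt": "Cardiology consultation about chest pain symptoms.",
    # Keyterms: the list is JSON-encoded into a single parameter value
    "keyterms_prompt": json.dumps(["Dr. Smith", "AssemblyAI", "ECG"]),
}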
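One way to turn these parameters into a connection URL is to append them as a query string, sketched below with the standard library only. The endpoint in API_ENDPOINT_BASE_URL is an assumption that should be confirmed against the streaming API reference, and build_streaming_url is a hypothetical helper name.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint; verify it against the streaming API reference.
API_ENDPOINT_BASE_URL = "wss://streaming.assemblyai.com/v3/ws"

CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
    "prompt": "Cardiology consultation about chest pain symptoms.",
    "keyterms_prompt": json.dumps(["Dr. Smith", "AssemblyAI", "ECG"]),
}


def build_streaming_url(base_url: str, params: dict) -> str:
    """Percent-encode the connection parameters into a query string."""
    return f"{base_url}?{urlencode(params)}"


url = build_streaming_url(API_ENDPOINT_BASE_URL, CONNECTION_PARAMS)

# Round-trip check: decode the query string and recover the original values.
decoded = parse_qs(urlsplit(url).query)
decoded_prompt = decoded["prompt"][0]
decoded_keyterms = json.loads(decoded["keyterms_prompt"][0])
```

Encoding with urlencode keeps spaces, quotes, and brackets in the prompt and the JSON-encoded keyterms list from breaking the URL.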

Updating mid-stream

Both prompt and keyterms_prompt can be updated during an active streaming session without reconnecting. This is useful when your application learns more about the conversation as it progresses, or when a voice agent moves between conversation stages. Send an UpdateConfiguration message with the new values:
# Update the contextual prompt as the conversation evolves
websocket.send('{"type": "UpdateConfiguration", "prompt": "Now collecting payment details."}')

# Replace or establish a new set of keyterms
websocket.send('{"type": "UpdateConfiguration", "keyterms_prompt": ["Universal-3"]}')

# Remove keyterms and reset context biasing
websocket.send('{"type": "UpdateConfiguration", "keyterms_prompt": []}')
Providing a new keyterms array completely replaces the existing set; sending an empty array [] removes all keyterms and resets context biasing to the default state. New values take effect immediately for subsequent audio processing. See Updating configuration mid-stream for the full list of parameters you can update mid-stream.
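Hand-written JSON strings are easy to get wrong once prompts contain quotes or non-ASCII names, so the messages above could instead be built with json.dumps. This is a small sketch; update_configuration_message is a hypothetical helper, and it assumes only the message shape shown in this section.

```python
import json
from typing import Optional, Sequence


def update_configuration_message(
    prompt: Optional[str] = None,
    keyterms: Optional[Sequence[str]] = None,
) -> str:
    """Serialize an UpdateConfiguration message.

    Fields left as None are omitted so they are not touched by the update.
    An empty keyterms sequence is kept, since [] is what clears keyterms.
    """
    message: dict = {"type": "UpdateConfiguration"}
    if prompt is not None:
        message["prompt"] = prompt
    if keyterms is not None:
        message["keyterms_prompt"] = list(keyterms)
    return json.dumps(message)


prompt_update = update_configuration_message(prompt="Now collecting payment details.")
keyterms_update = update_configuration_message(keyterms=["Universal-3"])
keyterms_reset = update_configuration_message(keyterms=[])
```

Each returned string can be passed to websocket.send(...) on the open connection, in place of the literal JSON strings shown earlier.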

What prompting is not

  • Prompting does not control output formatting. Instructions in the prompt (punctuation rules, verbatim directives, formatting commands) are not supported — transcription behavior is managed internally and already optimized for streaming and turn detection.

Looking for async prompting?

Prompting behavior differs between streaming and async (pre-recorded) use cases. This guide covers prompting for streaming (real-time audio). If you’re working with pre-recorded audio, see the Prompting Guide (Async) and Keyterms prompting (Async).