Prompting and Keyterms

Use contextual prompting and keyterms to improve streaming transcription accuracy. Streaming transcription supports two complementary ways to give the model information about your audio:

Contextual prompting (prompt) — a natural-language description of what the audio is about: the domain, the scenario, or the full details of the conversation.
Keyterms prompting (keyterms_prompt) — an explicit list of terms you want the model to recognize accurately.

Both improve recognition accuracy for your use case. Neither changes the output format.

How prompting works

Universal-3.5 Pro is trained to use context about the audio (its domain, topic, or scenario) to better recognize the vocabulary that context makes likely. A call described as a cardiology consultation primes the model for medical terminology; a call described as an order-status check primes it for order IDs and product names.

The transcription instruction itself is built in and managed by AssemblyAI. You don't need to tell the model how to transcribe: verbatim behavior, punctuation, and formatting are already optimized for streaming and turn detection. The prompt parameter carries context about your audio, not instructions. Formatting or behavioral commands in the prompt (such as punctuation rules) are not supported.

The model is trained to stay grounded in the audio: context that turns out to be irrelevant or only partially applicable does not cause the model to insert words that weren't spoken. This means you can safely send the same context with every session of a longer interaction, for example the same call description for every segment of a conversation, even though only some of it applies at any given moment.

Start with no prompt

We strongly recommend testing with no prompt first. Universal-3.5 Pro is optimized out of the box, and context helps most when your audio contains domain-specific vocabulary the model is getting wrong. Add context when you see those errors, starting with the broadest level that describes your use case.

Contextual prompting

Contextual prompts work at three levels of specificity. Use the least specific level that covers your use case, and add detail when your audio contains uncommon names or terms the model can’t otherwise know.

Level	Length	What it contains	Example
Domain	2–5 words	The domain only	`Medical consultation call.`
Scenario	5–15 words	What the conversation is about	`Cardiology consultation about chest pain symptoms.`
Detailed	20–50 words	Full description, including names, products, or identifiers	`Cardiology consultation between Dr. Smith and an elderly patient regarding recurring chest pain, ECG results, and medication adjustment for hypertension.`

Domain context tells the model what field the audio belongs to, such as medical, legal, technical support, or food ordering. This is the safest starting point and is often enough to fix vocabulary errors.

Scenario context describes the specific situation. This is the right level for most applications that know what kind of call is taking place, such as appointment booking, billing inquiry, or delivery complaint.

Detailed context describes everything your application already knows about the conversation: participant names, account or order identifiers, products, locations. This level is the most powerful when the audio contains proper nouns and identifiers. For example, a voice agent platform that already knows who is calling and why can pass that information so the model spells names and IDs correctly.

Guidelines for writing contextual prompts:

Write plain, complete sentences that describe the audio — you are describing a recording, not commanding the model.
Keep it to one short block of text. Don’t pack lists of keywords into the contextual prompt — that’s what keyterms_prompt is for.
Specificity follows knowledge: only include details you actually know about the call. Wrong details won’t corrupt the transcript, but they don’t help either.

Accuracy impact

We benchmarked contextual prompting at each of the three levels on 20,000 real voice-agent calls. Accuracy improves monotonically with context specificity, and the gains are largest for the entity types each level describes. All figures below, including word error rate (WER) and entity error rate, are relative reductions versus no prompt; a negative value means fewer errors.

Improvement vs. no prompt	Domain	Scenario	Detailed
Word error rate (WER)	−5%	−10%	−21%
Hallucinated words	−9%	−12%	−19%
Entity error rate — overall	−2%	−7%	−29%
Entity error rate — names	−5%	−16%	−49%
Entity error rate — places	−9%	−21%	−44%
Entity error rate — medical terms	−2%	−24%	−43%

Two takeaways for choosing a level:

Scenario context is the practical default. It needs only the kind of information any application already has — the type of call — and already cuts overall WER ~10% and entity errors on names and places by 16–21%.
Detailed context is the upper bound. When your application can supply specifics it already knows (caller name, account or order IDs, products), entity accuracy improves dramatically: the name error rate nearly halves. This is the most powerful option for voice agents with access to customer context.

Across all levels, turn detection is unaffected. Hallucinated and fabricated words decrease as context increases: describing the audio makes the model less likely to invent content, not more.
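Because the figures in the table are relative reductions, their absolute effect depends on your own baseline. The sketch below applies a relative change to a baseline error rate. The 10% baseline WER is a made-up illustration, not a benchmark result: with that baseline, the −21% detailed-context reduction gives 10% × (1 − 0.21) = 7.9% WER.

```python
def apply_relative_change(baseline_rate: float, relative_change: float) -> float:
    """Return the error rate after a relative change such as -0.21 (a 21% reduction)."""
    return baseline_rate * (1 + relative_change)


def relative_change(baseline_rate: float, new_rate: float) -> float:
    """Inverse operation: the relative change between two measured error rates."""
    return (new_rate - baseline_rate) / baseline_rate


# Hypothetical baseline: 10% WER with no prompt; changes taken from the WER row.
baseline_wer = 0.10
for level, change in [("Domain", -0.05), ("Scenario", -0.10), ("Detailed", -0.21)]:
    print(f"{level}: {apply_relative_change(baseline_wer, change):.1%}")
```

With this hypothetical baseline the loop prints 9.5%, 9.0%, and 7.9%. Use `relative_change` to express your own before/after measurements in the same terms as the table.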

Specifying the language

Prefer the language_codes parameter

Language selection is best supported through the language_codes connection parameter: pass the list of languages you expect, or a single-element list for a monolingual session. See Language selection. You can also specify the language in natural language as part of your contextual prompt.

State the language of the audio as part of your context. Knowing the language helps the model disambiguate similar sounds: the same sound could be transcribed as “si” in a Spanish call but “C” in an English one. For example:

Spanish customer support call about a billing inquiry.

For multilingual audio, name all expected languages:

Multilingual conversation in English, Spanish, and German.
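If you also describe the language in the prompt, a small helper keeps the wording consistent across sessions. This is a hypothetical sketch that reproduces the two example prompts above; `language_context` is not an AssemblyAI API, and it does not replace the language_codes connection parameter, which remains the preferred way to select languages.

```python
def language_context(languages: list[str], description: str) -> str:
    """Prefix a call description with its language(s).

    description should have no trailing period, e.g.
    "customer support call about a billing inquiry".
    """
    if not languages:
        raise ValueError("at least one language is required")
    if len(languages) == 1:
        # Monolingual: "Spanish customer support call about a billing inquiry."
        return f"{languages[0]} {description}."
    if len(languages) == 2:
        listed = f"{languages[0]} and {languages[1]}"
    else:
        listed = ", ".join(languages[:-1]) + ", and " + languages[-1]
    # Multilingual: "Multilingual conversation in English, Spanish, and German."
    return f"Multilingual {description} in {listed}."


print(language_context(["Spanish"], "customer support call about a billing inquiry"))
print(language_context(["English", "Spanish", "German"], "conversation"))
```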

Keyterms prompting

Use the keyterms_prompt parameter to boost recognition of specific names, brands, or domain terms. Pass an array of terms you want the model to prioritize:

keyterms_prompt=["Keanu Reeves", "AssemblyAI", "Universal-2"]

Keyterms prompting is the right tool when you have an explicit vocabulary list — contact names, product catalogs, medical terms — rather than a description of the conversation. Like contextual prompting, it is trained to be robust to terms that don’t end up being spoken, so you can pass your full list to every session.

Start with no keyterms

We strongly recommend starting with no keyterms_prompt, then adding terms that matter for your use case and that you consistently see the model struggle with. Including a large number of terms, or common terms that are well represented in the training data, could lead to overcorrections and hallucinations.

Limits:

You can include a maximum of 100 keyterms per session.
Each individual keyterm string must be 50 characters or less.
Keyterms longer than 50 characters are ignored; requests with more than 100 keyterms return an error.
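A client-side check against these limits can catch problems before you connect. This is a minimal sketch: `prepare_keyterms` is a hypothetical helper that drops over-long terms with a warning (mirroring the service ignoring them), raises on more than 100 terms (mirroring the error response), and JSON-encodes the result for the keyterms_prompt connection parameter. Two assumptions to note: Python's `len` counts code points, which may not match how the service counts characters, and the deduplication is a convenience, not a documented requirement. Duplicates are compared case-sensitively so that your intended capitalization is preserved.

```python
import json
import warnings

MAX_KEYTERMS = 100       # Documented per-session limit
MAX_KEYTERM_CHARS = 50   # Documented per-term limit


def prepare_keyterms(terms: list[str]) -> str:
    """Validate keyterms against the documented limits and JSON-encode them."""
    cleaned: list[str] = []
    seen: set[str] = set()
    for term in terms:
        term = term.strip()
        if not term or term in seen:
            continue  # Skip blanks and exact (case-sensitive) duplicates
        if len(term) > MAX_KEYTERM_CHARS:
            # The service would ignore this term, so surface it early instead.
            warnings.warn(f"Keyterm over {MAX_KEYTERM_CHARS} characters dropped: {term!r}")
            continue
        seen.add(term)
        cleaned.append(term)
    if len(cleaned) > MAX_KEYTERMS:
        raise ValueError(f"{len(cleaned)} keyterms exceeds the limit of {MAX_KEYTERMS}")
    return json.dumps(cleaned)


print(prepare_keyterms(["Dr. Smith", "ECG", "ECG", " AssemblyAI "]))
```

Run as written, this prints `["Dr. Smith", "ECG", "AssemblyAI"]`.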

Best practices:

Specify unique terminology. Include proper names, company names, technical terms, or vocabulary specific to your domain that might not be commonly recognized.
Exact spelling and capitalization. Provide keyterms with the precise spelling and capitalization you expect to see in the output transcript.
Avoid common words. Do not include single, common English words (e.g., “information”) as keyterms. The system is generally proficient with such words, and adding them as keyterms can be redundant.

Quickstart

Set prompt and keyterms_prompt as connection parameters when you open the WebSocket. The example below uses both together, with prompt describing the conversation and keyterms_prompt enumerating specific terms.

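Python (WebSocket):

import json
from urllib.parse import urlencode

# Connection parameters are sent as query parameters on the WebSocket URL;
# keyterms_prompt is JSON-encoded into a single string.
CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "speech_model": "universal-3-5-pro",
    "prompt": "Cardiology consultation about chest pain symptoms.",
    "keyterms_prompt": json.dumps(["Dr. Smith", "AssemblyAI", "ECG"]),
}

# Streaming v3 endpoint; confirm the URL in the API reference
API_ENDPOINT = f"wss://streaming.assemblyai.com/v3/ws?{urlencode(CONNECTION_PARAMS)}"

Python SDK:

from assemblyai.streaming.v3 import StreamingClient, StreamingClientOptions, StreamingParameters

client = StreamingClient(
    StreamingClientOptions(api_key="<YOUR_API_KEY>", api_host="streaming.assemblyai.com")
)

client.connect(
    StreamingParameters(
        sample_rate=16000,
        speech_model="universal-3-5-pro",
        prompt="Cardiology consultation about chest pain symptoms.",
        keyterms_prompt=["Dr. Smith", "AssemblyAI", "ECG"],
    )
)

JavaScript (WebSocket):

// Connection parameters are sent as query parameters on the WebSocket URL;
// keyterms_prompt is JSON-encoded into a single string.
const CONNECTION_PARAMS = {
  sample_rate: 16000,
  speech_model: "universal-3-5-pro",
  prompt: "Cardiology consultation about chest pain symptoms.",
  keyterms_prompt: JSON.stringify(["Dr. Smith", "AssemblyAI", "ECG"]),
};

// Streaming v3 endpoint; confirm the URL in the API reference
const API_ENDPOINT = `wss://streaming.assemblyai.com/v3/ws?${new URLSearchParams(CONNECTION_PARAMS)}`;

JavaScript SDK:

import { AssemblyAI } from "assemblyai";

const client = new AssemblyAI({ apiKey: "<YOUR_API_KEY>" });

const transcriber = client.streaming.transcriber({
  sampleRate: 16_000,
  speechModel: "universal-3-5-pro",
  prompt: "Cardiology consultation about chest pain symptoms.",
  keytermsPrompt: ["Dr. Smith", "AssemblyAI", "ECG"],
});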

Updating mid-stream

Both prompt and keyterms_prompt can be updated during an active streaming session without reconnecting. This is useful when your application learns more about the conversation as it progresses, or when a voice agent moves between conversation stages. Send an UpdateConfiguration message with the new values:

Python (WebSocket):

# Update the contextual prompt as the conversation evolves
websocket.send('{"type": "UpdateConfiguration", "prompt": "Now collecting payment details."}')

# Replace or establish a new set of keyterms
websocket.send('{"type": "UpdateConfiguration", "keyterms_prompt": ["Universal-3"]}')

# Remove keyterms and reset context biasing
websocket.send('{"type": "UpdateConfiguration", "keyterms_prompt": []}')

Python SDK:

# Update the contextual prompt as the conversation evolves
client.update_configuration(prompt="Now collecting payment details.")

# Replace or establish a new set of keyterms
client.update_configuration(keyterms_prompt=["Universal-3"])

# Remove keyterms and reset context biasing
client.update_configuration(keyterms_prompt=[])

JavaScript (WebSocket):

// Update the contextual prompt as the conversation evolves
websocket.send(
  '{"type": "UpdateConfiguration", "prompt": "Now collecting payment details."}'
);

// Replace or establish a new set of keyterms
websocket.send(
  '{"type": "UpdateConfiguration", "keyterms_prompt": ["Universal-3"]}'
);

// Remove keyterms and reset context biasing
websocket.send('{"type": "UpdateConfiguration", "keyterms_prompt": []}');

JavaScript SDK:

// Update the contextual prompt as the conversation evolves
transcriber.updateConfiguration({ prompt: "Now collecting payment details." });

// Replace or establish a new set of keyterms
transcriber.updateConfiguration({ keytermsPrompt: ["Universal-3"] });

// Remove keyterms and reset context biasing
transcriber.updateConfiguration({ keytermsPrompt: [] });

Providing a new keyterms array completely replaces the existing set; sending an empty array [] removes all keyterms and resets context biasing to the default state. New values take effect immediately for subsequent audio processing. See Updating configuration mid-stream for the full list of parameters you can update mid-stream.
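The replace-not-merge behavior matters when a voice agent changes stage: to keep earlier terms, resend them together with the new ones. The sketch below builds UpdateConfiguration messages with `json.dumps` instead of hand-written JSON strings. The stage names, the `STAGE_CONTEXT` mapping, and both helper functions are hypothetical; only the message shape (`type`, `prompt`, `keyterms_prompt`) comes from the examples above.

```python
import json
from typing import Optional

# Hypothetical per-stage context for a voice agent: (prompt, stage-specific keyterms).
STAGE_CONTEXT = {
    "greeting": ("Customer support call about an order.", []),
    "payment": ("Now collecting payment details.", ["Visa", "Mastercard"]),
}


def update_message(prompt: Optional[str] = None, keyterms: Optional[list[str]] = None) -> str:
    """Build an UpdateConfiguration message containing only the fields given.

    keyterms=[] is kept in the message (it clears all keyterms);
    keyterms=None omits the keyterms_prompt field entirely.
    """
    message: dict = {"type": "UpdateConfiguration"}
    if prompt is not None:
        message["prompt"] = prompt
    if keyterms is not None:
        message["keyterms_prompt"] = keyterms  # Replaces the whole existing set
    return json.dumps(message)


def stage_message(stage: str, carry_over: list[str]) -> str:
    """Switch stage, resending carry_over terms because a new array replaces the old one."""
    prompt, stage_terms = STAGE_CONTEXT[stage]
    return update_message(prompt=prompt, keyterms=carry_over + stage_terms)


print(stage_message("payment", carry_over=["Dr. Smith"]))
```

In a raw WebSocket client you would pass the returned string to `websocket.send(...)`, as in the examples above.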

What prompting is not

Prompting does not control output formatting. Instructions in the prompt (punctuation rules, verbatim directives, formatting commands) are not supported — transcription behavior is managed internally and already optimized for streaming and turn detection.
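Because instructions in the prompt are not supported, it can help to catch them before they reach production. The sketch below is a rough heuristic, not an AssemblyAI feature: it flags sentences that start with a verb from a hypothetical, hand-picked list of words that usually signal a command rather than a description. Tune the list to your own prompts; it will miss some instructions and may occasionally flag a description.

```python
import re

# Hypothetical list of leading words that usually signal an instruction.
INSTRUCTION_WORDS = {
    "transcribe", "punctuate", "format", "capitalize", "use", "do",
    "never", "always", "write", "spell", "include", "remove", "output",
}


def instruction_sentences(prompt: str) -> list[str]:
    """Return the sentences in prompt whose first word looks like a command."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", prompt) if s.strip()]
    flagged = []
    for sentence in sentences:
        first_word = sentence.split()[0].lower().strip(",:;")
        if first_word in INSTRUCTION_WORDS:
            flagged.append(sentence)
    return flagged


print(instruction_sentences("Medical call. Always punctuate numbers as digits."))
```

Run as written, this prints `['Always punctuate numbers as digits.']`; a purely descriptive prompt such as "Cardiology consultation about chest pain symptoms." returns an empty list.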

Looking for async prompting?

Prompting behavior differs between streaming and async (pre-recorded) use cases. This guide covers prompting for streaming (real-time audio). If you’re working with pre-recorded audio, see the Prompting Guide (Async) and Keyterms prompting (Async).
