agent_context parameter and the model knows both sides of the dialog when transcribing the next user turn.
With the agent’s side of the conversation in context, Universal-3.5 Pro can anticipate the kind of answer to expect, sharpen entity recognition, and disambiguate words that sound similar. For example, if your agent asks "What's your email address?" and the model has no agent context, it might transcribe the reply as "user at assemblyai dot com". With that question passed in via agent_context, the model knows an email is coming and produces "user@assemblyai.com".
The user side comes along for free: Universal-3.5 Pro Streaming automatically carries prior STT-finalized turns forward as context, so you don’t need to configure anything for the user half of the conversation. agent_context is what fills in the agent half.
How it works
During a streaming session, Universal-3.5 Pro Streaming keeps a short memory of the conversation. Two sources feed that memory: agent_context values you push in (the agent half) and prior finalized user turns (the user half, carried forward automatically). The model uses both when transcribing the next user turn.
This means:
- Context is per-session. Closing the WebSocket clears it, and a new session starts fresh.
- Only agent_context values and finalized user turns (end_of_turn: true) are carried forward, not partials.
Defaults
| Behavior | Default |
|---|---|
| Auto-carry of prior user turns | Enabled |
| Number of prior entries carried | 3 |
| Maximum context size | ~1500 characters |
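As a rough mental model of the behavior and defaults above, the per-session memory can be pictured as a small rolling window. The sketch below is purely illustrative and runs on the client side only. The class name `ContextWindow`, the choice to count agent and user entries against the same limit of 3, and the rule of dropping the oldest characters when the text exceeds the cap are all assumptions; the service's actual eviction and truncation rules are not described here.

```python
from collections import deque

MAX_ENTRIES = 3    # "Number of prior entries carried" from the table above
MAX_CHARS = 1500   # approximate "Maximum context size" from the table above


class ContextWindow:
    """Hypothetical client-side picture of the session context (not the real service)."""

    def __init__(self):
        # A bounded deque silently discards the oldest entry once it is full.
        self._entries = deque(maxlen=MAX_ENTRIES)

    def add_agent_context(self, text):
        """Record what the agent just said (the agent half)."""
        self._entries.append(("agent", text))

    def add_user_turn(self, transcript, end_of_turn):
        """Record a user turn, but only when it is final (partials are skipped)."""
        if end_of_turn:
            self._entries.append(("user", transcript))

    def render(self):
        """Join the remembered entries; assume the oldest characters go first."""
        joined = "\n".join(f"{role}: {text}" for role, text in self._entries)
        return joined[-MAX_CHARS:]


window = ContextWindow()
window.add_agent_context("What's your email address?")
window.add_user_turn("user at", end_of_turn=False)            # partial, ignored
window.add_user_turn("user@assemblyai.com", end_of_turn=True)  # final, kept
print(window.render())
```

Creating a new `ContextWindow` for each connection mirrors the per-session rule: nothing survives once the WebSocket closes.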
Passing your agent’s reply as context
Pass your voice agent’s spoken reply (what your TTS just said) via the agent_context parameter. There are two ways to set it:
- At connection time: pass agent_context as a query parameter on the WebSocket URL. Use this to seed the model with your agent’s opening greeting before the user has said anything.
- Mid-stream: send an UpdateConfiguration message with the agent_context field after each subsequent agent reply.
"yes", "7pm", "that's all").
The user side is handled for you: prior STT-finalized turns are automatically carried forward as context, so you only need to manage the agent half explicitly.
Setting an opening greeting at connection time
When you open the WebSocket, pass agent_context alongside your other connection parameters. The first user turn will be transcribed with the greeting already in the model’s context.
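A minimal sketch of building the connection URL with the greeting attached, using only the Python standard library. The endpoint in `STREAMING_URL`, the `sample_rate` parameter, and the helper name `build_streaming_url` are assumptions for illustration; only agent_context and the speech_model value come from this page. Check the API reference for the exact endpoint and parameter names before relying on it.

```python
from urllib.parse import quote, urlencode

# Assumed endpoint; confirm the real streaming URL in the API reference.
STREAMING_URL = "wss://streaming.assemblyai.com/v3/ws"


def build_streaming_url(greeting, sample_rate=16000):
    """Return a WebSocket URL that seeds the session with the agent's greeting."""
    params = {
        "speech_model": "universal-3-5-pro",
        "sample_rate": sample_rate,   # assumed parameter name
        "agent_context": greeting,    # what the agent says before the user speaks
    }
    # quote_via=quote encodes spaces as %20 rather than "+".
    return f"{STREAMING_URL}?{urlencode(params, quote_via=quote)}"


url = build_streaming_url("Hi, thanks for calling. What's your email address?")
print(url)
```

Percent-encoding matters here because greetings usually contain spaces, apostrophes, and question marks, none of which are safe to place in a query string unescaped.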
Updating agent context mid-stream
A typical voice agent loop looks like this:
- User speaks → Universal-3.5 Pro Streaming emits a final turn.
- Your agent runs an LLM step and generates a reply.
- Your TTS speaks the reply to the user.
- User responds → next turn.
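One way to wire the mid-stream update into that loop is sketched below. The JSON shape, an object with a `type` field set to `UpdateConfiguration` plus an `agent_context` field, is an assumption based on the message and field names mentioned on this page. The helpers `build_agent_context_update` and `on_agent_reply_spoken` are hypothetical, and `send` stands in for whatever send method your WebSocket client exposes.

```python
import json


def build_agent_context_update(agent_reply):
    """Serialize the agent's latest spoken reply as an UpdateConfiguration message.

    The exact wire format is assumed here; verify it against the API reference.
    """
    return json.dumps({"type": "UpdateConfiguration", "agent_context": agent_reply})


def on_agent_reply_spoken(agent_reply, send):
    """Call once the TTS has spoken the reply, before the user's next turn."""
    send(build_agent_context_update(agent_reply))


# Usage with a stand-in sender that just records outgoing messages.
outbox = []
on_agent_reply_spoken("What time would you like to book?", outbox.append)
print(outbox[0])
```

Passing `send` in as a callable keeps the sketch independent of any particular WebSocket library and makes the hook easy to exercise without a live connection.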
Limits
- Universal-3 Pro and Universal-3.5 Pro only. agent_context is supported on speech_model: "universal-3-5-pro" and "u3-rt-pro". If you set it at connection time on any other model, the session is rejected; if you send it mid-stream on another model, it’s stripped with a warning.
- Per-value cap: ~1500 characters. Trim long agent replies down to the substantive question before sending.
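The trimming advice could be implemented in several ways. The sketch below shows one simple heuristic: keep as many trailing sentences as fit under the cap, on the assumption that the agent's question usually comes last. The function name `trim_agent_reply` and the sentence-splitting rule are illustrative choices, and the 1500 figure is the approximate cap quoted above.

```python
import re

MAX_AGENT_CONTEXT_CHARS = 1500  # approximate per-value cap quoted above


def trim_agent_reply(reply, limit=MAX_AGENT_CONTEXT_CHARS):
    """Keep the trailing sentences of a reply that fit within `limit` characters."""
    reply = " ".join(reply.split())  # collapse runs of whitespace
    if len(reply) <= limit:
        return reply

    # Split after ".", "!" or "?" when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", reply)

    kept = []
    total = 0
    for sentence in reversed(sentences):
        extra = len(sentence) + (1 if kept else 0)  # +1 for the joining space
        if total + extra > limit:
            break
        kept.append(sentence)
        total += extra

    if not kept:
        # A single sentence longer than the cap: fall back to its tail.
        return sentences[-1][-limit:]
    return " ".join(reversed(kept))


print(trim_agent_reply("I found your booking. What time works?", limit=30))
```

Keeping the tail rather than the head reflects the guidance to send the substantive question; an agent that asks its question first would need the opposite rule.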
When conversation context helps most
agent_context has the largest impact on:
- Voice agents: short user responses to agent questions ("yes", "no", "that's all", dates, times, single names).
- Spelled-out entities: emails, account IDs, addresses, and similar inputs read aloud after the agent has just asked for them. Setting agent_context to the agent’s prompt (e.g. "What's your email address?") primes the model for what’s coming.
- Disambiguation: words that sound similar but only one fits the conversation ("fleas" vs "please", "to" vs "two" vs "too").
- Entity recall: names, products, or terms that were established earlier in the conversation.
Interactions with other parameters
prompt. Conversation context layers on top of the model’s built-in transcription instruction and any contextual prompt you provide. The two stack cleanly.
keyterms_prompt. Use keyterms_prompt alongside agent_context as needed; the two don’t conflict.
Multilingual sessions. Carrying prior turns biases the model toward the languages already seen in the conversation. For sessions that mix three or more languages, this can occasionally push the model toward translating rather than transcribing. If you see drift, set a single transcription language in your prompt (see Specifying the transcription language).