Improve voice agent outcomes by giving Universal-3.5 Pro the full conversational context. Pass your voice agent’s spoken replies into the session via the agent_context parameter, and the model knows both sides of the dialog when transcribing the next user turn.

With the agent’s side of the conversation in context, Universal-3.5 Pro can anticipate the kind of answer to expect, sharpen entity recognition, and disambiguate words that sound similar. For example, suppose your agent asks "What's your email address?". Without that question in context, the model might transcribe the reply as "user at assemblyai dot com". With the question passed in via agent_context, the model knows an email is coming and produces "user@assemblyai.com".

The user side comes along for free: Universal-3.5 Pro Streaming automatically carries prior STT-finalized turns forward as context, so you don’t need to configure anything for the user half of the conversation. agent_context is what fills in the agent half.
Set agent_context at connection time and after each agent reply
Pass agent_context as a connection-time query parameter to seed the model with your agent’s opening greeting, then send UpdateConfiguration mid-stream after each subsequent agent reply. See Passing your agent’s reply as context for the full pattern.

How it works

During a streaming session, Universal-3.5 Pro Streaming keeps a short memory of the conversation. Two sources feed that memory: agent_context values you push in (the agent half) and prior finalized user turns (the user half, carried forward automatically). The model uses both when transcribing the next user turn. This means:
  • Context is per-session. Closing the WebSocket clears it, and a new session starts fresh.
  • Only agent_context values and finalized user turns (end_of_turn: true) are carried forward, not partials.

Defaults

Behavior                          Default
Auto-carry of prior user turns    Enabled
Number of prior entries carried   3
Maximum context size              ~1500 characters
Older entries are dropped first as new ones come in, so the most recent conversation is always preserved.
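
The drop-oldest-first behavior can be pictured with a small client-side model. This is only a sketch for building intuition, not the service's implementation: the ContextWindow class is hypothetical, and it assumes the entry count and the character budget both apply to the combined list of agent and user entries, which the defaults above do not spell out.

```python
from collections import deque

MAX_ENTRIES = 3    # "Number of prior entries carried" from the defaults
MAX_CHARS = 1500   # approximate "Maximum context size" from the defaults


class ContextWindow:
    """Toy model of the rolling context (hypothetical, for illustration only)."""

    def __init__(self, max_entries=MAX_ENTRIES, max_chars=MAX_CHARS):
        # A bounded deque discards its oldest item when a new one is appended.
        self.entries = deque(maxlen=max_entries)
        self.max_chars = max_chars

    def total_chars(self):
        return sum(len(text) for _, text in self.entries)

    def add(self, speaker, text):
        self.entries.append((speaker, text))
        # Drop the oldest entries until the text fits the size budget,
        # always keeping at least the newest entry.
        while len(self.entries) > 1 and self.total_chars() > self.max_chars:
            self.entries.popleft()


window = ContextWindow()
window.add("agent", "Welcome! May I take your order?")
window.add("user", "One Krabby Patty, please.")
window.add("agent", "Anything to drink with that?")
window.add("user", "That's all.")
print([text for _, text in window.entries])
```

After the fourth entry is added, the opening greeting is no longer in the window, while the three most recent entries remain.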

Passing your agent’s reply as context

Pass your voice agent’s spoken reply (what your TTS just said) via the agent_context parameter. There are two ways to set it:
  • At connection time: pass agent_context as a query parameter on the WebSocket URL. Use this to seed the model with your agent’s opening greeting before the user has said anything.
  • Mid-stream: send an UpdateConfiguration message with the agent_context field after each subsequent agent reply.
Both forms let the model know the question the user is about to answer, which is especially important for short replies ("yes", "7pm", "that's all"). The user side is handled for you: prior STT-finalized turns are automatically carried forward as context, so you only need to manage the agent half explicitly.

Setting an opening greeting at connection time

When you open the WebSocket, pass agent_context alongside your other connection parameters. The first user turn will be transcribed with the greeting already in the model’s context.
# Connection-time query parameters for the streaming WebSocket URL.
params = {
    "agent_context": "Welcome to the Krusty Krab, home of the Krabby Patty, may I take your order?",
    "sample_rate": 16000,
    "speech_model": "universal-3-5-pro",
}
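
Because the greeting contains spaces and punctuation, it needs to be URL-encoded before it goes into the query string. A minimal sketch using only the standard library follows. BASE_URL is a placeholder and build_url is a hypothetical helper; substitute the streaming endpoint from your own integration.

```python
from urllib.parse import quote, urlencode

# Assumption: placeholder endpoint, not a real service URL.
BASE_URL = "wss://example.invalid/ws"

params = {
    "agent_context": "Welcome to the Krusty Krab, home of the Krabby Patty, may I take your order?",
    "sample_rate": 16000,
    "speech_model": "universal-3-5-pro",
}


def build_url(base_url, query_params):
    """Append percent-encoded query parameters to the WebSocket URL."""
    # quote_via=quote encodes spaces as %20 rather than "+".
    return f"{base_url}?{urlencode(query_params, quote_via=quote)}"


url = build_url(BASE_URL, params)
print(url)
```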

Updating agent context mid-stream

A typical voice agent loop looks like this:
  1. User speaks → Universal-3.5 Pro Streaming emits a final turn.
  2. Your agent runs an LLM step and generates a reply.
  3. Your TTS speaks the reply to the user.
  4. User responds → next turn.
During step 3, send the agent’s reply text to the streaming session so the model knows which question the user will be answering in the next turn.
import json

# ws is the already-open WebSocket connection for this streaming session.
ws.send(json.dumps({
    "type": "UpdateConfiguration",
    "agent_context": "Sure, what date would you like to book?",
}))
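
The four-step loop can be wired into a message handler along these lines. This is a sketch under stated assumptions: on_message, generate_reply, speak, and FakeSocket are hypothetical names, and the "transcript" field name is assumed. Only end_of_turn and the UpdateConfiguration payload come from the text above.

```python
import json


def on_message(ws, raw_message, generate_reply, speak):
    """Handle one server message; on a final turn, reply and push context."""
    message = json.loads(raw_message)
    # Partials are skipped: only finalized turns trigger an agent reply.
    if not message.get("end_of_turn"):
        return None
    # Assumption: the turn text is in a "transcript" field.
    reply = generate_reply(message.get("transcript", ""))
    # Send the reply as context before speaking it, so the model has the
    # question by the time the user answers.
    ws.send(json.dumps({"type": "UpdateConfiguration", "agent_context": reply}))
    speak(reply)
    return reply


class FakeSocket:
    """Stand-in for a real WebSocket so the sketch runs offline."""

    def __init__(self):
        self.sent = []

    def send(self, data):
        self.sent.append(data)


ws = FakeSocket()
spoken = []
reply_text = "Sure, what date would you like to book?"
on_message(ws, json.dumps({"end_of_turn": False, "transcript": "I'd like"}),
           lambda text: reply_text, spoken.append)
on_message(ws, json.dumps({"end_of_turn": True, "transcript": "I'd like to book a table."}),
           lambda text: reply_text, spoken.append)
print(ws.sent)
```

Sending the update just before the TTS call is one reasonable ordering. The text above only requires that the update happens while the reply is being spoken.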

Limits

  • Universal-3 Pro and Universal-3.5 Pro only. agent_context is supported on speech_model: "universal-3-5-pro" and "u3-rt-pro". If you set it at connection time on any other model, the session is rejected; if you send it mid-stream on another model, it’s stripped with a warning.
  • Per-value cap: ~1500 characters. Trim long agent replies down to the substantive question before sending.
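
One way to stay under the per-value cap is to keep the end of the reply, since the question the user will answer usually comes last. The helper below is a hypothetical sketch of that heuristic. trim_agent_reply is not part of any SDK, and 1500 is taken from the approximate cap stated above.

```python
import re

MAX_CONTEXT_CHARS = 1500  # approximate per-value cap from the limits above


def trim_agent_reply(reply, limit=MAX_CONTEXT_CHARS):
    """Keep as many trailing sentences as fit within the character limit."""
    reply = " ".join(reply.split())  # collapse runs of whitespace
    if len(reply) <= limit:
        return reply
    sentences = re.split(r"(?<=[.!?])\s+", reply)
    kept = []
    total = 0
    # Walk backwards from the final sentence, which usually holds the question.
    for sentence in reversed(sentences):
        extra = len(sentence) + (1 if kept else 0)  # +1 for the joining space
        if total + extra > limit:
            break
        kept.append(sentence)
        total += extra
    if not kept:
        # A single over-long sentence: fall back to its tail.
        return reply[-limit:]
    return " ".join(reversed(kept))


long_reply = "Here is some background. " * 200 + "What date would you like to book?"
trimmed = trim_agent_reply(long_reply)
print(len(trimmed), trimmed[-33:])
```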

When conversation context helps most

agent_context has the largest impact on:
  • Voice agents: short user responses to agent questions ("yes", "no", "that's all", dates, times, single names).
  • Spelled-out entities: emails, account IDs, addresses, and similar inputs read aloud after the agent has just asked for them. Setting agent_context to the agent’s prompt (e.g. "What's your email address?") primes the model for what’s coming.
  • Disambiguation: words that sound similar but only one fits the conversation ("fleas" vs "please", "to" vs "two" vs "too").
  • Entity recall: names, products, or terms that were established earlier in the conversation.
It has less impact on long, self-contained turns where the audio already provides enough context on its own.

Interactions with other parameters

  • prompt. Conversation context layers on top of the model’s built-in transcription instruction and any contextual prompt you provide. The two stack cleanly.
  • keyterms_prompt. Use keyterms_prompt alongside agent_context as needed; the two don’t conflict.
  • Multilingual sessions. Carrying prior turns biases the model toward the languages already seen in the conversation. For sessions that mix three or more languages, this can occasionally push the model toward translating rather than transcribing. If you see drift, set a single transcription language in your prompt (see Specifying the transcription language).