Conversation Context

Improve voice agent outcomes by giving Universal-3.5 Pro the full conversational context. Pass your voice agent’s spoken replies into the session via the agent_context parameter, and the model knows both sides of the dialog when it transcribes the next user turn.

With the agent’s side of the conversation in context, Universal-3.5 Pro can anticipate the kind of answer to expect, sharpen entity recognition, and disambiguate similar-sounding words. For example, after your agent asks "What's your email address?", the model might otherwise transcribe the reply as "user at assemblyai dot com". With that question passed in via agent_context, the model knows an email is coming and produces "user@assemblyai.com".

The user side comes along for free: Universal-3.5 Pro Streaming automatically carries prior STT-finalized turns forward as context, so you don’t need to configure anything for the user half of the conversation. agent_context fills in the agent half.

Set agent_context at connection time and after each agent reply. Pass agent_context as a connection-time query parameter to seed the model with your agent’s opening greeting, then send UpdateConfiguration mid-stream after each subsequent agent reply. See Passing your agent’s reply as context for the full pattern.

How it works

During a streaming session, Universal-3.5 Pro Streaming keeps a short memory of the conversation. Two sources feed that memory: agent_context values you push in (the agent half) and prior finalized user turns (the user half, carried forward automatically). The model uses both when transcribing the next user turn. This means:

Context is per-session. Closing the WebSocket clears it, and a new session starts fresh.
Only agent_context values and finalized user turns (end_of_turn: true) are carried forward; partial transcripts are not.

Defaults

Behavior	Default
Auto-carry of prior user turns	Enabled
Number of prior entries carried	3
Maximum context size	1750 characters

Older entries are dropped first as new ones come in, so the most recent conversation is always preserved.
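You can model the defaults above on the client side. This is handy for logging, or for reasoning about what the model is likely to see.

The sketch below is an illustration under stated assumptions, not AssemblyAI's server implementation:

* The class name ConversationContext is hypothetical.
* The limits are the documented defaults (3 entries, 1750 characters).
* The incoming message shape is an assumption about the streaming response format. The sketch expects a "Turn" type with end_of_turn and transcript fields.

```python
from collections import deque
from typing import Deque, Tuple

MAX_ENTRIES = 3   # documented default: number of prior entries carried
MAX_CHARS = 1750  # documented default: maximum context size


class ConversationContext:
    """Hypothetical client-side mirror of one session's rolling context."""

    def __init__(self, max_entries: int = MAX_ENTRIES, max_chars: int = MAX_CHARS) -> None:
        self.max_entries = max_entries
        self.max_chars = max_chars
        self.entries: Deque[Tuple[str, str]] = deque()

    def add_agent_reply(self, text: str) -> None:
        """Record a value you sent as agent_context."""
        self._push("agent", text)

    def add_user_message(self, message: dict) -> None:
        """Record a user turn only if it is finalized; partials are ignored.

        Assumed message shape: {"type": "Turn", "end_of_turn": bool, "transcript": str}.
        """
        if message.get("type") == "Turn" and message.get("end_of_turn"):
            self._push("user", message.get("transcript", ""))

    def total_chars(self) -> int:
        return sum(len(text) for _, text in self.entries)

    def _push(self, speaker: str, text: str) -> None:
        self.entries.append((speaker, text))
        # Drop the oldest entries first until both limits hold.
        # In this sketch, an entry that alone exceeds max_chars is dropped too.
        while self.entries and (
            len(self.entries) > self.max_entries or self.total_chars() > self.max_chars
        ):
            self.entries.popleft()


# Context is per-session, so create a fresh instance for each WebSocket connection.
```

Because older entries are dropped first, the most recent agent question and user answer survive even when a long reply pushes the total past the character budget.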

Passing your agent’s reply as context

Pass your voice agent’s spoken reply (what your TTS just said) via the agent_context parameter. There are two ways to set it:

At connection time: pass agent_context as a query parameter on the WebSocket URL. Use this to seed the model with your agent’s opening greeting before the user has said anything.
Mid-stream: send an UpdateConfiguration message with the agent_context field after each subsequent agent reply.

Both forms let the model know the question the user is about to answer, which is especially important for short replies ("yes", "7pm", "that's all"). The user side is handled for you: prior STT-finalized turns are automatically carried forward as context, so you only need to manage the agent half explicitly.

Setting an opening greeting at connection time

When you open the WebSocket, pass agent_context alongside your other connection parameters. The first user turn will be transcribed with the greeting already in the model’s context.

Python

# Connection-time query parameters; agent_context seeds the opening greeting.
params = {
    "agent_context": "Welcome to the Krusty Krab, home of the Krabby Patty, may I take your order?",
    "sample_rate": 16000,
    "speech_model": "universal-3-5-pro",
}

Python SDK

from assemblyai.streaming.v3 import StreamingParameters

client.connect(
    StreamingParameters(
        sample_rate=16000,
        speech_model="universal-3-5-pro",
        agent_context="Welcome to the Krusty Krab, home of the Krabby Patty, may I take your order?",
    )
)

JavaScript

// Connection-time query parameters; agent_context seeds the opening greeting.
const params = {
  sample_rate: 16000,
  speech_model: "universal-3-5-pro",
  agent_context:
    "Welcome to the Krusty Krab, home of the Krabby Patty, may I take your order?",
};

JavaScript SDK

const transcriber = client.streaming.transcriber({
  sampleRate: 16_000,
  speechModel: "universal-3-5-pro",
  agentContext:
    "Welcome to the Krusty Krab, home of the Krabby Patty, may I take your order?",
});

await transcriber.connect();
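When you connect over a raw WebSocket, you have to URL-encode the params dictionary onto the connection URL. A typical greeting contains spaces, commas, and a question mark, and all of these must be escaped.

Below is a minimal Python sketch using the standard library's urllib.parse, under these assumptions:

* The base URL is a hypothetical placeholder; substitute your real streaming endpoint.
* Authentication is omitted.
* The length check mirrors the 1750-character cap described under Limits.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical placeholder: substitute your real streaming WebSocket endpoint.
PLACEHOLDER_URL = "wss://streaming.example.invalid/ws"
MAX_AGENT_CONTEXT_CHARS = 1750  # per-value cap from the Limits section


def build_connection_url(base_url: str, greeting: str, sample_rate: int = 16000) -> str:
    """Append the connection-time parameters, URL-encoded, to base_url."""
    if len(greeting) > MAX_AGENT_CONTEXT_CHARS:
        raise ValueError("agent_context exceeds the 1750-character cap")
    params = {
        "sample_rate": sample_rate,
        "speech_model": "universal-3-5-pro",
        "agent_context": greeting,
    }
    # urlencode escapes spaces, commas, and "?" so the greeting survives intact.
    return f"{base_url}?{urlencode(params)}"


def decoded_agent_context(url: str) -> str:
    """Decode agent_context back out of a URL, e.g. to log what was actually sent."""
    return parse_qs(urlsplit(url).query)["agent_context"][0]


url = build_connection_url(
    PLACEHOLDER_URL,
    "Welcome to the Krusty Krab, home of the Krabby Patty, may I take your order?",
)
print(decoded_agent_context(url))
```

urlencode escapes the "?" in the greeting as %3F, so it cannot be mistaken for the start of the query string.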

Updating agent context mid-stream

A typical voice agent loop looks like this:

1. User speaks → Universal-3.5 Pro Streaming emits a final turn.
2. Your agent runs an LLM step and generates a reply.
3. Your TTS speaks the reply to the user.
4. User responds → next turn.

During step 3, send the agent’s reply text to the streaming session so the model knows what question the user will be answering next turn.

Python

import json

# Send after each agent reply so the next user turn is transcribed with it in context.
ws.send(json.dumps({
    "type": "UpdateConfiguration",
    "agent_context": "Sure, what date would you like to book?",
}))

Python SDK

client.update_configuration(
    agent_context="Sure, what date would you like to book?",
)

JavaScript

// Send after each agent reply so the next user turn is transcribed with it in context.
ws.send(JSON.stringify({
  type: "UpdateConfiguration",
  agent_context: "Sure, what date would you like to book?",
}));

JavaScript SDK

transcriber.updateConfiguration({
  agent_context: "Sure, what date would you like to book?",
});
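You can sketch the four-step loop as a single message handler. This is a minimal sketch, not a complete agent. It assumes the following:

* generate_reply and speak are hypothetical stand-ins for your LLM and TTS steps.
* ws is any object with a send(str) method, such as a WebSocket client connection.
* Finalized user turns arrive as "Turn" messages with end_of_turn and transcript fields. This shape is an assumption about the streaming response format.

```python
import json
from typing import Callable, Protocol


class SendsText(Protocol):
    """Anything with a send(str) method, e.g. a WebSocket client connection."""

    def send(self, data: str) -> None: ...


def handle_message(
    ws: SendsText,
    raw: str,
    generate_reply: Callable[[str], str],  # hypothetical: your LLM step
    speak: Callable[[str], None],          # hypothetical: your TTS step
) -> None:
    message = json.loads(raw)
    # Step 1: act only on finalized user turns (assumed Turn message shape).
    if message.get("type") != "Turn" or not message.get("end_of_turn"):
        return
    # Step 2: run the LLM on the finalized transcript.
    reply = generate_reply(message["transcript"])
    # Step 3: push the reply into the session as agent context, then speak it.
    ws.send(json.dumps({"type": "UpdateConfiguration", "agent_context": reply}))
    speak(reply)
    # Step 4 happens when the next finalized Turn arrives and this handler runs again.
```

The sketch sends UpdateConfiguration just before handing the reply to TTS. That way the context is set as early as possible in step 3.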

Limits

Universal-3.5 Pro only. agent_context is supported only with speech_model: "universal-3-5-pro". If you set it at connection time on any other model, the session is rejected; if you send it mid-stream on another model, it’s stripped with a warning.
Per-value cap: 1750 characters. Trim long agent replies down to the substantive question before sending.
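One way to follow the trimming advice is a small helper that does two things:

* It keeps the trailing sentences of a reply, where the question usually sits, within the 1750-character cap.
* It skips agent_context entirely on models that don't support it.

The function names and the keep-the-last-sentences heuristic are assumptions for illustration, not part of the API. The regex sentence split is deliberately simple and will mis-split abbreviations such as "e.g.".

```python
import re
from typing import List, Optional

MAX_AGENT_CONTEXT_CHARS = 1750  # documented per-value cap
SUPPORTED_MODEL = "universal-3-5-pro"


def trim_agent_reply(reply: str, limit: int = MAX_AGENT_CONTEXT_CHARS) -> str:
    """Heuristic: keep the trailing sentences of a reply that fit within `limit`."""
    reply = " ".join(reply.split())  # collapse runs of whitespace
    if len(reply) <= limit:
        return reply
    # Naive split after sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", reply)
    kept: List[str] = []
    total = 0
    for sentence in reversed(sentences):
        added = len(sentence) + (1 if kept else 0)  # +1 for the joining space
        if total + added > limit:
            break
        kept.insert(0, sentence)
        total += added
    if not kept:
        # The final sentence alone exceeds the cap: keep its last `limit` characters.
        return reply[-limit:]
    return " ".join(kept)


def agent_context_message(reply: str, speech_model: str) -> Optional[dict]:
    """Build an UpdateConfiguration payload, or None if the model lacks agent_context support."""
    if speech_model != SUPPORTED_MODEL:
        return None  # sending it would only get the field stripped with a warning
    return {"type": "UpdateConfiguration", "agent_context": trim_agent_reply(reply)}
```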

When conversation context helps most

agent_context has the largest impact on:

Voice agents: short user responses to agent questions ("yes", "no", "that's all", dates, times, single names).
Spelled-out entities: emails, account IDs, addresses, and similar inputs read aloud after the agent has just asked for them. Setting agent_context to the agent’s prompt (e.g. "What's your email address?") primes the model for what’s coming.
Disambiguation: words that sound similar but only one fits the conversation ("fleas" vs "please", "to" vs "two" vs "too").
Entity recall: names, products, or terms that were established earlier in the conversation.

It has less impact on long, self-contained turns where the audio already provides enough context on its own.

Interactions with other parameters

prompt. Conversation context layers on top of the model’s built-in transcription instruction and any contextual prompt you provide. The two stack cleanly.
keyterms_prompt. Use keyterms_prompt alongside agent_context as needed; the two don’t conflict.
Multilingual sessions. Carrying prior turns biases the model toward the languages already seen in the conversation. For sessions that mix three or more languages, this can occasionally push the model toward translating rather than transcribing. If you see drift, set a single transcription language in your prompt (see Specifying the transcription language).
