A voice agent’s audio configuration covers the input encoding (microphone), the output encoding (agent speech), and the playback volume. These fields live on the agent under input.format, output.format, and output.volume. Set them when you create or update the agent, or inline over the WebSocket via session.update. This page covers how to configure the encoding and volume. To stream and play the audio bytes themselves, see Stream audio.

Encoding

The encoding determines the sample rate and bit depth. Input and output encodings are independent and can differ. Both default to audio/pcm (24 kHz) if omitted.
| Encoding | Sample rate | Best for |
| --- | --- | --- |
| audio/pcm | 24,000 Hz | Default. Highest quality, ideal for browser and app use. |
| audio/pcmu | 8,000 Hz | Telephony (G.711 μ-law). |
| audio/pcma | 8,000 Hz | Telephony (G.711 A-law). |
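To give a rough sense of what each encoding costs on the wire, the sketch below computes the raw byte rate for mono audio. G.711 μ-law and A-law use 8 bits per sample; the 16-bit depth for audio/pcm is an assumption, since the table above lists only the sample rate. The names ENCODINGS and bytes_per_second are illustrative and not part of the API.

```python
# Sample rates come from the encoding table; bit depths are partly assumed.
ENCODINGS = {
    "audio/pcm": {"sample_rate": 24_000, "bits_per_sample": 16},  # 16-bit is an assumption
    "audio/pcmu": {"sample_rate": 8_000, "bits_per_sample": 8},   # G.711 mu-law
    "audio/pcma": {"sample_rate": 8_000, "bits_per_sample": 8},   # G.711 A-law
}


def bytes_per_second(encoding: str, channels: int = 1) -> int:
    """Return the raw audio byte rate for one encoding (mono by default)."""
    spec = ENCODINGS[encoding]
    return spec["sample_rate"] * spec["bits_per_sample"] // 8 * channels


for name in ENCODINGS:
    print(f"{name}: {bytes_per_second(name)} bytes/s")
```

Under these assumptions, the 8 kHz telephony encodings need about one sixth of the bandwidth of the default encoding.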
For telephony, use audio/pcmu or audio/pcma (8 kHz) to match the phone network and avoid resampling. See Connect to Twilio for a full phone integration.

Set format.encoding under input and output. You can also pass an explicit sample_rate inside format:
curl -X POST https://agents.assemblyai.com/v1/agents \
  -H "Authorization: $ASSEMBLYAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Support Assistant",
    "system_prompt": "You are a friendly support agent. Keep replies under two sentences.",
    "voice": { "voice_id": "ivy" },
    "input":  { "format": { "encoding": "audio/pcmu", "sample_rate": 8000 } },
    "output": { "format": { "encoding": "audio/pcmu", "sample_rate": 8000 } }
  }'
| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| input.format.encoding | string | No | audio/pcm, audio/pcmu, or audio/pcma. Default audio/pcm. |
| output.format.encoding | string | No | Same values as input. Default audio/pcm. |
| format.sample_rate | integer | No | Sample rate in Hz. Determined by the encoding if omitted. |
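If you build request bodies in code, a small helper can catch a mistyped encoding before the request is sent. The following is a minimal sketch that uses only the field names and values documented above. The helper names audio_format and audio_fields are hypothetical, and the validation runs on the client only; it does not reflect how the service itself checks input.

```python
import json
from typing import Optional

# Encodings listed in the table above.
SUPPORTED_ENCODINGS = ("audio/pcm", "audio/pcmu", "audio/pcma")


def audio_format(encoding: str = "audio/pcm", sample_rate: Optional[int] = None) -> dict:
    """Build a `format` object, rejecting encodings the docs do not list."""
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    fmt = {"encoding": encoding}
    if sample_rate is not None:
        # When omitted, the sample rate is determined by the encoding.
        fmt["sample_rate"] = sample_rate
    return fmt


def audio_fields(input_encoding: str = "audio/pcm", output_encoding: str = "audio/pcm") -> dict:
    """Build the `input` and `output` fields to merge into an agent body."""
    return {
        "input": {"format": audio_format(input_encoding)},
        "output": {"format": audio_format(output_encoding)},
    }


# Input and output are independent, so they may differ.
print(json.dumps(audio_fields("audio/pcmu", "audio/pcm"), indent=2))
```

The returned dictionary can be merged with name, system_prompt, and voice to form the body of a create or update request.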

Volume

Adjust the playback volume of the agent’s speech via output.volume. Accepts a number from 0 (silent) to 100 (loudest). If omitted, the voice plays at its native level.
curl -X PUT https://agents.assemblyai.com/v1/agents/$AGENT_ID \
  -H "Authorization: $ASSEMBLYAI_API_KEY" -H "Content-Type: application/json" \
  -d '{ "output": { "volume": 60 } }'
| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| output.volume | number or null | No | 0 (silent) to 100 (loudest). null plays at native level. |
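The range rule can be checked on the client before sending an update. This is a sketch under the documented constraints only (a number from 0 to 100, or null for the native level). The function name volume_update is hypothetical.

```python
import json
from typing import Optional, Union


def volume_update(volume: Optional[Union[int, float]]) -> dict:
    """Build an update body that sets output.volume, or clears it with None."""
    if volume is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(volume, bool) or not isinstance(volume, (int, float)):
            raise TypeError("volume must be a number or None")
        if not 0 <= volume <= 100:
            raise ValueError("volume must be between 0 and 100")
    return {"output": {"volume": volume}}


print(json.dumps(volume_update(60)))    # body for a quieter voice
print(json.dumps(volume_update(None)))  # None serializes to JSON null
```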
When configured inline via session.update, output.voice and output.format are immutable after session.ready and must be set on your first update. output.volume is the exception: it can be changed mid-session, and the new value applies to subsequent reply.audio chunks.
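A client can enforce this rule locally so that a late change to the voice or format fails fast instead of being sent. The sketch below only inspects the keys of an output object and a flag that your code would set on receiving session.ready. It makes no assumption about the surrounding session.update message envelope, and the name check_output_update is hypothetical.

```python
# Keys under `output` that are fixed once session.ready has been received.
IMMUTABLE_AFTER_READY = ("voice", "format")


def check_output_update(output: dict, session_ready: bool) -> dict:
    """Return `output` unchanged if it is allowed at this point in the session."""
    if session_ready:
        blocked = [key for key in IMMUTABLE_AFTER_READY if key in output]
        if blocked:
            names = ", ".join("output." + key for key in blocked)
            raise ValueError("cannot change after session.ready: " + names)
    return output


# First update, before session.ready: format and volume are both accepted.
first = check_output_update(
    {"format": {"encoding": "audio/pcmu"}, "volume": 60}, session_ready=False
)

# Mid-session: only volume may change.
later = check_output_update({"volume": 30}, session_ready=True)
print(first, later)
```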