input.format and output.format when you create or update it, or inline via session.update.
Most agents can leave this alone: the defaults give the highest quality. Change it mainly for telephony. For how to stream and play the audio bytes, see Stream audio.
Input and output encodings are independent and can differ. Both default to audio/pcm (24 kHz) if omitted.
| Encoding | Sample rate | Best for |
|---|---|---|
| audio/pcm | 24,000 Hz | Default. Highest quality, ideal for browser and app use. |
| audio/pcmu | 8,000 Hz | Telephony (G.711 μ-law). |
| audio/pcma | 8,000 Hz | Telephony (G.711 A-law). |
For telephony, use audio/pcmu or audio/pcma (8 kHz) to match the phone network and avoid resampling. See Connect to Twilio for a full phone integration.
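To make the trade-off between the encodings concrete, the sketch below estimates how many bytes a given duration of audio occupies in each one, which can help when sizing buffers. G.711 μ-law and A-law use 8 bits per sample by definition. The 16-bit mono layout for audio/pcm is an assumption (the table above gives only the sample rate), and the helper name chunk_bytes is hypothetical.

```python
# Sample rate and bytes per sample for each encoding in the table above.
# G.711 (mu-law and A-law) is 8 bits per sample by definition.
# ASSUMPTION: audio/pcm is 16-bit mono; only its sample rate is documented.
ENCODINGS = {
    "audio/pcm": {"sample_rate": 24_000, "bytes_per_sample": 2},
    "audio/pcmu": {"sample_rate": 8_000, "bytes_per_sample": 1},
    "audio/pcma": {"sample_rate": 8_000, "bytes_per_sample": 1},
}


def chunk_bytes(encoding: str, duration_ms: int) -> int:
    """Return the size in bytes of `duration_ms` of mono audio (hypothetical helper)."""
    spec = ENCODINGS[encoding]
    samples = spec["sample_rate"] * duration_ms // 1000
    return samples * spec["bytes_per_sample"]


if __name__ == "__main__":
    for name in ENCODINGS:
        print(name, chunk_bytes(name, 20), "bytes per 20 ms")
```

Under these assumptions, a 20 ms chunk of audio/pcm is six times larger than the same duration in either G.711 encoding.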
Set format.encoding under input and output. You can also pass an explicit sample_rate inside format:
| Field | Type | Required | Notes |
|---|---|---|---|
| input.format.encoding | string | No | audio/pcm, audio/pcmu, or audio/pcma. Default audio/pcm. |
| output.format.encoding | string | No | Same values as input. Default audio/pcm. |
| format.sample_rate | integer | No | Sample rate in Hz. Determined by the encoding if omitted. |
Set the volume separately; see Volume. When configured inline via session.update, output.format is immutable after session.ready and must be set on your first update.
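A client can catch this constraint locally instead of waiting for a server-side rejection. The sketch below is one possible guard, not part of any SDK: the class OutputFormatGuard and its methods are hypothetical, and it assumes the client sees a session.ready event type and passes the session portion of each update as a plain dict.

```python
class OutputFormatGuard:
    """Client-side check that output.format is not changed after session.ready.

    Hypothetical helper: the event name comes from the text above,
    everything else is an illustrative assumption.
    """

    def __init__(self) -> None:
        self.ready = False

    def on_event(self, event_type: str) -> None:
        # Record that the session has become ready.
        if event_type == "session.ready":
            self.ready = True

    def check_update(self, session_fields: dict) -> None:
        # Reject updates that touch output.format once the session is ready.
        touches_output_format = "format" in session_fields.get("output", {})
        if self.ready and touches_output_format:
            raise RuntimeError("output.format is immutable after session.ready")


if __name__ == "__main__":
    guard = OutputFormatGuard()
    # The first update, sent before session.ready, may set output.format.
    guard.check_update({"output": {"format": {"encoding": "audio/pcmu"}}})
    guard.on_event("session.ready")
    # Later updates that leave output.format alone still pass.
    guard.check_update({"input": {"format": {"encoding": "audio/pcmu"}}})
```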