Troubleshooting
Agent interrupts itself (echo / feedback loop)
Symptom: Every agent response ends with (interrupted) after about one second. The transcript shows the agent’s own words echoed back as user speech.
Cause: The agent’s TTS audio plays through speakers and loops back into the microphone. Terminal apps (Python with sounddevice) don’t get OS-level acoustic echo cancellation (AEC).
Fixes:
- Use headphones — the simplest fix.
- Switch to the browser: browsers provide AEC automatically through getUserMedia({ audio: { echoCancellation: true } }). See Browser integration.
Wrong sample rate
Symptom: Audio sounds garbled, pitched up/down, or plays at the wrong speed.
Cause: The Voice Agent API expects PCM16 mono at exactly 24,000 Hz. If your mic captures at 48 kHz, or your playback device runs at a different rate, the samples are interpreted at the wrong rate, which shifts both pitch and speed.
Fixes:
- Python: Set samplerate=24000 on both sd.InputStream and sd.OutputStream.
- Browser: Create the AudioContext with new AudioContext({ sampleRate: 24000 }); this avoids manual resampling entirely. See the Browser quickstart.
- If you can’t control the device sample rate, resample to/from 24 kHz before encoding/decoding.
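When the device rate is fixed, the resampling fallback could look roughly like the sketch below. It uses only the standard library and plain linear interpolation, which is a simplification: a production pipeline would normally low-pass filter before downsampling to avoid aliasing. The name resample_pcm16 is illustrative and not part of any SDK.

```python
import sys
from array import array

TARGET_RATE = 24_000  # the rate this section says the API expects


def resample_pcm16(pcm_bytes: bytes, src_rate: int, dst_rate: int = TARGET_RATE) -> bytes:
    """Resample mono little-endian PCM16 using linear interpolation (sketch)."""
    samples = array("h")
    samples.frombytes(pcm_bytes)
    if src_rate == dst_rate or not samples:
        return pcm_bytes
    if sys.byteorder == "big":
        samples.byteswap()  # array("h") is native-endian; the input is little-endian

    out_len = len(samples) * dst_rate // src_rate
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = array("h")
    for i in range(out_len):
        pos = i * step
        left = int(pos)
        right = min(left + 1, last)
        frac = pos - left
        # Blend the two neighboring source samples.
        out.append(int(round(samples[left] * (1 - frac) + samples[right] * frac)))

    if sys.byteorder == "big":
        out.byteswap()  # back to little-endian for the wire
    return out.tobytes()
```

For a 48 kHz microphone, each captured chunk would pass through resample_pcm16(chunk, 48_000) before base64 encoding; for playback on a 48 kHz device, the call would be resample_pcm16(chunk, 24_000, 48_000).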
Microphone permission denied
Symptom: NotAllowedError in the browser or PortAudioError in Python.
Fixes:
- Browser: The page must be served over HTTPS (or localhost). Check that the user granted microphone permission in the browser prompt.
- macOS: Go to System Settings → Privacy & Security → Microphone and enable access for your terminal app or browser.
- Linux: Check that your user has access to the audio device (ls -la /dev/snd/). You may need to add your user to the audio group.
Firewall blocking WebSocket connection
Symptom: WebSocket connection hangs or fails with a timeout.
Cause: Corporate firewalls or proxies may block outbound WSS (WebSocket Secure) connections on port 443.
Fixes:
- Verify that wss://agents.assemblyai.com is reachable from your network.
- If behind a corporate proxy, configure your WebSocket client to use the proxy.
- Test from a different network to rule out firewall issues.
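One quick way to run the reachability check is a plain TCP probe, sketched below with the standard library. It only confirms that a TCP connection to port 443 opens; it does not perform the TLS handshake or the WebSocket upgrade, so a proxy that accepts TCP but blocks the upgrade would still pass. The helper name tcp_reachable is illustrative.

```python
import socket


def tcp_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling tcp_reachable("agents.assemblyai.com") from the affected network and again from a different one helps separate a local firewall problem from a client bug.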
Malformed base64 in input.audio
Symptom: session.error with code invalid_audio.
Cause: The audio field in input.audio failed base64 decode or PCM conversion. Common mistakes include sending raw binary instead of base64, or encoding audio in the wrong format (e.g., WAV headers included, float32 instead of int16).
Fixes:
- Verify you’re encoding raw PCM16 bytes, not a WAV or other container format.
- Check that the data is base64-encoded: base64.b64encode(pcm_bytes).decode() in Python, or btoa(String.fromCharCode(...new Uint8Array(buffer))) in JavaScript.
- Confirm the audio is 16-bit signed integer (little-endian), mono, at 24 kHz.
If the message itself is malformed (bad JSON, missing type, or missing audio field), you’ll get invalid_format instead. See the error codes reference for the full list.
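The checks above can be combined into a small encoding helper, sketched here. The message shape (a JSON object with type and audio fields) is inferred from this section rather than taken from the full API reference, and the function names are illustrative.

```python
import base64
import json
import struct


def floats_to_pcm16(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit signed PCM."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def build_input_audio(pcm_bytes: bytes) -> str:
    """Build an input.audio message (assumed shape) from raw PCM16 bytes."""
    if pcm_bytes[:4] == b"RIFF":
        raise ValueError("looks like a WAV file; send raw PCM16 without the header")
    if len(pcm_bytes) % 2:
        raise ValueError("PCM16 data must contain an even number of bytes")
    audio_b64 = base64.b64encode(pcm_bytes).decode("ascii")
    return json.dumps({"type": "input.audio", "audio": audio_b64})
```

If the capture callback yields float32 samples, converting with floats_to_pcm16 before build_input_audio avoids the float32-instead-of-int16 mistake described above.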
Token expired or invalid credentials
Symptom: WebSocket closes immediately with close code 1008 and an UNAUTHORIZED error, or with code 1006 in browsers (no body visible). No session.ready event is received.
Cause: The token or API key is missing, expired, or invalid. The server sends UNAUTHORIZED (close code 1008) before the session is established.
Fixes:
- Fetch a fresh token immediately before each connection attempt; don’t pre-fetch and store tokens.
- Keep expires_in_seconds at 60–300 seconds for a good balance between security and usability.
- If using session.resume, remember that each new WebSocket connection needs a new token.
See Token expiry and failure modes for more detail.
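The "fresh token per attempt" rule can be sketched as a small retry wrapper. Both callables are hypothetical placeholders: fetch_token stands in for whatever your backend uses to mint a token, and open_connection for your WebSocket client's connect call, which is assumed here to raise ConnectionError on failure.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_fresh_token(
    fetch_token: Callable[[], str],        # hypothetical: returns a newly minted token
    open_connection: Callable[[str], T],   # hypothetical: opens the WebSocket with it
    attempts: int = 3,
    delay_seconds: float = 1.0,
) -> T:
    """Try to connect, requesting a new token before every attempt."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        token = fetch_token()  # never reuse a token from an earlier attempt
        try:
            return open_connection(token)
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay_seconds)
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

Keeping the token fetch inside the loop means a reconnect that happens minutes after the first attempt never presents an expired token.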
Session resume fails
Symptom: session.error with code session_not_found, session_forbidden, or session_expired after sending session.resume.
Causes:
- session_not_found: the session_id is unknown or the 30-second grace window after disconnection has expired.
- session_forbidden: the session_id belongs to a different account.
- session_expired: the session’s TTL elapsed during the grace window.
Fix: Catch these error codes and start a fresh session without session.resume. See the session.resume example.
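A minimal sketch of that fallback follows. It assumes the error arrives as a JSON message with type and code fields, which matches how this section describes it but is not confirmed against the full event schema; the helper names are illustrative.

```python
import json
from typing import Optional

# Error codes this section lists as resume failures.
RESUME_FAILURE_CODES = frozenset(
    {"session_not_found", "session_forbidden", "session_expired"}
)


def resume_failed(raw_message: str) -> bool:
    """Return True if the message is a session.error caused by a failed resume."""
    try:
        event = json.loads(raw_message)
    except json.JSONDecodeError:
        return False
    if not isinstance(event, dict):
        return False
    return (
        event.get("type") == "session.error"
        and event.get("code") in RESUME_FAILURE_CODES
    )


def next_session_id(raw_message: str, session_id: Optional[str]) -> Optional[str]:
    """Return the session_id to resume next time, or None to start fresh."""
    return None if resume_failed(raw_message) else session_id
```

When next_session_id returns None, the client would reconnect with a new token and skip session.resume entirely.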