Troubleshooting

Common issues and fixes when using the LLM Gateway.

What to log for support

Every LLM Gateway response includes a request_id — a unique identifier for that specific request. Log this ID for every call, not just when something goes wrong. When you reach out to support@assemblyai.com, including the request_id lets us find the exact request in our logs in seconds.

At minimum, capture the following for every request:

  • request_id from the response body
  • The model parameter used
  • The API region (US: llm-gateway.assemblyai.com, EU: llm-gateway.eu.assemblyai.com)
  • A timestamp for when the request was sent
  • The HTTP status code and the full response body whenever a non-2xx response is returned

A minimal logging example:

Python

import requests
import time

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers={"authorization": "<YOUR_API_KEY>"},
    json={
        "model": "claude-sonnet-4-6",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "max_tokens": 1000,
    },
)

result = response.json()
log_entry = {
    "timestamp": time.time(),
    "region": "us",
    "model": "claude-sonnet-4-6",
    "status_code": response.status_code,
    "request_id": result.get("request_id"),
    "error": result.get("error"),
}
print(log_entry)

Authentication errors (401 / 403)

Symptom: The API responds with 401 Unauthorized or 403 Forbidden.

{
  "error": {
    "code": 401,
    "message": "Unauthorized - Invalid or missing API key"
  }
}

Causes:

  • API key is missing, malformed, or expired.
  • API key is from a different account or region.
  • The Authorization header is misspelled (e.g. Authorisation) or missing entirely.

Fixes:

  • Confirm your API key on the API Keys page.
  • Pass the key in the Authorization header — not as a query parameter and not prefixed with Bearer.
  • If you’re using EU data residency, make sure the key was generated for the EU region. See Cloud endpoints and data residency.
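The header rules above can be enforced before a request leaves your code. The following is a minimal sketch, not part of any AssemblyAI SDK: `build_auth_headers` is a hypothetical helper that rejects an empty key and strips an accidental `Bearer ` prefix so the raw key ends up in the Authorization header.

```python
def build_auth_headers(api_key: str) -> dict:
    """Return headers carrying the raw API key in Authorization.

    Hypothetical helper for illustration only. It trims whitespace,
    drops a mistakenly added "Bearer " prefix, and refuses empty keys.
    """
    key = (api_key or "").strip()
    if key.lower().startswith("bearer "):
        key = key[len("bearer "):].strip()
    if not key:
        raise ValueError("API key is missing")
    return {"Authorization": key}


# The placeholder key is kept as-is; only the prefix is removed.
print(build_auth_headers("Bearer <YOUR_API_KEY>"))
```

Failing early with a `ValueError` tends to be easier to debug than a 401 from the API, because the problem surfaces where the key is loaded rather than where the request is sent.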

Bad request (400)

Symptom: The API responds with 400 Bad Request.

{
  "error": {
    "code": 400,
    "message": "Invalid request: missing required field 'model'"
  }
}

Causes:

  • A required field is missing (model, plus either messages or prompt).
  • The model value is not a supported model parameter — see Available models.
  • max_tokens is outside the valid range or exceeds the model’s context window.
  • A field is the wrong type (e.g. messages sent as a string instead of an array).

Fixes:

  • Validate your request payload against the Basic chat completions reference.
  • Log the full error message. It names the specific field that failed validation.
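Some of the causes listed above can be caught locally before the request is sent. The sketch below is illustrative only: `find_payload_problems` is a hypothetical helper, and the "positive integer" check on `max_tokens` is an assumption, since the valid range depends on the model and is not checked here. It also does not verify that the model name is supported.

```python
def find_payload_problems(payload: dict) -> list:
    """Return human-readable problems found in a chat completions payload.

    Hypothetical pre-flight check covering only the causes listed in this
    section. An empty list means no problem was detected locally.
    """
    problems = []

    model = payload.get("model")
    if not isinstance(model, str) or not model:
        problems.append("'model' is required and must be a non-empty string")

    if "messages" not in payload and "prompt" not in payload:
        problems.append("either 'messages' or 'prompt' is required")

    if "messages" in payload and not isinstance(payload["messages"], list):
        found = type(payload["messages"]).__name__
        problems.append(f"'messages' must be an array, got {found}")

    max_tokens = payload.get("max_tokens")
    if max_tokens is not None:
        # Assumption: max_tokens must be a positive integer. The upper
        # bound is model-specific and is left to the API to enforce.
        is_int = isinstance(max_tokens, int) and not isinstance(max_tokens, bool)
        if not is_int or max_tokens <= 0:
            problems.append("'max_tokens' must be a positive integer")

    return problems


# 'messages' is sent as a string here, one of the causes listed above.
for problem in find_payload_problems({"model": "claude-sonnet-4-6", "messages": "hi"}):
    print(problem)
```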

Rate limit exceeded (429)

Symptom: The API responds with 429 Too Many Requests.

Cause: You exceeded the per-model rate limit within a 60-second window. Each model has its own limit.

Fixes:

  • Read the rate limit headers on every response (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) to back off gracefully. See Rate limits for the full header reference.
  • Implement exponential backoff with jitter when you receive a 429.
  • Consider specifying fallback models so traffic spills over to a different model when the primary is rate-limited.
  • If you need a higher rate limit, contact support.
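The backoff advice above can be sketched as two small functions. This is one common variant ("full jitter"), not necessarily what AssemblyAI recommends. The function names, the 1-second base, and the 60-second cap (chosen to match the rate limit window) are assumptions. The format of X-RateLimit-Reset is not specified here, so the sketch only reads X-RateLimit-Remaining.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng=random.random) -> float:
    """Full-jitter exponential backoff.

    Returns a random delay between 0 and min(cap, base * 2**attempt)
    seconds. attempt starts at 0 for the first retry.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def should_pause(headers) -> bool:
    """True when X-RateLimit-Remaining reports no requests left.

    headers is any mapping. With the requests library, response.headers
    is case-insensitive; a plain dict must use this exact casing.
    """
    remaining = headers.get("X-RateLimit-Remaining")
    try:
        return remaining is not None and int(remaining) <= 0
    except (TypeError, ValueError):
        return False  # header present but not a number: do not guess


for attempt in range(4):
    print(f"retry {attempt}: sleeping {backoff_delay(attempt):.2f}s")
```

Randomizing the delay keeps many clients that were throttled at the same moment from retrying in lockstep.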

Model not found (404)

Symptom: The API responds with 404 Not Found and an error mentioning the model.

Causes:

  • The model value is misspelled or has been deprecated.
  • The model isn’t available in the region you’re calling. For example, OpenAI models are only available in the US region — see Cloud endpoints and data residency.

Fixes:

  • Double-check the exact model parameter against Available models.
  • If you need EU data residency, switch to an EU-supported model (most Anthropic Claude and Google Gemini models).

Server errors (5xx)

Symptom: The API responds with 500, 502, 503, or 504.

Causes:

  • Transient issues on AssemblyAI’s side or with the upstream model provider.
  • The upstream provider returned a timeout or unavailable response.

Fixes:

  • Retry with exponential backoff and jitter. Most 5xx errors are transient.
  • Check the AssemblyAI Status page for ongoing incidents.
  • If the error persists, contact support with the request_id, the model used, the timestamp, and the full error response body.
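A retry loop for the fixes above might look like the following sketch. `post_with_retry` and its parameters are hypothetical, and the request itself is injected as a `send` callable so the loop can be exercised without network access. It retries only the 5xx codes listed in this section and collects the request_id of each failed attempt for a support ticket.

```python
import random
import time

RETRYABLE_STATUSES = {500, 502, 503, 504}


def post_with_retry(send, max_attempts=4, base=0.5, cap=8.0, sleep=time.sleep):
    """Call send() until it returns a non-5xx status or attempts run out.

    send: zero-argument callable returning (status_code, body_dict).
    Returns (status_code, body, failed_request_ids).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    failed_request_ids = []
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body, failed_request_ids
        failed_request_ids.append((body or {}).get("request_id"))
        if attempt < max_attempts - 1:
            # Exponential backoff with jitter before the next attempt.
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    return status, body, failed_request_ids


# Stubbed responses stand in for real API calls: one 503, then success.
responses = iter([
    (503, {"request_id": "req-1", "error": {"code": 503}}),
    (200, {"request_id": "req-2"}),
])
status, body, failed_ids = post_with_retry(
    lambda: next(responses), sleep=lambda seconds: None
)
print(status, failed_ids)  # prints: 200 ['req-1']
```

In real use, `send` would wrap `requests.post(...)` and return `response.status_code` together with `response.json()`.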

Streamed responses don’t appear

Symptom: You set stream: true but receive a single non-streamed response — or no response at all.

Causes:

  • Streaming is currently supported on OpenAI models only. Other providers ignore the stream flag and return a regular response.
  • The HTTP client isn’t reading the response body as a stream of server-sent events (SSE).

Fixes:

  • Confirm the model is from OpenAI. See Available models.
  • Use a client that reads SSE chunks (e.g. response.iter_lines() in Python requests, or the streaming fetch body reader in JavaScript). See Basic chat completions — Streamed responses.
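Reading SSE chunks can be sketched as a small parser over the lines that `response.iter_lines()` yields. The `[DONE]` sentinel and the chunk shape in the sample data follow the OpenAI streaming convention and are assumptions here; check them against the gateway's actual output. `iter_sse_payloads` is a hypothetical helper name.

```python
import json


def iter_sse_payloads(lines):
    """Yield the parsed JSON payload of each SSE 'data:' line.

    lines: iterable of str or bytes, for example response.iter_lines().
    Stops at a 'data: [DONE]' line (assumed OpenAI-style sentinel).
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        yield json.loads(data)


# Hypothetical sample stream, used in place of a live response.
sample = [
    b'data: {"choices": [{"delta": {"content": "Par"}}]}',
    b"",
    b'data: {"choices": [{"delta": {"content": "is"}}]}',
    b"",
    b"data: [DONE]",
]
text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in iter_sse_payloads(sample)
)
print(text)
```

With the requests library, the request must be made with `stream=True`; otherwise the whole body is downloaded before `iter_lines()` returns anything.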

Unexpected output or quality issues

Symptom: The model returns content you didn’t expect — wrong format, wrong language, hallucinations, or refusals.

Fixes:

  • Capture the full request payload (model, messages, parameters), the full response, and the request_id. Send all three to support@assemblyai.com — quality issues are difficult to diagnose without the exact prompt.
  • For structured output, use Structured outputs with a JSON schema rather than prompting for JSON in free text.
  • For malformed JSON, enable Post-processing to automatically repair responses.
  • Try a different model — quality varies. See the LMArena scores for a comparison.

Contacting support

If you’ve worked through the steps above and still need help, email support@assemblyai.com with:

  • The request_id from the failing response (or several, for intermittent issues)
  • The model parameter used
  • The API region (US or EU)
  • A timestamp for when the request was sent
  • The HTTP status code and full error response body
  • A minimal reproducible example of the request payload (with your API key redacted)
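The checklist above can be assembled into a single report. This is a sketch only: `build_support_report`, `redact_headers`, and the field names in the output are hypothetical, not a format that support requires.

```python
import json


def redact_headers(headers: dict) -> dict:
    """Copy headers, replacing the Authorization value with a placeholder."""
    return {
        name: "<REDACTED>" if name.lower() == "authorization" else value
        for name, value in headers.items()
    }


def build_support_report(request_ids, model, region, sent_at,
                         status_code, response_body, payload, headers) -> str:
    """Return a JSON string with the details listed in this section."""
    report = {
        "request_ids": list(request_ids),
        "model": model,
        "region": region,
        "sent_at": sent_at,
        "status_code": status_code,
        "response_body": response_body,
        "request": {"headers": redact_headers(headers), "payload": payload},
    }
    return json.dumps(report, indent=2)


# All values below are made-up examples.
print(build_support_report(
    request_ids=["example-request-id"],
    model="claude-sonnet-4-6",
    region="us",
    sent_at="2025-01-01T12:00:00Z",
    status_code=500,
    response_body={"error": {"code": 500, "message": "example"}},
    payload={"model": "claude-sonnet-4-6",
             "messages": [{"role": "user", "content": "What is the capital of France?"}]},
    headers={"authorization": "my-secret-key"},
))
```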