LLM Gateway Overview

Supported regions

US & EU

Overview

AssemblyAI’s LLM Gateway is a unified, OpenAI SDK-compatible interface that connects you to multiple LLMs through a single API, with automatic fallbacks. Use it as a standalone service or alongside AssemblyAI’s speech-to-text products. It focuses on text-to-text LLM providers and offers regional compliance, security, and zero data retention (ZDR). Integration with other AssemblyAI products is optional but tight, for a more seamless experience.

Quickstart

Python
JavaScript
Py OpenAI SDK

import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "claude-sonnet-4-6",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "max_tokens": 1000
    }
)

result = response.json()
print(result["choices"][0]["message"]["content"])

const response = await fetch(
  "https://llm-gateway.assemblyai.com/v1/chat/completions",
  {
    method: "POST",
    headers: {
      authorization: "<YOUR_API_KEY>",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-sonnet-4-6",
      messages: [{ role: "user", content: "What is the capital of France?" }],
      max_tokens: 1000,
    }),
  }
);

const result = await response.json();
console.log(result.choices[0].message.content);

from openai import OpenAI

# Point the OpenAI client at the LLM Gateway instead of the default OpenAI API.
client = OpenAI(
    base_url="https://llm-gateway.assemblyai.com/v1",
    api_key="<YOUR_API_KEY>",
)

messages = [{"role": "user", "content": "What is the capital of France?"}]
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=messages,
    max_tokens=1000,
)
print(response.choices[0].message.content)

To use a different model, change the model parameter to any model listed on the Available models page.

Want to compare models side-by-side? Try the Model Comparison Tool, a Lovable application, to test different LLM models and see how they perform.

Non-standard LLM Enhancements

Here are some bonus features that LLM Gateway provides over other routing options:

Automatic retries
- 5xx responses are retried automatically on our side by default
- Configure with fallback_config
Model fallbacks
- Configure alternative models/providers in case a model isn’t working
- Optionally change the prompt and/or parameters to better suit the fallback model
- See docs
Real-time STT integration
- Automatically apply LLM Gateway requests on each real-time STT turn for reduced end-to-end latency
- See docs
Post-processing
- Automatically apply post-processing to the LLM response
- Repair broken JSON in tool calls and structured output with json-repair
- See docs
Transcript text injection
- Call LLM Gateway with a transcript_id and we fetch the transcript text and add it to the LLM message
- Useful if you don’t want to store the transcript yourself
- See docs

Regional compliance

Region	Chat completions URL
US (default)	`https://llm-gateway.assemblyai.com/v1/chat/completions`
EU	`https://llm-gateway.eu.assemblyai.com/v1/chat/completions`

The LLM Gateway is available in both US and EU regions. Use the EU endpoint to ensure your data stays within the European Union. Currently, Anthropic Claude and most Google Gemini models are supported in the EU (except where otherwise noted). OpenAI models are only available in the US region. See Cloud Endpoints and Data Residency for more details.
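As a minimal sketch of how a client might switch between the two regions, the helper below maps a region name to the URLs from the table above. The names GATEWAY_ENDPOINTS and gateway_endpoint are hypothetical and are not part of any AssemblyAI SDK.

```python
# Endpoint URLs copied from the regional table above.
# GATEWAY_ENDPOINTS and gateway_endpoint are hypothetical helper names.
GATEWAY_ENDPOINTS = {
    "us": "https://llm-gateway.assemblyai.com/v1/chat/completions",
    "eu": "https://llm-gateway.eu.assemblyai.com/v1/chat/completions",
}


def gateway_endpoint(region: str = "us") -> str:
    """Return the chat completions URL for the given region ("us" or "eu")."""
    key = region.strip().lower()
    if key not in GATEWAY_ENDPOINTS:
        raise ValueError(
            f"Unknown region {region!r}; expected one of {sorted(GATEWAY_ENDPOINTS)}"
        )
    return GATEWAY_ENDPOINTS[key]
```

Failing loudly on an unknown region, rather than silently falling back to the US endpoint, is a deliberate choice here: a typo should not route EU-bound data to the US.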

The LLM Gateway provides access to 25+ models across major AI providers with support for:

Basic Chat Completions - Simple request/response interactions
Streamed Responses - Stream output as it’s generated (OpenAI models)
Multi-turn Conversations - Maintain context across multiple exchanges
Structured Outputs - Constrain responses to a specific JSON schema
Tool/Function Calling - Enable models to execute custom functions
Agentic Workflows - Multi-step reasoning with automatic tool chaining
Unified Interface - One API for Claude, GPT, Gemini, Qwen, Kimi, and more
Post-processing - Automatically repair malformed JSON responses with built-in JSON repair

Global model routing

If your requests do not require region-locked routing for latency or compliance, you can set model_region to global in the API request body to receive a discount. See model pricing for which models have global discounts.
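A request body that opts into global routing might be built as in the sketch below. The model_region field and its global value come from the text above; placing it at the top level of the body next to model is an assumption, and build_chat_request is a hypothetical helper name.

```python
def build_chat_request(model, prompt, use_global_routing=False, max_tokens=1000):
    """Build a chat completions request body (hypothetical helper).

    Assumption: model_region sits at the top level of the request body.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    if use_global_routing:
        # Opt out of region-locked routing in exchange for the discount.
        body["model_region"] = "global"
    return body
```

The returned dictionary can be passed as the json argument of requests.post, as in the Quickstart example.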

Rate limits

LLM Gateway is rate limited per model, measured as requests within a 60-second window.

Account type	Rate limit (requests/min, per model)
Free	Not available
Paid	30

Need a higher rate limit? Contact our support team.
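To stay under the per-model limit, a client can throttle itself before sending. The sketch below tracks send times per model in a sliding 60-second window, using the paid-tier limit of 30 from the table above as the default. PerModelRateLimiter is a hypothetical class, and whether the server counts a sliding or a fixed window is not stated here, so treat this as a conservative approximation.

```python
import time
from collections import defaultdict, deque


class PerModelRateLimiter:
    """Client-side sliding-window limiter, tracked per model (hypothetical)."""

    def __init__(self, max_requests=30, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent = defaultdict(deque)  # model -> timestamps of recent requests

    def seconds_until_allowed(self, model):
        """Return how long to wait before the next request for this model."""
        now = self.clock()
        sent = self._sent[model]
        # Forget requests that have left the window.
        while sent and now - sent[0] >= self.window_seconds:
            sent.popleft()
        if len(sent) < self.max_requests:
            return 0.0
        # The oldest request in the window determines when a slot frees up.
        return self.window_seconds - (now - sent[0])

    def record(self, model):
        """Record that a request for this model was just sent."""
        self._sent[model].append(self.clock())
```

Before each call, sleep for seconds_until_allowed(model) if it is nonzero, then call record(model) once the request has been sent.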

Logging and troubleshooting

Every LLM Gateway response includes a request_id field — a unique identifier for that request. Persist it (along with the model, the API region, and a timestamp) for every call you make, not just when something goes wrong. If you contact support@assemblyai.com about a specific request (latency spikes, unexpected output, rate-limit errors, content moderation surprises), this ID lets us locate the exact request in our logs immediately. We recommend logging, at minimum:

request_id from the response body
The model parameter used
The API region (US: llm-gateway.assemblyai.com, EU: llm-gateway.eu.assemblyai.com)
A timestamp for when the request was sent
The full error response body when a non-2xx status code is returned
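The fields listed above could be captured with a small helper like the sketch below. It assumes the parsed response body is a dictionary that carries request_id at the top level, as described above; build_log_record and log_gateway_call are hypothetical names, and whether error responses also carry request_id is not confirmed here, so the lookup tolerates its absence.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("llm_gateway")


def build_log_record(model, region_host, status_code, body, sent_at=None):
    """Collect the recommended fields for one LLM Gateway call (hypothetical)."""
    sent_at = sent_at or datetime.now(timezone.utc)
    record = {
        # request_id may be missing, for example if the body is not JSON.
        "request_id": body.get("request_id") if isinstance(body, dict) else None,
        "model": model,
        "region": region_host,
        "sent_at": sent_at.isoformat(),
        "status_code": status_code,
    }
    if not 200 <= status_code < 300:
        # Keep the full error body for non-2xx responses.
        record["error_body"] = body
    return record


def log_gateway_call(model, region_host, status_code, body, sent_at=None):
    """Write the record as one JSON line and return it."""
    record = build_log_record(model, region_host, status_code, body, sent_at)
    logger.info(json.dumps(record, default=str))
    return record
```

Capture sent_at just before issuing the request and pass it in, so the timestamp reflects when the request was sent and not when the response arrived.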

For details on debugging specific status codes (400/401/403/429/5xx) and what information to include when filing a support request, see the Troubleshooting page.

Next steps

Basic Chat Completions - Learn how to send simple messages and receive responses
Multi-turn Conversations - Maintain context across multiple exchanges
Structured Outputs - Constrain model responses to follow a specific JSON schema
Tool Calling - Enable models to execute custom functions
Agentic Workflows - Build multi-step reasoning applications
Post-processing - Automatically repair malformed JSON in model responses
Prompt Caching - Save money on similar calls with prompt caching

The LLM Gateway API is separate from the Speech-to-Text and Speech Understanding APIs. It provides a unified interface to work with large language models across multiple providers.
