LLM Gateway Overview


Overview

AssemblyAI’s LLM Gateway is a unified interface for working with models from multiple LLM providers, including Anthropic Claude, OpenAI GPT, Google Gemini, and more. You can use the LLM Gateway to build sophisticated AI applications through a single API.

| Endpoint | Base URL |
| --- | --- |
| US (default) | https://llm-gateway.assemblyai.com/v1/chat/completions |
| EU | https://llm-gateway.eu.assemblyai.com/v1/chat/completions |

The LLM Gateway is available in both US and EU regions. Use the EU endpoint to ensure your data stays within the European Union. Currently, Anthropic Claude and Google Gemini models are supported in the EU. OpenAI models are only available in the US region. See Cloud Endpoints and Data Residency for more details.
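If you select the region at runtime, a small helper can keep the endpoint choice in one place. This is a sketch; the `gateway_url` function and the `"us"`/`"eu"` region strings are our own convention, not part of the API:

```python
# Pick the chat-completions endpoint by data-residency region.
# The helper name and region strings are our own, not part of the API.
US_URL = "https://llm-gateway.assemblyai.com/v1/chat/completions"
EU_URL = "https://llm-gateway.eu.assemblyai.com/v1/chat/completions"

def gateway_url(region: str = "us") -> str:
    """Return the chat-completions base URL for 'us' or 'eu'."""
    urls = {"us": US_URL, "eu": EU_URL}
    try:
        return urls[region.lower()]
    except KeyError:
        raise ValueError(f"unknown region {region!r}; expected 'us' or 'eu'")
```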

The LLM Gateway provides access to 20+ models across major AI providers with support for:

  • Basic Chat Completions - Simple request/response interactions
  • Streamed Responses - Stream output as it’s generated (OpenAI models)
  • Multi-turn Conversations - Maintain context across multiple exchanges
  • Structured Outputs - Constrain responses to a specific JSON schema
  • Tool/Function Calling - Enable models to execute custom functions
  • Agentic Workflows - Multi-step reasoning with automatic tool chaining
  • Unified Interface - One API for Claude, GPT, Gemini, Qwen, Kimi, and more
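For streamed responses, the client consumes server-sent events as they arrive. The parser below is a minimal sketch that assumes the gateway emits OpenAI-style `data: {...}` chunks ending in a `[DONE]` sentinel; the `parse_sse_line` helper is our own:

```python
import json

def parse_sse_line(line: bytes):
    """Pull the text delta out of one server-sent-events line, assuming
    the gateway emits OpenAI-style streaming chunks ('data: {...}')."""
    if not line.startswith(b"data: "):
        return None          # comments, keep-alives, event fields, etc.
    payload = line[len(b"data: "):].strip()
    if payload == b"[DONE]":
        return None          # end-of-stream sentinel
    chunk = json.loads(payload)
    return chunk["choices"][0].get("delta", {}).get("content")

# With requests, the stream could be consumed like:
#   resp = requests.post(url, headers=headers,
#                        json={..., "stream": True}, stream=True)
#   for line in resp.iter_lines():
#       piece = parse_sse_line(line)
#       if piece:
#           print(piece, end="", flush=True)
```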

Available models

By quality (LMArena Score)

| Model | Provider | Parameter | LMArena Score | Latency per 10,000 tokens |
| --- | --- | --- | --- | --- |
| Claude Opus 4.6 | Anthropic | claude-opus-4-6 | 1504 | 7.4s |
| Gemini 3 Flash Preview | Google | gemini-3-flash-preview | 1473 | 4.2s |
| Claude Opus 4.5 | Anthropic | claude-opus-4-5-20251101 | 1467 | 3.9s |
| Claude Sonnet 4.6 | Anthropic | claude-sonnet-4-6 | 1457 | 7.2s |
| Claude 4.5 Sonnet | Anthropic | claude-sonnet-4-5-20250929 | 1450 | 5.6s |
| Gemini 2.5 Pro | Google | gemini-2.5-pro | 1449 | 4.0s |
| GPT-5.2 | OpenAI | gpt-5.2 | 1437 | 1.6s |
| GPT-5.1 | OpenAI | gpt-5.1 | 1437 | 2.7s |
| Kimi K2.5 | Moonshot AI | kimi-k2.5 | 1433 | 1.2s |
| GPT-5 | OpenAI | gpt-5 | 1426 | 4.3s |
| Claude 4 Opus | Anthropic | claude-opus-4-20250514 | 1413 | 13.6s |
| GPT-4.1 | OpenAI | gpt-4.1 | 1413 | 1.8s |
| Gemini 2.5 Flash | Google | gemini-2.5-flash | 1411 | 2.6s |
| Claude 4.5 Haiku | Anthropic | claude-haiku-4-5-20251001 | 1405 | 4.1s |
| Qwen3 Next 80B A3B | Alibaba Cloud | qwen3-next-80b-a3b | 1401 | 3.1s |
| Claude 4 Sonnet | Anthropic | claude-sonnet-4-20250514 | 1390 | 5.1s |
| GPT-5 mini | OpenAI | gpt-5-mini | 1390 | 3.8s |
| Gemini 2.5 Flash-Lite | Google | gemini-2.5-flash-lite | 1380 | 1.1s |
| gpt-oss-120b | OpenAI | gpt-oss-120b | 1354 | 1.4s |
| Qwen3 32B | Alibaba Cloud | qwen3-32B | 1347 | 3.7s |
| GPT-5 nano | OpenAI | gpt-5-nano | 1338 | 3.2s |
| gpt-oss-20b | OpenAI | gpt-oss-20b | 1317 | 1.1s |
| Claude 3.0 Haiku | Anthropic | claude-3-haiku-20240307 | 1261 | 3.1s |

By latency (per 10,000 tokens)

| Model | Provider | Parameter | Latency per 10,000 tokens | LMArena Score |
| --- | --- | --- | --- | --- |
| Gemini 2.5 Flash-Lite | Google | gemini-2.5-flash-lite | 1.1s | 1380 |
| gpt-oss-20b | OpenAI | gpt-oss-20b | 1.1s | 1317 |
| Kimi K2.5 | Moonshot AI | kimi-k2.5 | 1.2s | 1433 |
| gpt-oss-120b | OpenAI | gpt-oss-120b | 1.4s | 1354 |
| GPT-5.2 | OpenAI | gpt-5.2 | 1.6s | 1437 |
| GPT-4.1 | OpenAI | gpt-4.1 | 1.8s | 1413 |
| Gemini 2.5 Flash | Google | gemini-2.5-flash | 2.6s | 1411 |
| GPT-5.1 | OpenAI | gpt-5.1 | 2.7s | 1437 |
| Claude 3.0 Haiku | Anthropic | claude-3-haiku-20240307 | 3.1s | 1261 |
| Qwen3 Next 80B A3B | Alibaba Cloud | qwen3-next-80b-a3b | 3.1s | 1401 |
| GPT-5 nano | OpenAI | gpt-5-nano | 3.2s | 1338 |
| Qwen3 32B | Alibaba Cloud | qwen3-32B | 3.7s | 1347 |
| GPT-5 mini | OpenAI | gpt-5-mini | 3.8s | 1390 |
| Claude Opus 4.5 | Anthropic | claude-opus-4-5-20251101 | 3.9s | 1467 |
| Gemini 2.5 Pro | Google | gemini-2.5-pro | 4.0s | 1449 |
| Claude 4.5 Haiku | Anthropic | claude-haiku-4-5-20251001 | 4.1s | 1405 |
| Gemini 3 Flash Preview | Google | gemini-3-flash-preview | 4.2s | 1473 |
| GPT-5 | OpenAI | gpt-5 | 4.3s | 1426 |
| Claude 4 Sonnet | Anthropic | claude-sonnet-4-20250514 | 5.1s | 1390 |
| Claude 4.5 Sonnet | Anthropic | claude-sonnet-4-5-20250929 | 5.6s | 1450 |
| Claude Sonnet 4.6 | Anthropic | claude-sonnet-4-6 | 7.2s | 1457 |
| Claude Opus 4.6 | Anthropic | claude-opus-4-6 | 7.4s | 1504 |
| Claude 4 Opus | Anthropic | claude-opus-4-20250514 | 13.6s | 1413 |

By provider

Anthropic Claude

| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
| --- | --- | --- | --- |
| Claude Opus 4.6 | claude-opus-4-6 | 1504 | 7.4s |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | 1457 | 7.2s |
| Claude Opus 4.5 | claude-opus-4-5-20251101 | 1467 | 3.9s |
| Claude 4.5 Sonnet | claude-sonnet-4-5-20250929 | 1450 | 5.6s |
| Claude 4.5 Haiku | claude-haiku-4-5-20251001 | 1405 | 4.1s |
| Claude 4 Opus | claude-opus-4-20250514 | 1413 | 13.6s |
| Claude 4 Sonnet | claude-sonnet-4-20250514 | 1390 | 5.1s |
| Claude 3.0 Haiku | claude-3-haiku-20240307 | 1261 | 3.1s |

OpenAI GPT

| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
| --- | --- | --- | --- |
| GPT-5.2 | gpt-5.2 | 1437 | 1.6s |
| GPT-5.1 | gpt-5.1 | 1437 | 2.7s |
| GPT-5 | gpt-5 | 1426 | 4.3s |
| GPT-5 nano | gpt-5-nano | 1338 | 3.2s |
| GPT-5 mini | gpt-5-mini | 1390 | 3.8s |
| GPT-4.1 | gpt-4.1 | 1413 | 1.8s |
| gpt-oss-120b | gpt-oss-120b | 1354 | 1.4s |
| gpt-oss-20b | gpt-oss-20b | 1317 | 1.1s |

Google Gemini

| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
| --- | --- | --- | --- |
| Gemini 3 Flash Preview | gemini-3-flash-preview | 1473 | 4.2s |
| Gemini 2.5 Pro | gemini-2.5-pro | 1449 | 4.0s |
| Gemini 2.5 Flash | gemini-2.5-flash | 1411 | 2.6s |
| Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | 1380 | 1.1s |

Alibaba Cloud Qwen

| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
| --- | --- | --- | --- |
| Qwen3 Next 80B A3B | qwen3-next-80b-a3b | 1401 | 3.1s |
| Qwen3 32B | qwen3-32B | 1347 | 3.7s |

Moonshot AI Kimi

| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
| --- | --- | --- | --- |
| Kimi K2.5 | kimi-k2.5 | 1433 | 1.2s |

Anthropic will retire Claude 3.0 Haiku (claude-3-haiku-20240307) on April 20, 2026. To ensure uninterrupted service, switch to Claude 4.5 Haiku (claude-haiku-4-5-20251001) before that date.
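If you pin model IDs in configuration, a small mapping can route retiring models to their recommended successors ahead of the cutoff. The `resolve_model` helper below is our own sketch, not part of the API; the IDs come from the retirement notice above:

```python
# Map retiring model IDs to their recommended replacements.
# claude-3-haiku-20240307 retires on April 20, 2026 (per the notice above).
RETIRING = {
    "claude-3-haiku-20240307": "claude-haiku-4-5-20251001",
}

def resolve_model(model: str) -> str:
    """Swap a retiring model ID for its successor; pass others through."""
    return RETIRING.get(model, model)
```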

Claude Opus 4.5 and Claude Opus 4.6 currently support context windows under 200k tokens via the LLM Gateway.
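When prompts grow large, a rough client-side check can help stay inside that limit before sending a request. This is only a heuristic sketch: the ~4-characters-per-token rule of thumb and the helper names are assumptions, not part of the API:

```python
# Rough pre-flight check against the 200k-token context limit noted above.
# The 4-characters-per-token estimate is a coarse heuristic, not exact.
MAX_CONTEXT_TOKENS = 200_000

def approx_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def fits_context(messages, limit=MAX_CONTEXT_TOKENS) -> bool:
    """True if the estimated token count of all messages is under limit."""
    total = sum(approx_tokens(m["content"]) for m in messages)
    return total < limit
```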

For information on data retention and model training policies for each provider, see Data Retention and Model Training.

Head to our Playground to test out the LLM Gateway without having to write any code!

Select a model

You can specify which model to use in your request by setting the model parameter. Here are examples showing how to use Claude 4.5 Sonnet:

```python
import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "claude-sonnet-4-5-20250929",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "max_tokens": 1000
    }
)

result = response.json()
print(result["choices"][0]["message"]["content"])
```

Simply change the model parameter to use any of the available models listed in the Available models section above.
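The example above is single-shot. For multi-turn conversations, context is carried client-side: as in the standard chat-completions format, each request resends the full messages list, with prior assistant replies included. A minimal sketch of maintaining that history (the helper names are hypothetical, not part of the API):

```python
# Context lives in the client: the "messages" field of each POST body is
# the accumulated list of prior turns. Helper names are our own.
def with_user_turn(history, content):
    """Return a new history with the user's message appended."""
    return history + [{"role": "user", "content": content}]

def with_assistant_turn(history, content):
    """Return a new history with the model's reply appended."""
    return history + [{"role": "assistant", "content": content}]

# Build the context for a follow-up question; this list becomes the
# "messages" field of the next request body.
history = with_user_turn([], "What is the capital of France?")
history = with_assistant_turn(history, "The capital of France is Paris.")
history = with_user_turn(history, "What is its population?")
```

Because the model receives the whole list each time, the follow-up "What is its population?" resolves against "Paris" from the earlier turns.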

Want to compare models side-by-side? Try the Model Comparison Tool, a Lovable application, to test different LLM models and see how they perform.

Next steps

The LLM Gateway API is separate from the Speech-to-Text and Speech Understanding APIs. It provides a unified interface to work with large language models across multiple providers.