LLM Gateway Overview

US & EU

Overview

AssemblyAI’s LLM Gateway is a unified interface for working with models from multiple LLM providers, including Anthropic Claude, OpenAI GPT, and Google Gemini. You can use the LLM Gateway to build sophisticated AI applications through a single API.

Endpoint | Base URL
US (default) | https://llm-gateway.assemblyai.com/v1/chat/completions
EU | https://llm-gateway.eu.assemblyai.com/v1/chat/completions

The LLM Gateway is available in both US and EU regions. Use the EU endpoint to ensure your data stays within the European Union. Currently, Anthropic Claude and Google Gemini models are supported in the EU. OpenAI models are only available in the US region. See Cloud Endpoints and Data Residency for more details.
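As a minimal sketch of the region rules described above, the helper below picks a base URL and rejects OpenAI models in the EU. The function name `gateway_url` and the check that OpenAI model IDs start with `gpt-` are assumptions made for illustration (the prefix holds for the model IDs listed on this page); neither is part of the LLM Gateway API.

```python
# Base URLs copied from the endpoint table on this page.
GATEWAY_URLS = {
    "us": "https://llm-gateway.assemblyai.com/v1/chat/completions",
    "eu": "https://llm-gateway.eu.assemblyai.com/v1/chat/completions",
}


def gateway_url(region: str, model: str) -> str:
    """Return the endpoint for a region, enforcing the EU model restriction.

    Hypothetical helper: the "gpt-" prefix check is an assumption based on
    the OpenAI model IDs listed on this page.
    """
    region = region.lower()
    if region not in GATEWAY_URLS:
        raise ValueError(f"Unknown region: {region!r}")
    if region == "eu" and model.startswith("gpt-"):
        raise ValueError("OpenAI models are only available in the US region")
    return GATEWAY_URLS[region]


print(gateway_url("eu", "claude-sonnet-4-5-20250929"))
```

Failing early on the client side avoids sending a request that the EU endpoint cannot serve.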

The LLM Gateway provides access to 15+ models across major AI providers with support for:

  • Basic Chat Completions - Simple request/response interactions
  • Streamed Responses - Stream output as it’s generated (OpenAI models)
  • Multi-turn Conversations - Maintain context across multiple exchanges
  • Structured Outputs - Constrain responses to a specific JSON schema
  • Tool/Function Calling - Enable models to execute custom functions
  • Agentic Workflows - Multi-step reasoning with automatic tool chaining
  • Unified Interface - One API for Claude, GPT, Gemini, and more
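To illustrate the multi-turn item in the list above, here is a rough sketch of how conversation context might be carried between requests. It assumes the gateway accepts prior replies as messages with the `assistant` role, in the same chat-completions shape as the request example on this page; the helper names and the stand-in response are hypothetical and used only for illustration.

```python
def build_payload(model, messages, max_tokens=1000):
    """Build a request body in the shape used by the example on this page."""
    return {"model": model, "messages": list(messages), "max_tokens": max_tokens}


def record_reply(messages, response_json):
    """Return a new history with the model's reply appended.

    Assumption: replies are sent back with the "assistant" role.
    """
    reply = response_json["choices"][0]["message"]["content"]
    return messages + [{"role": "assistant", "content": reply}]


history = [{"role": "user", "content": "What is the capital of France?"}]

# Stand-in for response.json() from the gateway (hypothetical reply).
stand_in_response = {"choices": [{"message": {"content": "Paris."}}]}

history = record_reply(history, stand_in_response)
history.append({"role": "user", "content": "What is its population?"})

# The second request carries the full history so the model keeps context.
payload = build_payload("claude-sonnet-4-5-20250929", history)
print(len(payload["messages"]))
```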

Available models

By quality (LMArena Score)

Model | Provider | Parameter | LMArena Score | Latency per 10,000 tokens
Claude Opus 4.6 | Anthropic | claude-opus-4-6 | 1504 | TBD
Gemini 3 Pro Preview | Google | gemini-3-pro-preview | 1486 | 10.0s
Gemini 3 Flash Preview | Google | gemini-3-flash-preview | 1473 | TBD
Claude Opus 4.5 | Anthropic | claude-opus-4-5-20251101 | 1467 | TBD
Claude Sonnet 4.6 | Anthropic | claude-sonnet-4-6 | 1457 | TBD
Claude 4.5 Sonnet | Anthropic | claude-sonnet-4-5-20250929 | 1450 | 10.1s
Gemini 2.5 Pro | Google | gemini-2.5-pro | 1449 | 13.9s
GPT-5.2 | OpenAI | gpt-5.2 | 1437 | TBD
GPT-5.1 | OpenAI | gpt-5.1 | 1437 | TBD
GPT-5 | OpenAI | gpt-5 | 1426 | 18.9s
Claude 4 Opus | Anthropic | claude-opus-4-20250514 | 1413 | 15.4s
GPT-4.1 | OpenAI | gpt-4.1 | 1413 | 12.6s
Gemini 2.5 Flash | Google | gemini-2.5-flash | 1411 | 8.3s
Claude 4.5 Haiku | Anthropic | claude-haiku-4-5-20251001 | 1405 | 4.6s
Claude 4 Sonnet | Anthropic | claude-sonnet-4-20250514 | 1390 | 7.1s
GPT-5 mini | OpenAI | gpt-5-mini | 1390 | 21.9s
Gemini 2.5 Flash-Lite | Google | gemini-2.5-flash-lite | 1380 | 1.6s
gpt-oss-120b | OpenAI | gpt-oss-120b | 1354 | 10.5s
GPT-5 nano | OpenAI | gpt-5-nano | 1338 | 11.2s
gpt-oss-20b | OpenAI | gpt-oss-20b | 1317 | 4.2s
Claude 3.0 Haiku | Anthropic | claude-3-haiku-20240307 | 1261 | 4.8s

By latency (per 10,000 tokens)

Model | Provider | Parameter | Latency per 10,000 tokens | LMArena Score
Gemini 2.5 Flash-Lite | Google | gemini-2.5-flash-lite | 1.6s | 1380
gpt-oss-20b | OpenAI | gpt-oss-20b | 4.2s | 1317
Claude 4.5 Haiku | Anthropic | claude-haiku-4-5-20251001 | 4.6s | 1405
Claude 3.0 Haiku | Anthropic | claude-3-haiku-20240307 | 4.8s | 1261
Claude 4 Sonnet | Anthropic | claude-sonnet-4-20250514 | 7.1s | 1390
Gemini 2.5 Flash | Google | gemini-2.5-flash | 8.3s | 1411
Gemini 3 Pro Preview | Google | gemini-3-pro-preview | 10.0s | 1486
Claude 4.5 Sonnet | Anthropic | claude-sonnet-4-5-20250929 | 10.1s | 1450
gpt-oss-120b | OpenAI | gpt-oss-120b | 10.5s | 1354
GPT-5 nano | OpenAI | gpt-5-nano | 11.2s | 1338
GPT-4.1 | OpenAI | gpt-4.1 | 12.6s | 1413
Gemini 2.5 Pro | Google | gemini-2.5-pro | 13.9s | 1449
Claude 4 Opus | Anthropic | claude-opus-4-20250514 | 15.4s | 1413
GPT-5 | OpenAI | gpt-5 | 18.9s | 1426
GPT-5 mini | OpenAI | gpt-5-mini | 21.9s | 1390
Claude Opus 4.6 | Anthropic | claude-opus-4-6 | TBD | 1504
Claude Sonnet 4.6 | Anthropic | claude-sonnet-4-6 | TBD | 1457
Claude Opus 4.5 | Anthropic | claude-opus-4-5-20251101 | TBD | 1467
GPT-5.2 | OpenAI | gpt-5.2 | TBD | 1437
GPT-5.1 | OpenAI | gpt-5.1 | TBD | 1437
Gemini 3 Flash Preview | Google | gemini-3-flash-preview | TBD | 1473

By provider

Anthropic Claude

Model | Parameter | LMArena Score | Latency per 10,000 tokens
Claude Opus 4.6 | claude-opus-4-6 | 1504 | TBD
Claude Sonnet 4.6 | claude-sonnet-4-6 | 1457 | TBD
Claude Opus 4.5 | claude-opus-4-5-20251101 | 1467 | TBD
Claude 4.5 Sonnet | claude-sonnet-4-5-20250929 | 1450 | 10.1s
Claude 4.5 Haiku | claude-haiku-4-5-20251001 | 1405 | 4.6s
Claude 4 Opus | claude-opus-4-20250514 | 1413 | 15.4s
Claude 4 Sonnet | claude-sonnet-4-20250514 | 1390 | 7.1s
Claude 3.0 Haiku | claude-3-haiku-20240307 | 1261 | 4.8s

OpenAI GPT

Model | Parameter | LMArena Score | Latency per 10,000 tokens
GPT-5.2 | gpt-5.2 | 1437 | TBD
GPT-5.1 | gpt-5.1 | 1437 | TBD
GPT-5 | gpt-5 | 1426 | 18.9s
GPT-5 nano | gpt-5-nano | 1338 | 11.2s
GPT-5 mini | gpt-5-mini | 1390 | 21.9s
GPT-4.1 | gpt-4.1 | 1413 | 12.6s
gpt-oss-120b | gpt-oss-120b | 1354 | 10.5s
gpt-oss-20b | gpt-oss-20b | 1317 | 4.2s

Google Gemini

Model | Parameter | LMArena Score | Latency per 10,000 tokens
Gemini 3 Pro Preview | gemini-3-pro-preview | 1486 | 10.0s
Gemini 3 Flash Preview | gemini-3-flash-preview | 1473 | TBD
Gemini 2.5 Pro | gemini-2.5-pro | 1449 | 13.9s
Gemini 2.5 Flash | gemini-2.5-flash | 1411 | 8.3s
Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | 1380 | 1.6s
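The tables above trade quality against latency. One way to use them, sketched here under the assumption that you want the fastest model above a minimum LMArena score, is a small lookup over the published figures. The helper `fastest_model` is hypothetical, and the list only includes models whose latency is not marked TBD.

```python
# (model parameter, LMArena score, latency in seconds per 10,000 tokens),
# copied from the tables on this page; models with TBD latency are omitted.
MODELS = [
    ("gemini-2.5-flash-lite", 1380, 1.6),
    ("gpt-oss-20b", 1317, 4.2),
    ("claude-haiku-4-5-20251001", 1405, 4.6),
    ("claude-3-haiku-20240307", 1261, 4.8),
    ("claude-sonnet-4-20250514", 1390, 7.1),
    ("gemini-2.5-flash", 1411, 8.3),
    ("gemini-3-pro-preview", 1486, 10.0),
    ("claude-sonnet-4-5-20250929", 1450, 10.1),
    ("gpt-oss-120b", 1354, 10.5),
    ("gpt-5-nano", 1338, 11.2),
    ("gpt-4.1", 1413, 12.6),
    ("gemini-2.5-pro", 1449, 13.9),
    ("claude-opus-4-20250514", 1413, 15.4),
    ("gpt-5", 1426, 18.9),
    ("gpt-5-mini", 1390, 21.9),
]


def fastest_model(min_score):
    """Return the lowest-latency model with at least min_score, or None."""
    candidates = [m for m in MODELS if m[1] >= min_score]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m[2])[0]


print(fastest_model(1400))  # claude-haiku-4-5-20251001
```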

Anthropic will retire Claude 3.0 Haiku (claude-3-haiku-20240307) on April 20, 2026. To ensure uninterrupted service, switch to Claude 4.5 Haiku (claude-haiku-4-5-20251001) before that date.
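If model IDs are stored in configuration, a small mapping can make this switch in one place. The sketch below is one possible approach; `MODEL_REPLACEMENTS` and `resolve_model` are hypothetical names, not part of the LLM Gateway.

```python
# Retiring model ID -> recommended replacement, per the notice above.
MODEL_REPLACEMENTS = {
    "claude-3-haiku-20240307": "claude-haiku-4-5-20251001",
}


def resolve_model(model: str) -> str:
    """Return the replacement for a retiring model ID, or the ID unchanged."""
    return MODEL_REPLACEMENTS.get(model, model)


print(resolve_model("claude-3-haiku-20240307"))  # claude-haiku-4-5-20251001
```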

Claude Opus 4.5 and Claude Opus 4.6 currently support context windows under 200k tokens via the LLM Gateway.

For information on data retention and model training policies for each provider, see Data Retention and Model Training.

Head to our Playground to test out LLM Gateway without having to write any code!

Select a model

You can specify which model to use in your request by setting the model parameter. The following Python example uses Claude 4.5 Sonnet:

import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "claude-sonnet-4-5-20250929",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "max_tokens": 1000
    }
)

result = response.json()
print(result["choices"][0]["message"]["content"])

Simply change the model parameter to use any of the available models listed in the Available models section above.
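Because only the model parameter changes between requests, the call can be wrapped in a small function and reused across models. The sketch below is a hypothetical helper, not an official client: `ask` takes the HTTP function as an argument (for example `requests.post`) so it can be exercised without network access, and the 60-second timeout is an arbitrary choice.

```python
GATEWAY_URL = "https://llm-gateway.assemblyai.com/v1/chat/completions"


def ask(post, model, prompt, api_key, max_tokens=1000):
    """Send one prompt to the given model and return the reply text.

    `post` is any callable with the same signature as requests.post.
    """
    response = post(
        GATEWAY_URL,
        headers={"authorization": api_key},
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        },
        timeout=60,  # arbitrary value chosen for this sketch
    )
    # Raise on HTTP error responses instead of failing later on a missing key.
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```

To try the same prompt against several models, call `ask(requests.post, model, "What is the capital of France?", "<YOUR_API_KEY>")` once per model parameter from the tables above.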

Want to compare models side-by-side? Try the Model Comparison Tool, a Lovable application, to test different LLM models and see how they perform.

Next steps

The LLM Gateway API is separate from the Speech-to-Text and Speech Understanding APIs. It provides a unified interface to work with large language models across multiple providers.