LLM Gateway Overview


Overview

AssemblyAI’s LLM Gateway is a unified interface for working with models from multiple LLM providers, including Anthropic Claude, OpenAI GPT, Google Gemini, and more. You can use the LLM Gateway to build sophisticated AI applications through a single API.

| Endpoint | Base URL |
| --- | --- |
| US (default) | https://llm-gateway.assemblyai.com/v1/chat/completions |
| EU | https://llm-gateway.eu.assemblyai.com/v1/chat/completions |

The LLM Gateway is available in both US and EU regions. Use the EU endpoint to ensure your data stays within the European Union. Currently, Anthropic Claude and Google Gemini models are supported in the EU. OpenAI models are only available in the US region. See Cloud Endpoints and Data Residency for more details.
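If you select the region at runtime, a small helper can keep the endpoint choice in one place. This is a sketch; the `gateway_url` function and the `"us"`/`"eu"` region strings are our own convention, not part of the API:

```python
# Pick the chat-completions endpoint by data-residency region.
# The helper name and region strings are our own, not part of the API.
US_URL = "https://llm-gateway.assemblyai.com/v1/chat/completions"
EU_URL = "https://llm-gateway.eu.assemblyai.com/v1/chat/completions"

def gateway_url(region: str = "us") -> str:
    """Return the chat-completions base URL for 'us' or 'eu'."""
    urls = {"us": US_URL, "eu": EU_URL}
    try:
        return urls[region.lower()]
    except KeyError:
        raise ValueError(f"unknown region {region!r}; expected 'us' or 'eu'")
```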

The LLM Gateway provides access to 20+ models across major AI providers with support for:

  • Basic Chat Completions - Simple request/response interactions
  • Streamed Responses - Stream output as it’s generated (OpenAI models)
  • Multi-turn Conversations - Maintain context across multiple exchanges
  • Structured Outputs - Constrain responses to a specific JSON schema
  • Tool/Function Calling - Enable models to execute custom functions
  • Agentic Workflows - Multi-step reasoning with automatic tool chaining
  • Unified Interface - One API for Claude, GPT, Gemini, Qwen, Kimi, and more
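For streamed responses, the client consumes server-sent events as they arrive. The parser below is a minimal sketch that assumes the gateway emits OpenAI-style `data: {...}` chunks ending in a `[DONE]` sentinel; the `parse_sse_line` helper is our own:

```python
import json

def parse_sse_line(line: bytes):
    """Pull the text delta out of one server-sent-events line, assuming
    the gateway emits OpenAI-style streaming chunks ('data: {...}')."""
    if not line.startswith(b"data: "):
        return None          # comments, keep-alives, event fields, etc.
    payload = line[len(b"data: "):].strip()
    if payload == b"[DONE]":
        return None          # end-of-stream sentinel
    chunk = json.loads(payload)
    return chunk["choices"][0].get("delta", {}).get("content")

# With requests, the stream could be consumed like:
#   resp = requests.post(url, headers=headers,
#                        json={..., "stream": True}, stream=True)
#   for line in resp.iter_lines():
#       piece = parse_sse_line(line)
#       if piece:
#           print(piece, end="", flush=True)
```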

Available models

By quality (LMArena Score)

| Model | Provider | Parameter | LMArena Score | Latency per 10,000 tokens |
| --- | --- | --- | --- | --- |
| Claude Opus 4.6 | Anthropic | claude-opus-4-6 | 1504 | 7.4s |
| Gemini 3 Flash Preview | Google | gemini-3-flash-preview | 1473 | 4.2s |
| Claude Opus 4.5 | Anthropic | claude-opus-4-5-20251101 | 1467 | 3.9s |
| Claude Sonnet 4.6 | Anthropic | claude-sonnet-4-6 | 1457 | 7.2s |
| Claude 4.5 Sonnet | Anthropic | claude-sonnet-4-5-20250929 | 1450 | 5.6s |
| Gemini 2.5 Pro | Google | gemini-2.5-pro | 1449 | 4.0s |
| GPT-5.2 | OpenAI | gpt-5.2 | 1437 | 1.6s |
| GPT-5.1 | OpenAI | gpt-5.1 | 1437 | 2.7s |
| Kimi K2.5 | Moonshot AI | kimi-k2.5 | 1433 | 1.2s |
| GPT-5 | OpenAI | gpt-5 | 1426 | 4.3s |
| Claude 4 Opus | Anthropic | claude-opus-4-20250514 | 1413 | 13.6s |
| GPT-4.1 | OpenAI | gpt-4.1 | 1413 | 1.8s |
| Gemini 2.5 Flash | Google | gemini-2.5-flash | 1411 | 2.6s |
| Claude 4.5 Haiku | Anthropic | claude-haiku-4-5-20251001 | 1405 | 4.1s |
| Qwen3 Next 80B A3B | Alibaba Cloud | qwen3-next-80b-a3b | 1401 | 3.1s |
| Claude 4 Sonnet | Anthropic | claude-sonnet-4-20250514 | 1390 | 5.1s |
| GPT-5 mini | OpenAI | gpt-5-mini | 1390 | 3.8s |
| Gemini 2.5 Flash-Lite | Google | gemini-2.5-flash-lite | 1380 | 1.1s |
| gpt-oss-120b | OpenAI | gpt-oss-120b | 1354 | 1.4s |
| Qwen3 32B | Alibaba Cloud | qwen3-32B | 1347 | 3.7s |
| GPT-5 nano | OpenAI | gpt-5-nano | 1338 | 3.2s |
| gpt-oss-20b | OpenAI | gpt-oss-20b | 1317 | 1.1s |
| Claude 3.0 Haiku | Anthropic | claude-3-haiku-20240307 | 1261 | 3.1s |

By latency (per 10,000 tokens)

| Model | Provider | Parameter | Latency per 10,000 tokens | LMArena Score |
| --- | --- | --- | --- | --- |
| Gemini 2.5 Flash-Lite | Google | gemini-2.5-flash-lite | 1.1s | 1380 |
| gpt-oss-20b | OpenAI | gpt-oss-20b | 1.1s | 1317 |
| Kimi K2.5 | Moonshot AI | kimi-k2.5 | 1.2s | 1433 |
| gpt-oss-120b | OpenAI | gpt-oss-120b | 1.4s | 1354 |
| GPT-5.2 | OpenAI | gpt-5.2 | 1.6s | 1437 |
| GPT-4.1 | OpenAI | gpt-4.1 | 1.8s | 1413 |
| Gemini 2.5 Flash | Google | gemini-2.5-flash | 2.6s | 1411 |
| GPT-5.1 | OpenAI | gpt-5.1 | 2.7s | 1437 |
| Claude 3.0 Haiku | Anthropic | claude-3-haiku-20240307 | 3.1s | 1261 |
| Qwen3 Next 80B A3B | Alibaba Cloud | qwen3-next-80b-a3b | 3.1s | 1401 |
| GPT-5 nano | OpenAI | gpt-5-nano | 3.2s | 1338 |
| Qwen3 32B | Alibaba Cloud | qwen3-32B | 3.7s | 1347 |
| GPT-5 mini | OpenAI | gpt-5-mini | 3.8s | 1390 |
| Claude Opus 4.5 | Anthropic | claude-opus-4-5-20251101 | 3.9s | 1467 |
| Gemini 2.5 Pro | Google | gemini-2.5-pro | 4.0s | 1449 |
| Claude 4.5 Haiku | Anthropic | claude-haiku-4-5-20251001 | 4.1s | 1405 |
| Gemini 3 Flash Preview | Google | gemini-3-flash-preview | 4.2s | 1473 |
| GPT-5 | OpenAI | gpt-5 | 4.3s | 1426 |
| Claude 4 Sonnet | Anthropic | claude-sonnet-4-20250514 | 5.1s | 1390 |
| Claude 4.5 Sonnet | Anthropic | claude-sonnet-4-5-20250929 | 5.6s | 1450 |
| Claude Sonnet 4.6 | Anthropic | claude-sonnet-4-6 | 7.2s | 1457 |
| Claude Opus 4.6 | Anthropic | claude-opus-4-6 | 7.4s | 1504 |
| Claude 4 Opus | Anthropic | claude-opus-4-20250514 | 13.6s | 1413 |

By provider

Anthropic Claude

| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
| --- | --- | --- | --- |
| Claude Opus 4.6 | claude-opus-4-6 | 1504 | 7.4s |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | 1457 | 7.2s |
| Claude Opus 4.5 | claude-opus-4-5-20251101 | 1467 | 3.9s |
| Claude 4.5 Sonnet | claude-sonnet-4-5-20250929 | 1450 | 5.6s |
| Claude 4.5 Haiku | claude-haiku-4-5-20251001 | 1405 | 4.1s |
| Claude 4 Opus | claude-opus-4-20250514 | 1413 | 13.6s |
| Claude 4 Sonnet | claude-sonnet-4-20250514 | 1390 | 5.1s |
| Claude 3.0 Haiku | claude-3-haiku-20240307 | 1261 | 3.1s |

OpenAI GPT

| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
| --- | --- | --- | --- |
| GPT-5.2 | gpt-5.2 | 1437 | 1.6s |
| GPT-5.1 | gpt-5.1 | 1437 | 2.7s |
| GPT-5 | gpt-5 | 1426 | 4.3s |
| GPT-5 nano | gpt-5-nano | 1338 | 3.2s |
| GPT-5 mini | gpt-5-mini | 1390 | 3.8s |
| GPT-4.1 | gpt-4.1 | 1413 | 1.8s |
| gpt-oss-120b | gpt-oss-120b | 1354 | 1.4s |
| gpt-oss-20b | gpt-oss-20b | 1317 | 1.1s |

Google Gemini

| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
| --- | --- | --- | --- |
| Gemini 3 Flash Preview | gemini-3-flash-preview | 1473 | 4.2s |
| Gemini 2.5 Pro | gemini-2.5-pro | 1449 | 4.0s |
| Gemini 2.5 Flash | gemini-2.5-flash | 1411 | 2.6s |
| Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | 1380 | 1.1s |

Alibaba Cloud Qwen

| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
| --- | --- | --- | --- |
| Qwen3 Next 80B A3B | qwen3-next-80b-a3b | 1401 | 3.1s |
| Qwen3 32B | qwen3-32B | 1347 | 3.7s |

Moonshot AI Kimi

| Model | Parameter | LMArena Score | Latency per 10,000 tokens |
| --- | --- | --- | --- |
| Kimi K2.5 | kimi-k2.5 | 1433 | 1.2s |

Anthropic will retire Claude 3.0 Haiku (claude-3-haiku-20240307) on April 20, 2026. To ensure uninterrupted service, switch to Claude 4.5 Haiku (claude-haiku-4-5-20251001) before that date.
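If you pin model IDs in configuration, a small mapping can route retiring models to their recommended successors ahead of the cutoff. The `resolve_model` helper below is our own sketch, not part of the API; the IDs come from the retirement notice above:

```python
# Map retiring model IDs to their recommended replacements.
# claude-3-haiku-20240307 retires on April 20, 2026 (per the notice above).
RETIRING = {
    "claude-3-haiku-20240307": "claude-haiku-4-5-20251001",
}

def resolve_model(model: str) -> str:
    """Swap a retiring model ID for its successor; pass others through."""
    return RETIRING.get(model, model)
```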

Claude Opus 4.5 and Claude Opus 4.6 currently support context windows under 200k tokens via the LLM Gateway.
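When prompts grow large, a rough client-side check can help stay inside that limit before sending a request. This is only a heuristic sketch: the ~4-characters-per-token rule of thumb and the helper names are assumptions, not part of the API:

```python
# Rough pre-flight check against the 200k-token context limit noted above.
# The 4-characters-per-token estimate is a coarse heuristic, not exact.
MAX_CONTEXT_TOKENS = 200_000

def approx_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def fits_context(messages, limit=MAX_CONTEXT_TOKENS) -> bool:
    """True if the estimated token count of all messages is under limit."""
    total = sum(approx_tokens(m["content"]) for m in messages)
    return total < limit
```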

For information on data retention and model training policies for each provider, see Data Retention and Model Training.

Head to our Playground to test out the LLM Gateway without having to write any code!

Select a model

You can specify which model to use in your request by setting the model parameter. Here are examples showing how to use Claude 4.5 Sonnet:

```python
import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "claude-sonnet-4-5-20250929",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "max_tokens": 1000
    }
)

result = response.json()
print(result["choices"][0]["message"]["content"])
```

Simply change the model parameter to use any of the available models listed in the Available models section above.
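The example above is single-shot. For multi-turn conversations, context is carried client-side: as in the standard chat-completions format, each request resends the full messages list, with prior assistant replies included. A minimal sketch of maintaining that history (the helper names are hypothetical, not part of the API):

```python
# Context lives in the client: the "messages" field of each POST body is
# the accumulated list of prior turns. Helper names are our own.
def with_user_turn(history, content):
    """Return a new history with the user's message appended."""
    return history + [{"role": "user", "content": content}]

def with_assistant_turn(history, content):
    """Return a new history with the model's reply appended."""
    return history + [{"role": "assistant", "content": content}]

# Build the context for a follow-up question; this list becomes the
# "messages" field of the next request body.
history = with_user_turn([], "What is the capital of France?")
history = with_assistant_turn(history, "The capital of France is Paris.")
history = with_user_turn(history, "What is its population?")
```

Because the model receives the whole list each time, the follow-up "What is its population?" resolves against "Paris" from the earlier turns.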

Want to compare models side-by-side? Try the Model Comparison Tool, a Lovable application, to test different LLM models and see how they perform.

Next steps

The LLM Gateway API is separate from the Speech-to-Text and Speech Understanding APIs. It provides a unified interface to work with large language models across multiple providers.