LLM Gateway Overview

Overview

AssemblyAI’s LLM Gateway is a unified interface for working with models from multiple LLM providers, including Claude, GPT, and Gemini. You can use it to build AI applications through a single API.

The LLM Gateway provides access to 15+ models across major AI providers with support for:

  • Basic Chat Completions - Simple request/response interactions
  • Multi-turn Conversations - Maintain context across multiple exchanges
  • Tool/Function Calling - Enable models to execute custom functions
  • Agentic Workflows - Multi-step reasoning with automatic tool chaining
  • Unified Interface - One API for Claude, GPT, Gemini, and more
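As a rough sketch of a multi-turn conversation, the snippet below builds a request body that carries the whole conversation history. It assumes the gateway accepts earlier model replies as messages with the role "assistant", as OpenAI-style chat completions APIs do; this page does not confirm that. The helper names add_turn and build_payload are hypothetical, and the assistant reply is a placeholder, not real model output.

```python
def add_turn(history, role, content):
    """Return a new message list with one more turn appended."""
    return history + [{"role": role, "content": content}]


def build_payload(model, history, max_tokens=1000):
    """Build a chat completions request body from the full conversation history."""
    return {"model": model, "messages": history, "max_tokens": max_tokens}


history = []
history = add_turn(history, "user", "What is the capital of France?")
# Placeholder reply; in practice, copy the content of the previous response here.
# The "assistant" role is an assumption based on OpenAI-style APIs.
history = add_turn(history, "assistant", "The capital of France is Paris.")
history = add_turn(history, "user", "What is its population?")

payload = build_payload("claude-sonnet-4-5-20250929", history)
print(len(payload["messages"]))  # 3
```

The resulting payload can be sent as the JSON body of a chat completions request. Because every request includes the full history, the model sees the earlier turns as context.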

Available models

Anthropic Claude

| Model | Parameter | Latency per 10,000 tokens | LMArena Score | Description | Retention Policy | Anthropic Model Training |
|---|---|---|---|---|---|---|
| Claude 4.5 Sonnet | claude-sonnet-4-5-20250929 | 10.1s | 1444 | Claude’s best model for complex agents and coding | We use this model through Amazon Bedrock. Amazon Bedrock doesn’t store or log your prompts and completions. Amazon Bedrock doesn’t use your prompts and completions to train any AWS models and doesn’t distribute them to third parties. See here for more information on Amazon Bedrock data protection policies.* | AssemblyAI has opted out of model training with all LLM Gateway providers. |
| Claude 4 Sonnet | claude-sonnet-4-20250514 | 7.1s | 1389 | High-performance model | We use this model through Amazon Bedrock. Amazon Bedrock doesn’t store or log your prompts and completions. Amazon Bedrock doesn’t use your prompts and completions to train any AWS models and doesn’t distribute them to third parties. See here for more information on Amazon Bedrock data protection policies.* | AssemblyAI has opted out of model training with all LLM Gateway providers. |
| Claude 4 Opus | claude-opus-4-20250514 | 15.4s | 1412 | Claude’s previous flagship model | We use this model through Amazon Bedrock. Amazon Bedrock doesn’t store or log your prompts and completions. Amazon Bedrock doesn’t use your prompts and completions to train any AWS models and doesn’t distribute them to third parties. See here for more information on Amazon Bedrock data protection policies.* | AssemblyAI has opted out of model training with all LLM Gateway providers. |
| Claude 4.5 Haiku | claude-haiku-4-5-20251001 | 4.6s | 1402 | Claude’s fastest and most intelligent Haiku model | We use this model through Amazon Bedrock. Amazon Bedrock doesn’t store or log your prompts and completions. Amazon Bedrock doesn’t use your prompts and completions to train any AWS models and doesn’t distribute them to third parties. See here for more information on Amazon Bedrock data protection policies.* | AssemblyAI has opted out of model training with all LLM Gateway providers. |
| Claude 3.5 Haiku | claude-3-5-haiku-20241022 | 5.4s | 1322 | Fast and efficient model with strong performance | We use this model through Amazon Bedrock. Amazon Bedrock doesn’t store or log your prompts and completions. Amazon Bedrock doesn’t use your prompts and completions to train any AWS models and doesn’t distribute them to third parties. See here for more information on Amazon Bedrock data protection policies.* | AssemblyAI has opted out of model training with all LLM Gateway providers. |
| Claude 3.0 Haiku | claude-3-haiku-20240307 | 4.8s | 1262 | Fast and compact model for near-instant responsiveness | We use this model through Amazon Bedrock. Amazon Bedrock doesn’t store or log your prompts and completions. Amazon Bedrock doesn’t use your prompts and completions to train any AWS models and doesn’t distribute them to third parties. See here for more information on Amazon Bedrock data protection policies.* | AssemblyAI has opted out of model training with all LLM Gateway providers. |

*For non-EU customers, if Amazon Bedrock fails, we may send your request to the Anthropic API, where we have 0-day retention configured. Please see Anthropic’s commercial terms here.

OpenAI GPT

| Model | Parameter | Latency per 10,000 tokens | LMArena Score | Description | Retention Policy | OpenAI Model Training |
|---|---|---|---|---|---|---|
| GPT-5 | gpt-5 | 18.9s | 1425 | OpenAI’s best model for coding and agentic tasks across domains | Abuse monitoring retains logs for 30 days. If you require ZDR, please use Anthropic or Google models. | AssemblyAI has opted out of model training with all LLM Gateway providers. |
| GPT-5 nano | gpt-5-nano | 11.2s | 1338 | OpenAI’s fastest, most cost-efficient version of GPT-5 | Abuse monitoring retains logs for 30 days. If you require ZDR, please use Anthropic or Google models. | AssemblyAI has opted out of model training with all LLM Gateway providers. |
| GPT-5 mini | gpt-5-mini | 21.9s | 1393 | A faster, cost-efficient version of GPT-5 for well-defined tasks | Abuse monitoring retains logs for 30 days. If you require ZDR, please use Anthropic or Google models. | AssemblyAI has opted out of model training with all LLM Gateway providers. |
| GPT-4.1 | gpt-4.1 | 12.6s | 1412 | OpenAI’s smartest non-reasoning model | Abuse monitoring retains logs for 30 days. If you require ZDR, please use Anthropic or Google models. | AssemblyAI has opted out of model training with all LLM Gateway providers. |
| gpt-oss-120b | gpt-oss-120b | 10.5s | 1352 | OpenAI’s most powerful open-weight model | We use this model through Amazon Bedrock. Amazon Bedrock doesn’t store or log your prompts and completions. Amazon Bedrock doesn’t use your prompts and completions to train any AWS models and doesn’t distribute them to third parties. See here for more information on Amazon Bedrock data protection policies. | AssemblyAI has opted out of model training with all LLM Gateway providers. |
| gpt-oss-20b | gpt-oss-20b | 4.2s | 1318 | Medium-sized open-weight model for low latency | We use this model through Amazon Bedrock. Amazon Bedrock doesn’t store or log your prompts and completions. Amazon Bedrock doesn’t use your prompts and completions to train any AWS models and doesn’t distribute them to third parties. See here for more information on Amazon Bedrock data protection policies. | AssemblyAI has opted out of model training with all LLM Gateway providers. |

Google Gemini

| Model | Parameter | Latency per 10,000 tokens | LMArena Score | Description | Retention Policy | Google Model Training |
|---|---|---|---|---|---|---|
| Gemini 3 Pro Preview | gemini-3-pro-preview | TBD | 1495 | Gemini’s most powerful agentic and vibe-coding model, delivering richer visuals and deeper interactivity | ZDR (see Google’s policy here for more information on how Google defines ZDR) | AssemblyAI has opted out of model training with all LLM Gateway providers. |
| Gemini 2.5 Pro | gemini-2.5-pro | 13.9s | 1451 | Gemini’s state-of-the-art thinking model, capable of reasoning over complex problems | ZDR (see Google’s policy here for more information on how Google defines ZDR) | AssemblyAI has opted out of model training with all LLM Gateway providers. |
| Gemini 2.5 Flash | gemini-2.5-flash | 8.3s | 1407 | Gemini’s best model in terms of price-performance, offering well-rounded capabilities | ZDR (see Google’s policy here for more information on how Google defines ZDR) | AssemblyAI has opted out of model training with all LLM Gateway providers. |
| Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | 1.6s | 1375 | Gemini’s fastest flash model optimized for cost-efficiency and high throughput | ZDR (see Google’s policy here for more information on how Google defines ZDR) | AssemblyAI has opted out of model training with all LLM Gateway providers. |

Unsure which model to choose?

  • Consider Claude models for nuanced reasoning and complex instructions
  • Consider GPT models for code generation and structured outputs
  • Consider Gemini models for cost-effective high-volume applications
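If latency and benchmark score matter more to you than provider, one possible approach is to filter the published figures directly. The sketch below copies the latency and LMArena values from the tables above and picks the lowest-latency model that meets a minimum score. The function name fastest_model and the selection rule are hypothetical illustrations, not part of the LLM Gateway API, and the figures may change over time.

```python
# (latency in seconds per 10,000 tokens, LMArena score), copied from the tables above.
# A latency of None marks the model whose latency is listed as TBD.
MODELS = {
    "claude-sonnet-4-5-20250929": (10.1, 1444),
    "claude-sonnet-4-20250514": (7.1, 1389),
    "claude-opus-4-20250514": (15.4, 1412),
    "claude-haiku-4-5-20251001": (4.6, 1402),
    "claude-3-5-haiku-20241022": (5.4, 1322),
    "claude-3-haiku-20240307": (4.8, 1262),
    "gpt-5": (18.9, 1425),
    "gpt-5-nano": (11.2, 1338),
    "gpt-5-mini": (21.9, 1393),
    "gpt-4.1": (12.6, 1412),
    "gpt-oss-120b": (10.5, 1352),
    "gpt-oss-20b": (4.2, 1318),
    "gemini-3-pro-preview": (None, 1495),
    "gemini-2.5-pro": (13.9, 1451),
    "gemini-2.5-flash": (8.3, 1407),
    "gemini-2.5-flash-lite": (1.6, 1375),
}


def fastest_model(min_score):
    """Return the lowest-latency model whose LMArena score is at least min_score.

    Models without a published latency are skipped. Returns None if nothing qualifies.
    """
    candidates = [
        (latency, name)
        for name, (latency, score) in MODELS.items()
        if latency is not None and score >= min_score
    ]
    return min(candidates)[1] if candidates else None


print(fastest_model(1400))  # claude-haiku-4-5-20251001
```

Latency and LMArena score are only two inputs. Retention policy and task fit, as described above, may outweigh them for your use case.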

Head to our Playground to test out LLM Gateway without having to write any code!

Select a model

You can specify which model to use in your request by setting the model parameter. The following Python example uses Claude 4.5 Sonnet:

import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "claude-sonnet-4-5-20250929",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "max_tokens": 1000
    }
)

result = response.json()
print(result["choices"][0]["message"]["content"])

To use a different model, set the model parameter to any value from the Parameter column in the Available models section above.
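To make switching models easier, the request above could be wrapped in a small helper that takes the model as an argument. This is a minimal sketch, not an official client: the function name ask, the 60-second timeout, and the optional post argument (which lets you substitute a stand-in for requests.post when testing) are all assumptions. It reuses the endpoint, header, and response shape from the example above, and it raises on HTTP errors without inspecting the error body, since this page does not describe the gateway's error format.

```python
GATEWAY_URL = "https://llm-gateway.assemblyai.com/v1/chat/completions"


def ask(model, prompt, api_key, max_tokens=1000, post=None):
    """Send a single-turn prompt to the given model and return the reply text.

    `post` is a hypothetical hook for testing; it defaults to requests.post.
    """
    if post is None:
        import requests  # imported here so the helper loads even without requests installed
        post = requests.post

    response = post(
        GATEWAY_URL,
        headers={"authorization": api_key},
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        },
        timeout=60,  # assumed value; tune for slower models
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()["choices"][0]["message"]["content"]
```

For example, calling ask("gemini-2.5-flash", "What is the capital of France?", "<YOUR_API_KEY>") and then ask("gpt-4.1", ...) with the same prompt sends an identical request to two different models, which makes it straightforward to compare their answers.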

Next steps

The LLM Gateway API is separate from the Speech-to-Text and Audio Intelligence APIs. It provides a unified interface to work with large language models across multiple providers.