LLM Gateway Overview

Overview

AssemblyAI’s LLM Gateway is a unified interface for connecting to multiple LLM providers, including Anthropic (Claude), OpenAI (GPT), and Google (Gemini). You can use the LLM Gateway to build sophisticated AI applications through a single API.

The LLM Gateway provides access to 15+ models across major AI providers with support for:

  • Basic Chat Completions - Simple request/response interactions
  • Multi-turn Conversations - Maintain context across multiple exchanges (see the sketch after this list)
  • Tool/Function Calling - Enable models to execute custom functions
  • Agentic Workflows - Multi-step reasoning with automatic tool chaining
  • Unified Interface - One API for Claude, GPT, Gemini, and more
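
For example, a multi-turn conversation works by resending the prior messages along with each new user turn. The following is a minimal sketch that assumes the same request and response shape as the chat completions example later on this page; the follow-up question is illustrative.

import requests

headers = {"authorization": "<YOUR_API_KEY>"}
url = "https://llm-gateway.assemblyai.com/v1/chat/completions"

# Start the conversation with a single user message.
messages = [{"role": "user", "content": "What is the capital of France?"}]

response = requests.post(url, headers=headers, json={
    "model": "claude-sonnet-4-5-20250929",
    "messages": messages,
    "max_tokens": 1000,
})
assistant_reply = response.json()["choices"][0]["message"]["content"]

# Append the assistant's reply and a follow-up question, then send the
# full history back so the model keeps the earlier context.
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "What is its population?"})

response = requests.post(url, headers=headers, json={
    "model": "claude-sonnet-4-5-20250929",
    "messages": messages,
    "max_tokens": 1000,
})
print(response.json()["choices"][0]["message"]["content"])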

Available models

Anthropic Claude

| Model | Parameter | Latency per 10,000 tokens | LMArena Score | Description |
| --- | --- | --- | --- | --- |
| Claude 4.5 Sonnet | claude-sonnet-4-5-20250929 | 10.1s | 1438 | Claude’s best model for complex agents and coding |
| Claude 4 Sonnet | claude-sonnet-4-20250514 | 7.1s | 1389 | High-performance model |
| Claude 4 Opus | claude-opus-4-20250514 | 15.4s | 1411 | Claude’s previous flagship model |
| Claude 4.5 Haiku | claude-haiku-4-5-20251001 | 4.6s | 1397 | Claude’s fastest and most intelligent Haiku model |
| Claude 3.5 Haiku | claude-3-5-haiku-20241022 | 5.4s | 1320 | Fast and efficient model with strong performance |
| Claude 3.0 Haiku | claude-3-haiku-20240307 | 4.8s | 1260 | Fast and compact model for near-instant responsiveness |

OpenAI GPT

| Model | Parameter | Latency per 10,000 tokens | LMArena Score | Description |
| --- | --- | --- | --- | --- |
| GPT-5 | gpt-5 | 18.9s | 1425 | OpenAI’s best model for coding and agentic tasks across domains |
| GPT-5 nano | gpt-5-nano | 11.2s | 1337 | OpenAI’s fastest, most cost-efficient version of GPT-5 |
| GPT-5 mini | gpt-5-mini | 21.9s | 1395 | A faster, cost-efficient version of GPT-5 for well-defined tasks |
| GPT-4.1 | gpt-4.1 | 12.6s | 1411 | OpenAI’s smartest non-reasoning model |
| ChatGPT-4o | chatgpt-4o-latest | 8.0s | 1440 | GPT-4o model used in ChatGPT |
| gpt-oss-120b | gpt-oss-120b | 10.5s | 1348 | OpenAI’s most powerful open-weight model |
| gpt-oss-20b | gpt-oss-20b | 4.2s | 1317 | Medium-sized open-weight model for low latency |

Google Gemini

| Model | Parameter | Latency per 10,000 tokens | LMArena Score | Description |
| --- | --- | --- | --- | --- |
| Gemini 2.5 Pro | gemini-2.5-pro | 13.9s | 1451 | Gemini’s state-of-the-art thinking model, capable of reasoning over complex problems |
| Gemini 2.5 Flash | gemini-2.5-flash | 8.3s | 1404 | Gemini’s best model in terms of price-performance, offering well-rounded capabilities |
| Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | 1.6s | 1374 | Gemini’s fastest flash model optimized for cost-efficiency and high throughput |

Unsure which model to choose?

  • Consider Claude models for nuanced reasoning and complex instructions
  • Consider GPT models for code generation and structured outputs
  • Consider Gemini models for cost-effective high-volume applications

Head to our Playground to try out the LLM Gateway without writing any code!

Select a model

You can specify which model to use in your request by setting the model parameter. Here’s an example showing how to use Claude 4.5 Sonnet:

import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "claude-sonnet-4-5-20250929",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "max_tokens": 1000
    }
)

result = response.json()
print(result["choices"][0]["message"]["content"])

Simply change the model parameter to use any of the available models listed in the Available models section above.
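
For instance, the same request can target Gemini 2.5 Flash by swapping the model string; the rest of the request body stays unchanged:

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "gemini-2.5-flash",  # any parameter value from the tables above
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "max_tokens": 1000
    }
)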

Next steps

The LLM Gateway API is separate from the Speech-to-Text and Audio Intelligence APIs. It provides a unified interface to work with large language models across multiple providers.