LLM Gateway Overview

Overview

AssemblyAI’s LLM Gateway is a unified interface for models from multiple LLM providers, including Anthropic’s Claude, OpenAI’s GPT, and Google’s Gemini. You can use the LLM Gateway to build sophisticated AI applications through a single API.

The LLM Gateway provides access to 15+ models across major AI providers with support for:

  • Basic Chat Completions - Simple request/response interactions
  • Multi-turn Conversations - Maintain context across multiple exchanges (see the sketch after this list)
  • Tool/Function Calling - Enable models to execute custom functions
  • Agentic Workflows - Multi-step reasoning with automatic tool chaining
  • Unified Interface - One API for Claude, GPT, Gemini, and more
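
For example, multi-turn context is maintained on the client side by resending the full message history with each request. The sketch below is illustrative only, assuming the same /v1/chat/completions endpoint and request shape shown in the Select a model section later on; including the earlier user and assistant turns is what lets the model resolve the follow-up question.

import requests

headers = {
    "authorization": "<YOUR_API_KEY>"  # your AssemblyAI API key
}

# Send the earlier user and assistant turns so the model can resolve
# "its population" against the previous exchange about Paris.
response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "claude-sonnet-4-5-20250929",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"},
            {"role": "assistant", "content": "The capital of France is Paris."},
            {"role": "user", "content": "What is its population?"}
        ],
        "max_tokens": 1000
    }
)

result = response.json()
print(result["choices"][0]["message"]["content"])

Each request is stateless, so the client decides how much prior history to resend with every call.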

Available models

Anthropic Claude

| Model | Parameter | Description |
| --- | --- | --- |
| Claude 4.5 Sonnet | claude-sonnet-4-5-20250929 | Claude’s best model for complex agents and coding |
| Claude 4 Sonnet | claude-sonnet-4-20250514 | High-performance model |
| Claude 4 Opus | claude-opus-4-20250514 | Claude’s previous flagship model |
| Claude 4.5 Haiku | claude-haiku-4-5-20251001 | Claude’s fastest and most intelligent Haiku model |
| Claude 3.5 Haiku | claude-3-5-haiku-20241022 | Claude’s fastest model |
| Claude 3.0 Haiku | claude-3-haiku-20240307 | Fast and compact model for near-instant responsiveness |

OpenAI GPT

| Model | Parameter | Description |
| --- | --- | --- |
| GPT-5 | gpt-5 | OpenAI’s best model for coding and agentic tasks across domains |
| GPT-5 nano | gpt-5-nano | OpenAI’s fastest, most cost-efficient version of GPT-5 |
| GPT-5 mini | gpt-5-mini | A faster, cost-efficient version of GPT-5 for well-defined tasks |
| GPT-4.1 | gpt-4.1 | OpenAI’s smartest non-reasoning model |
| ChatGPT-4o | chatgpt-4o-latest | GPT-4o model used in ChatGPT |
| gpt-oss-120b | gpt-oss-120b | OpenAI’s most powerful open-weight model |
| gpt-oss-20b | gpt-oss-20b | Medium-sized open-weight model for low latency |

Google Gemini

| Model | Parameter | Description |
| --- | --- | --- |
| Gemini 2.5 Pro | gemini-2.5-pro | Gemini’s state-of-the-art thinking model, capable of reasoning over complex problems |
| Gemini 2.5 Flash | gemini-2.5-flash | Gemini’s best model in terms of price-performance, offering well-rounded capabilities |
| Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | Gemini’s fastest flash model optimized for cost-efficiency and high throughput |

Unsure which model to choose?

  • Consider Claude models for nuanced reasoning and complex instructions
  • Consider GPT models for code generation and structured outputs
  • Consider Gemini models for cost-effective high-volume applications

Head to our Playground to try out the LLM Gateway without writing any code!

Select a model

You can specify which model to use in your request by setting the model parameter. Here is an example showing how to use Claude 4.5 Sonnet:

import requests

# Replace with your AssemblyAI API key
headers = {
    "authorization": "<YOUR_API_KEY>"
}

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "claude-sonnet-4-5-20250929",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "max_tokens": 1000
    }
)

result = response.json()
print(result["choices"][0]["message"]["content"])

Simply change the model parameter to use any of the available models listed in the Available models section above.
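
For instance, assuming the same headers and request body as the example above, switching to Gemini 2.5 Flash only requires changing the model value:

# Identical request; only the model parameter changes.
response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "gemini-2.5-flash",  # any parameter from the tables above
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "max_tokens": 1000
    }
)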

Next steps

The LLM Gateway API is separate from the Speech-to-Text and Audio Intelligence APIs. It provides a unified interface to work with large language models across multiple providers.