Specify Fallback Models

Overview

The Fallback feature lets you specify one or more backup models that the LLM Gateway automatically tries if your primary model fails. This helps your application stay resilient without complex retry logic in your own code.

The LLM Gateway is available in both US and EU regions. Fallback behavior works the same way on both endpoints. See Cloud endpoints and data residency for more details.

Basic usage

To add a fallback, include a fallbacks array in your request. Each entry specifies an alternative model to use if the primary model is unavailable:

Python
JavaScript

import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "kimi-k2.5",
        "messages": [
            {"role": "user", "content": "Tell me a fairy tale."}
        ],
        "fallbacks": [
            {"model": "claude-sonnet-4-6"}
        ]
    }
)

result = response.json()
print(result["choices"][0]["message"]["content"])

const response = await fetch(
  "https://llm-gateway.assemblyai.com/v1/chat/completions",
  {
    method: "POST",
    headers: {
      authorization: "<YOUR_API_KEY>",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "kimi-k2.5",
      messages: [{ role: "user", content: "Tell me a fairy tale." }],
      fallbacks: [
        { model: "claude-sonnet-4-6" }
      ],
    }),
  }
);

const result = await response.json();
console.log(result.choices[0].message.content);

If kimi-k2.5 fails, the LLM Gateway automatically retries the request using claude-sonnet-4-6.

By default, the LLM Gateway tries only the first fallback, because fallback_config.depth defaults to 1. To chain up to two fallback models, set fallback_config.depth to 2. The LLM Gateway then tries each fallback in order until one succeeds.
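As a minimal sketch of a chained request, the helper below builds a request body with up to two fallbacks and sets fallback_config.depth to match. The function name build_chained_request and the placeholder <SECOND_FALLBACK_MODEL> are illustrative assumptions, not part of the API; substitute a model name from Available models.

```python
from typing import Any, Dict, List

MAX_FALLBACK_DEPTH = 2  # documented maximum for fallback_config.depth


def build_chained_request(
    primary_model: str,
    messages: List[Dict[str, str]],
    fallback_models: List[str],
) -> Dict[str, Any]:
    """Build a request body whose depth equals the number of fallbacks."""
    if not 1 <= len(fallback_models) <= MAX_FALLBACK_DEPTH:
        raise ValueError(f"expected 1 to {MAX_FALLBACK_DEPTH} fallback models")
    return {
        "model": primary_model,
        "messages": messages,
        # Fallbacks are tried in the order they are listed.
        "fallbacks": [{"model": name} for name in fallback_models],
        "fallback_config": {"depth": len(fallback_models)},
    }


payload = build_chained_request(
    "kimi-k2.5",
    [{"role": "user", "content": "Tell me a fairy tale."}],
    # The second entry is a placeholder, not a real model name.
    ["claude-sonnet-4-6", "<SECOND_FALLBACK_MODEL>"],
)
```

The resulting payload can be passed as the json argument of requests.post, in the same way as the basic example.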

Override fields per fallback

For more control, you can override specific request fields for each fallback model. For example, you can change the messages or temperature used by the fallback:

Python
JavaScript

import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "kimi-k2.5",
        "messages": [
            {"role": "user", "content": "Tell me a fairy tale."}
        ],
        "temperature": 0.2,
        "fallbacks": [
            {
                "model": "claude-sonnet-4-6",
                "messages": [
                    {"role": "user", "content": "Tell me a fairy tale, but be very concise."}
                ],
                "temperature": 0.4
            }
        ]
    }
)

result = response.json()
print(result["choices"][0]["message"]["content"])

const response = await fetch(
  "https://llm-gateway.assemblyai.com/v1/chat/completions",
  {
    method: "POST",
    headers: {
      authorization: "<YOUR_API_KEY>",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "kimi-k2.5",
      messages: [{ role: "user", content: "Tell me a fairy tale." }],
      temperature: 0.2,
      fallbacks: [
        {
          model: "claude-sonnet-4-6",
          messages: [
            { role: "user", content: "Tell me a fairy tale, but be very concise." },
          ],
          temperature: 0.4,
        },
      ],
    }),
  }
);

const result = await response.json();
console.log(result.choices[0].message.content);

Any field you don’t override in the fallback inherits the value from the original request. Only the fields you explicitly set in the fallback object are changed.
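The inheritance rule can be illustrated client-side. The sketch below is not the gateway's implementation; it only mirrors the documented behavior under two assumptions: an overridden field replaces the original value wholesale, and the fallbacks and fallback_config keys are not carried into the resolved request. The helper name resolve_fallback_request is hypothetical.

```python
from typing import Any, Dict

# Assumption: these control keys are not part of the resolved model request.
CONTROL_KEYS = ("fallbacks", "fallback_config")


def resolve_fallback_request(
    request: Dict[str, Any], fallback: Dict[str, Any]
) -> Dict[str, Any]:
    """Return the effective request for one fallback entry."""
    inherited = {k: v for k, v in request.items() if k not in CONTROL_KEYS}
    # Fields set on the fallback win; everything else is inherited.
    return {**inherited, **fallback}


original = {
    "model": "kimi-k2.5",
    "messages": [{"role": "user", "content": "Tell me a fairy tale."}],
    "temperature": 0.2,
    "max_tokens": 500,
    "fallbacks": [{"model": "claude-sonnet-4-6", "temperature": 0.4}],
}

effective = resolve_fallback_request(original, original["fallbacks"][0])
```

Here effective uses the fallback's model and temperature, while messages and max_tokens keep the values from the original request.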

Retry behavior

If no fallbacks are set, the LLM Gateway automatically retries the request once after 500 ms. This happens because fallback_config.retry defaults to true, which handles transient failures with no configuration. For more control over retries, set retry to false and implement your own exponential backoff:

{
  "model": "kimi-k2.5",
  "messages": [{"role": "user", "content": "Tell me a fairy tale."}],
  "fallback_config": {
    "retry": false
  }
}
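One way to implement your own backoff is sketched below. It is an assumption-laden example rather than a recommended policy: post_with_backoff is a hypothetical helper, send is any callable you supply that performs the HTTP request and raises an exception on failure, and the attempt count and delays are arbitrary choices.

```python
import time
from typing import Any, Callable, Dict


def post_with_backoff(
    send: Callable[[Dict[str, Any]], Any],
    payload: Dict[str, Any],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send(payload) with exponential backoff between failed attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Disable the gateway's built-in retry so only this loop retries.
    config = {**payload.get("fallback_config", {}), "retry": False}
    body = {**payload, "fallback_config": config}
    for attempt in range(max_attempts):
        try:
            return send(body)
        except Exception:
            # Assumption: every exception raised by send is retryable.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
```

A send callable could, for instance, wrap requests.post and call raise_for_status on the response so that HTTP errors surface as exceptions. In production code you would usually narrow the except clause to the errors you consider transient.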

Response behavior

When a fallback is used, the response looks exactly as if you had made the original request with the fallback model. The model field in the response reflects the fallback model that was used, and you are billed only for that model.
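Because the model field reports the model that actually answered, you can detect a fallback by comparing it with the model you requested. The sketch below assumes the response echoes the model name exactly as it was written in the request; the helper name served_by_fallback and the trimmed sample response are illustrative only.

```python
from typing import Any, Dict


def served_by_fallback(requested_model: str, response_body: Dict[str, Any]) -> bool:
    """Return True when the response came from a model other than the primary."""
    # Assumption: the gateway returns the model name verbatim.
    return response_body.get("model") != requested_model


# Hand-written sample limited to the fields used in this guide.
sample_response = {
    "model": "claude-sonnet-4-6",
    "choices": [{"message": {"content": "Once upon a time..."}}],
}

used_fallback = served_by_fallback("kimi-k2.5", sample_response)
```

Logging this flag can help you track how often the primary model fails.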

API reference

Request parameters

| Key | Type | Required? | Description |
| --- | --- | --- | --- |
| `model` | string | Yes | The primary model to use for completion. See Available models for supported values. |
| `messages` | array | Yes | An array of message objects representing the conversation history. |
| `fallbacks` | array | No | An array of fallback objects. Each object must include a `model` and can override any field available in the original request. |
| `fallback_config` | object | No | Configuration for fallback behavior. |
| `fallback_config.retry` | boolean | No | Whether to automatically retry the request once after 500 ms on failure. Defaults to `true`. |
| `fallback_config.depth` | number | No | Maximum number of fallbacks to try, in order. Defaults to `1`; the maximum is `2`. |

Fallback object

Each object in the fallbacks array must include a model and can override any field available in the original request. The table below lists example fields:

| Key | Type | Required? | Description |
| --- | --- | --- | --- |
| `model` | string | Yes | The fallback model to use. See Available models for supported values. |
| `messages` | array | No | Override the messages for the fallback request. |
| `temperature` | number | No | Override the temperature for the fallback request. |
| `max_tokens` | number | No | Override the max tokens for the fallback request. |

Fields not specified in a fallback object inherit their values from the original request.
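The constraints in this reference can be checked before sending a request. The sketch below is a hypothetical client-side validator, not something the API provides; it covers only the rules stated above and assumes that fallback_config.depth is an integer.

```python
from typing import Any, Dict, List


def validate_fallback_request(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    problems: List[str] = []
    if not isinstance(body.get("model"), str):
        problems.append("model must be a string")
    if not isinstance(body.get("messages"), list):
        problems.append("messages must be an array")

    fallbacks = body.get("fallbacks", [])
    if not isinstance(fallbacks, list):
        problems.append("fallbacks must be an array")
    else:
        for i, entry in enumerate(fallbacks):
            # Every fallback object requires its own model.
            if not isinstance(entry, dict) or not isinstance(entry.get("model"), str):
                problems.append(f"fallbacks[{i}] must be an object with a string model")

    config = body.get("fallback_config", {})
    if not isinstance(config, dict):
        problems.append("fallback_config must be an object")
    else:
        if "retry" in config and not isinstance(config["retry"], bool):
            problems.append("fallback_config.retry must be a boolean")
        depth = config.get("depth", 1)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(depth, bool) or not isinstance(depth, int) or not 1 <= depth <= 2:
            problems.append("fallback_config.depth must be 1 or 2")
    return problems
```

For example, a body with "fallback_config": {"depth": 3} yields a single problem describing the depth limit.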

Next steps

Basic chat completions - Send simple messages and receive responses
Multi-turn conversations - Maintain context across multiple exchanges
Tool calling - Enable models to execute custom functions
Cloud endpoints and data residency - Learn about regional endpoint options
