Specify Fallback Models

Overview

The Fallback feature lets you specify up to two backup models that the LLM Gateway automatically tries, in order, if your primary model fails. This keeps your application resilient without retry logic of your own.

The LLM Gateway is available in both US and EU regions. Fallback behavior works the same way on both endpoints. See Cloud endpoints and data residency for more details.

Basic usage

To add a fallback, include a fallbacks array in your request. Each entry specifies an alternative model to use if the primary model is unavailable:

import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "kimi-k2.5",
        "messages": [
            {"role": "user", "content": "Tell me a fairy tale."}
        ],
        "fallbacks": [
            {"model": "claude-sonnet-4-6"}
        ]
    }
)

result = response.json()
print(result["choices"][0]["message"]["content"])

If kimi-k2.5 fails, the LLM Gateway automatically retries the request using claude-sonnet-4-6.

You can chain up to two fallback models by setting fallback_config.depth to 2. The LLM Gateway tries each fallback in order until one succeeds.
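
As a rough sketch of a chained request, the helper below builds a body with two fallbacks and sets fallback_config.depth to match. The helper name build_chained_request and the placeholder <SECOND_FALLBACK_MODEL> are illustrative assumptions, not part of the API; substitute any supported model name.

```python
import json


def build_chained_request(prompt, primary, fallback_models):
    """Build a request body whose fallbacks are listed in the order to try them."""
    if not 1 <= len(fallback_models) <= 2:
        raise ValueError("expected one or two fallback models")
    return {
        "model": primary,
        "messages": [{"role": "user", "content": prompt}],
        "fallbacks": [{"model": name} for name in fallback_models],
        # depth caps how many fallbacks are traversed (default 1, max 2).
        "fallback_config": {"depth": len(fallback_models)},
    }


payload = build_chained_request(
    "Tell me a fairy tale.",
    primary="kimi-k2.5",
    # The second entry is a placeholder, not a real model name.
    fallback_models=["claude-sonnet-4-6", "<SECOND_FALLBACK_MODEL>"],
)
print(json.dumps(payload, indent=2))
```

The resulting payload can be passed as the json argument of the requests.post call shown above.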

Override fields per fallback

For finer control, you can override specific request fields for each fallback model. For example, you can change the messages or temperature used by the fallback:

import requests

headers = {
    "authorization": "<YOUR_API_KEY>"
}

response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "kimi-k2.5",
        "messages": [
            {"role": "user", "content": "Tell me a fairy tale."}
        ],
        "temperature": 0.2,
        "fallbacks": [
            {
                "model": "claude-sonnet-4-6",
                "messages": [
                    {"role": "user", "content": "Tell me a fairy tale, but be very concise."}
                ],
                "temperature": 0.4
            }
        ]
    }
)

result = response.json()
print(result["choices"][0]["message"]["content"])

Any field you don’t override in the fallback inherits the value from the original request. Only the fields you explicitly set in the fallback object are changed.
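
One way to picture the inheritance rule is as a shallow merge of the fallback object over the original request. The function below is only a local mental model under that assumption; the gateway applies the rule server-side, and effective_fallback_request is a hypothetical name, not an API call.

```python
def effective_fallback_request(original, fallback):
    """Approximate the request that results from applying one fallback entry."""
    # Fallback routing keys are assumed not to be forwarded to the model.
    base = {
        key: value
        for key, value in original.items()
        if key not in ("fallbacks", "fallback_config")
    }
    # Keys set on the fallback replace the originals; the rest are inherited.
    return {**base, **fallback}


original = {
    "model": "kimi-k2.5",
    "messages": [{"role": "user", "content": "Tell me a fairy tale."}],
    "temperature": 0.2,
    "max_tokens": 500,
    "fallbacks": [{"model": "claude-sonnet-4-6", "temperature": 0.4}],
}

merged = effective_fallback_request(original, original["fallbacks"][0])
print(merged)
```

Here the fallback keeps the original messages and max_tokens, while model and temperature take the overridden values.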

Retry behavior

If no fallbacks are set, the API automatically retries the LLM request once after 500ms. This is because fallback_config.retry defaults to true, providing a zero-config way to handle transient failures.

For more control over retries, set retry to false and implement your own exponential backoff:

{
  "model": "kimi-k2.5",
  "messages": [{"role": "user", "content": "Tell me a fairy tale."}],
  "fallback_config": {
    "retry": false
  }
}
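
With retry disabled, the backoff loop lives in your client. The sketch below shows one possible shape, assuming that HTTP 429 and 5xx responses are the failures worth retrying; the documentation above does not specify which status codes the gateway returns. The names call_with_backoff and send are hypothetical, and send is any zero-argument callable that performs the request and returns an object with a status_code attribute.

```python
import random
import time


def call_with_backoff(send, max_attempts=4, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        response = send()
        # Assumption: 429 and 5xx are treated as transient failures.
        retryable = response.status_code == 429 or response.status_code >= 500
        if not retryable or attempt == max_attempts - 1:
            return response
        # Double the wait on each attempt, capped, with up to 10% jitter.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay + random.uniform(0, delay / 10))


# Offline demonstration with a stub in place of a real HTTP call.
class StubResponse:
    def __init__(self, status_code):
        self.status_code = status_code


statuses = iter([503, 503, 200])
delays = []  # records the waits instead of actually sleeping
result = call_with_backoff(lambda: StubResponse(next(statuses)),
                           sleep=delays.append)
print(result.status_code, len(delays))
```

In real use, send would wrap the requests.post call with "fallback_config": {"retry": false} in the body. Network exceptions are not handled in this sketch and would need their own try/except.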

Response behavior

When a fallback is used, the response looks exactly as if you had made the original request with the fallback model. The model field in the response reflects the fallback model that was used, and billing is charged only for that model.
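
Because the response carries the model that actually served the request, a client can detect fallback use by comparing it with the requested primary. The check below is a minimal sketch that assumes the response echoes model names in the same form as the request; served_by_fallback is a hypothetical helper, and the response dictionary is hand-written for illustration.

```python
def served_by_fallback(request_body, response_body):
    """Return True when the responding model differs from the requested primary."""
    return response_body.get("model") != request_body["model"]


request_body = {
    "model": "kimi-k2.5",
    "messages": [{"role": "user", "content": "Tell me a fairy tale."}],
    "fallbacks": [{"model": "claude-sonnet-4-6"}],
}

# Hand-written stand-in for response.json(), not captured gateway output.
response_body = {
    "model": "claude-sonnet-4-6",
    "choices": [{"message": {"role": "assistant", "content": "Once upon a time..."}}],
}

used_fallback = served_by_fallback(request_body, response_body)
print(used_fallback)
```

Logging this flag alongside the model name can help track how often the primary model is unavailable.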

API reference

Request parameters

| Key | Type | Required? | Description |
| --- | --- | --- | --- |
| model | string | Yes | The primary model to use for completion. See Available models for supported values. |
| messages | array | Yes | An array of message objects representing the conversation history. |
| fallbacks | array | No | An array of fallback objects. Each object must include a model and can override any field available in the original request. |
| fallback_config | object | No | Configuration for fallback behavior. |
| fallback_config.retry | boolean | No | Whether to automatically retry the request once after 500ms on failure. Defaults to true. |
| fallback_config.depth | number | No | Maximum number of fallbacks to traverse. Defaults to 1; the maximum is 2. |

Fallback object

Each object in the fallbacks array must include a model and can override any field available in the original request. Commonly overridden fields include:

| Key | Type | Required? | Description |
| --- | --- | --- | --- |
| model | string | Yes | The fallback model to use. See Available models for supported values. |
| messages | array | No | Override the messages for the fallback request. |
| temperature | number | No | Override the temperature for the fallback request. |
| max_tokens | number | No | Override the max tokens for the fallback request. |

Any field from the original request can be included in a fallback object. Fields not specified in the fallback inherit the values from the original request.
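
The constraints in the two tables above can be checked client-side before a request is sent. The validator below is a sketch derived only from those tables; validate_request is a hypothetical helper, and the lower bound of 1 for depth is an assumption, since the reference states only the default and the maximum.

```python
def validate_request(body):
    """Return a list of problems found in a request body; empty means none found."""
    problems = []
    if not isinstance(body.get("model"), str):
        problems.append("model: required string")
    if not isinstance(body.get("messages"), list):
        problems.append("messages: required array")
    for index, fallback in enumerate(body.get("fallbacks", [])):
        # Every fallback object must name its own model.
        if not isinstance(fallback, dict) or not isinstance(fallback.get("model"), str):
            problems.append(f"fallbacks[{index}].model: required string")
    config = body.get("fallback_config", {})
    if "retry" in config and not isinstance(config["retry"], bool):
        problems.append("fallback_config.retry: must be a boolean")
    depth = config.get("depth", 1)
    # Assumption: depth is an integer of at least 1; the documented maximum is 2.
    if isinstance(depth, bool) or not isinstance(depth, int) or not 1 <= depth <= 2:
        problems.append("fallback_config.depth: must be 1 or 2")
    return problems


bad_request = {
    "model": "kimi-k2.5",
    "messages": [{"role": "user", "content": "Tell me a fairy tale."}],
    "fallbacks": [{"temperature": 0.4}],  # missing the required model
    "fallback_config": {"depth": 3},      # above the documented maximum
}
issues = validate_request(bad_request)
print(issues)
```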

Next steps