Create a chat completion

<llms-only> > For the complete documentation index, see [llms.txt](https://www.assemblyai.com/docs/llms.txt) </llms-only>

<Note>To use our EU server for LLM Gateway, replace `llm-gateway.assemblyai.com` with `llm-gateway.eu.assemblyai.com`.</Note>

Generates a response from a model given a prompt or a series of messages.
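As a minimal sketch of calling this endpoint: the snippet below builds a request body from the fields documented in this reference and sends it with the API key in the `Authorization` header. The endpoint path `/v1/chat/completions` and the model ID are assumptions for illustration and are not confirmed by this page; check the LLM Gateway Overview for the exact URL and available models.

```python
import json
import urllib.request

# Assumption: the chat-completion endpoint path is /v1/chat/completions
# on the gateway host shown in this reference. Adjust if your docs differ.
GATEWAY_URL = "https://llm-gateway.assemblyai.com/v1/chat/completions"


def build_request(model: str, messages: list[dict], **options) -> dict:
    """Assemble a request body from the fields documented above.

    `options` carries any optional field (max_tokens, temperature,
    stream, tools, response_format, fallbacks, ...).
    """
    body = {"model": model, "messages": messages}
    body.update(options)
    return body


def create_chat_completion(api_key: str, body: dict) -> dict:
    """POST the body with API-key authentication via the Authorization header."""
    req = urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would look like `create_chat_completion(api_key, build_request("gpt-4.1", [{"role": "user", "content": "Hello"}], max_tokens=200))`, where `"gpt-4.1"` stands in for whichever model ID your account supports.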

Authentication

`Authorization` string
API key authentication via header.

Request

Request body for creating a chat completion.
`model` string, Required
The ID of the model to use for this request. See LLM Gateway Overview for available models.
`messages` list of objects, Optional
A list of messages comprising the conversation so far.

`prompt` string, Optional
A simple string prompt. The API will automatically convert this into a user message.

`max_tokens` integer, Optional, >=1, defaults to 1000
The maximum number of tokens to generate in the completion.

`temperature` double, Optional
Controls randomness. Lower values produce more deterministic results.
`stream` boolean, Optional, defaults to false
When true, responses are streamed as server-sent events (SSE). Supported on OpenAI models only.
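When `stream` is true, each chunk arrives as an SSE frame of the form `data: <payload>` followed by a blank line. A minimal parser for that framing is sketched below; the `[DONE]` terminator is the OpenAI streaming convention and is an assumption here, since this reference does not spell out the sentinel.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event.

    SSE frames arrive as 'data: <payload>' lines separated by blank
    lines. Assumption: the stream ends with a 'data: [DONE]' sentinel,
    following the OpenAI convention.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield payload
```

Each yielded payload is typically a JSON chunk that you would `json.loads` and append to the running completion.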

`tools` list of objects, Optional
A list of tools the model may call.

`tool_choice` enum or object, Optional
Controls which (if any) function is called by the model.
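An illustrative tool definition is shown below. The nested `function`/`parameters` shape follows the OpenAI function-calling convention and is an assumption, as this reference does not define the tool schema; `get_weather` is a hypothetical function name.

```python
# Assumption: tools follow the OpenAI function-calling schema.
# "get_weather" is a hypothetical function for illustration only.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# "auto" lets the model decide whether to call a tool; an object form
# can pin a specific function instead.
tool_choice = "auto"
```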

`response_format` object, Optional
Specifies the format of the model’s response. Use this to constrain the model to output valid JSON matching a schema. Supported by OpenAI (GPT-4.1, GPT-5.x), Gemini, and Claude models. Not supported by gpt-oss models.
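A sketch of a `response_format` value that constrains output to a JSON schema is below. The exact object shape is an assumption modeled on the OpenAI `json_schema` convention; the `sentiment` schema itself is hypothetical.

```python
# Assumption: response_format follows the OpenAI json_schema shape.
# The "sentiment" schema is a hypothetical example.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "sentiment",
        "schema": {
            "type": "object",
            "properties": {
                "label": {"type": "string"},
                "confidence": {"type": "number"},
            },
            "required": ["label", "confidence"],
            "additionalProperties": False,
        },
    },
}
```

With this in the request body, a supporting model returns a message whose content parses as JSON matching the schema.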

`fallbacks` list of objects, Optional
An array of fallback objects. Each object must include a model and can optionally override any field from the original request. If the primary model fails, the LLM Gateway tries each fallback in order until one succeeds. See Specify fallback models for more details.

`fallback_config` object, Optional
Configuration for fallback behavior, including retry and depth settings. See Specify fallback models for more details.
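A request body using fallbacks might look like the sketch below: each fallback object includes a `model` and may override fields from the original request, as described above. The model IDs are hypothetical; the keys inside `fallback_config` are not listed in this reference, so none are shown here.

```python
# Model IDs below are hypothetical; substitute models your account supports.
body = {
    "model": "gpt-4.1",
    "prompt": "Name one EU capital.",
    "max_tokens": 100,
    "fallbacks": [
        # Tried in order if the primary model fails.
        {"model": "claude-sonnet-4"},
        # A fallback may override any field from the original request.
        {"model": "gemini-2.5-flash", "max_tokens": 50},
    ],
}
```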

Response

Successful response containing the model's choices.
`request_id` string, format: uuid

`choices` list of objects

`request` object
A copy of the original request, excluding prompt and messages.

`usage` object

`http_status_code` integer
The HTTP status code of the response.

`response_time` integer
The response time in nanoseconds.

`llm_status_code` integer
The status code from the LLM provider.
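To pull the generated text out of a response, you would index into `choices`. This reference does not document the inner shape of a choice, so the `choices[0]["message"]["content"]` layout assumed below follows the common OpenAI-style convention; verify it against an actual response.

```python
def first_message_content(response: dict) -> str:
    """Return the first choice's text.

    Assumption: each choice nests its text under
    choices[i]["message"]["content"] (OpenAI-style layout); this
    reference does not spell out the choice object's fields.
    """
    return response["choices"][0]["message"]["content"]
```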