Create a chat completion
<Note>To use our EU server for LLM Gateway, replace `llm-gateway.assemblyai.com` with `llm-gateway.eu.assemblyai.com`.</Note>
Generates a response from a model given a prompt or a series of messages.
Authentication
Authorization (string)
API Key authentication via header
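As a minimal sketch, the API key is passed directly in the `Authorization` header alongside a JSON content type; `YOUR_API_KEY` is a placeholder, not a real credential:

```python
# Minimal sketch of the auth headers for an LLM Gateway request.
# YOUR_API_KEY is a placeholder for your actual API key.
def build_headers(api_key: str) -> dict:
    """Return headers for an authenticated LLM Gateway call."""
    return {
        "Authorization": api_key,          # API key passed directly in the header
        "Content-Type": "application/json",
    }

headers = build_headers("YOUR_API_KEY")
```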
Request
Request body for creating a chat completion.
model
The ID of the model to use for this request. See LLM Gateway Overview for available models.
messages
A list of messages comprising the conversation so far.
prompt
A simple string prompt. The API will automatically convert this into a user message.
max_tokens
The maximum number of tokens to generate in the completion. Default is 1000.
temperature
Controls randomness. Lower values produce more deterministic results.
stream
When true, responses are streamed as server-sent events (SSE). Supported on OpenAI models only.
tools
A list of tools the model may call.
tool_choice
Controls which (if any) tool is called by the model.
response_format
Specifies the format of the model’s response. Use this to constrain the model to output valid JSON matching a schema. Supported by OpenAI (GPT-4.1, GPT-5.x), Gemini, and Claude models. Not supported by gpt-oss models.
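The request-body fields above can be assembled into a payload as in this sketch; the model ID and message contents are illustrative, and the optional fields are passed through as keyword arguments:

```python
import json

def build_chat_request(model: str, messages: list, **options) -> str:
    """Assemble a chat-completion request body from the documented fields.

    `options` covers the optional fields (max_tokens, temperature,
    stream, tools, tool_choice, response_format).
    """
    body = {"model": model, "messages": messages}
    body.update(options)
    return json.dumps(body)

payload = build_chat_request(
    "gpt-4.1",  # illustrative model ID; see LLM Gateway Overview
    [{"role": "user", "content": "Summarize this transcript."}],
    max_tokens=256,
    temperature=0.2,
)
```

Note that `prompt` and `messages` are alternatives: a bare `prompt` string is converted into a single user message, so a request needs only one of the two.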
Response
Successful response containing the model's choices.
request_id
The unique identifier for this request.
choices
The list of completion choices generated by the model.
request
A copy of the original request, excluding prompt and messages.
usage
Usage and performance metadata for the request.
http_status_code
The HTTP status code of the response.
response_time
The response time in nanoseconds.
llm_status_code
The status code returned by the LLM provider.
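Reading the documented response fields might look like the following sketch. The sample body is fabricated for illustration, and the nesting of the status and timing fields under `usage` is an assumption:

```python
import json

# Hedged sketch: extracting the documented fields from a response body.
# The sample is fabricated; actual choice contents follow the
# provider's chat-completion message format.
sample = json.dumps({
    "request_id": "req_123",
    "choices": [{"message": {"role": "assistant", "content": "Hello!"}}],
    "request": {"model": "gpt-4.1", "max_tokens": 256},
    "usage": {
        "http_status_code": 200,
        "response_time": 125_000_000,   # nanoseconds, per the field docs
        "llm_status_code": 200,
    },
})

data = json.loads(sample)
answer = data["choices"][0]["message"]["content"]
latency_ms = data["usage"]["response_time"] / 1e6  # ns -> ms
```

Because `response_time` is reported in nanoseconds, divide by 1e6 for milliseconds when logging latency.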