Basic Chat Completions
Overview
Basic chat completions allow you to send a message and receive a response from the model. This is the simplest way to interact with the LLM Gateway.
Getting started
Send a message and receive a response:
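As a minimal sketch in Python, a basic completion is a POST to the gateway endpoint with a messages array. The Authorization header format and the model name used here are assumptions, not part of this reference; check the request parameters below.

```python
# Minimal sketch of a basic chat completion request. The model name
# "gpt-4o-mini" is a placeholder and the Authorization header format
# is an assumption; substitute your own values.
import json
import urllib.request

GATEWAY_URL = "https://llm-gateway.assemblyai.com/v1/chat/completions"

def build_request(api_key: str, user_message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single-turn chat."""
    payload = {
        "model": "gpt-4o-mini",  # placeholder model name
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": api_key,  # assumed header format
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually send it:
# req = build_request("YOUR_API_KEY", "Hello!")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```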
Streamed responses
You can stream responses from OpenAI models by setting the stream parameter to true. The gateway then returns partial responses as server-sent events (SSE), letting you display output as it is generated.
Streamed responses are currently supported on OpenAI models only.
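A streamed response can be consumed by reading the SSE lines as they arrive. The sketch below assumes the common OpenAI convention: each event is a line of the form data: {...}, the stream ends with data: [DONE], and each event carries a text fragment in choices[0].delta.content. The exact event shape is an assumption.

```python
# Sketch of parsing server-sent events from a streamed completion,
# assuming OpenAI-style "data: {...}" lines ending with "data: [DONE]".
import json

def iter_stream_text(lines):
    """Yield text fragments from SSE lines as they arrive."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        event = json.loads(data)
        # Assumed delta shape: choices[0].delta.content
        delta = event["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # → Hello
```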
API reference
Request
The LLM Gateway accepts POST requests to https://llm-gateway.assemblyai.com/v1/chat/completions with the following parameters:
Request parameters
Message object
Content part object
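To illustrate how the objects above fit together, here is a hypothetical request body assuming the OpenAI chat-completions shape: a top-level model and messages array, where a message's content is either a plain string or a list of content part objects. Field names here are illustrative assumptions; the tables above are authoritative.

```python
# Hypothetical example payload; the model name is a placeholder and
# the content-part shape ({"type": "text", "text": ...}) is assumed.
import json

payload = {
    "model": "gpt-4o-mini",  # placeholder
    "messages": [
        # Message with string content:
        {"role": "system", "content": "You are a helpful assistant."},
        # Message with a list of content part objects:
        {
            "role": "user",
            "content": [{"type": "text", "text": "Summarize this call."}],
        },
    ],
}
print(json.dumps(payload, indent=2))
```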
Response
The API returns a JSON response with the model’s completion:
Response fields
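Reading the completion text out of the response can be sketched as below, assuming the OpenAI-style shape choices[0].message.content; verify field names against the response fields above.

```python
# Sketch of extracting the completion text, assuming an OpenAI-style
# response shape: choices[0].message.content.
def completion_text(response: dict) -> str:
    """Return the assistant's message from a chat completion response."""
    return response["choices"][0]["message"]["content"]

sample_response = {
    "choices": [{"message": {"role": "assistant", "content": "Hi there!"}}]
}
print(completion_text(sample_response))  # → Hi there!
```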
Inject a transcript by ID
Pass transcript_id at the top level of the request to inject a transcript’s text into the prompt. The API replaces the first occurrence of the literal tag {{ transcript }} in the first message containing it with the transcript’s text field, then runs the completion.
Only the first occurrence of {{ transcript }} in the first message that contains it is substituted; additional tags, and tags in later messages, are left as-is. The tag must be exactly {{ transcript }}, including the surrounding spaces; variants such as {{transcript}} or {{ TRANSCRIPT }} are not substituted. The endpoint returns 404 if the transcript ID does not exist or belongs to a different account.
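The substitution rule described above can be mirrored locally as a sketch. The gateway performs this server-side when transcript_id is set; this function only demonstrates the rule: the first occurrence of the literal tag, in the first message that contains it.

```python
# Local sketch of the gateway's substitution rule: only the first
# occurrence of "{{ transcript }}" in the first message containing it
# is replaced; all other tags are left untouched.
TAG = "{{ transcript }}"

def inject_transcript(messages, transcript_text):
    out = []
    done = False
    for msg in messages:
        content = msg.get("content")
        if not done and isinstance(content, str) and TAG in content:
            # str.replace with count=1 substitutes only the first occurrence
            msg = {**msg, "content": content.replace(TAG, transcript_text, 1)}
            done = True
        out.append(msg)
    return out

msgs = [
    {"role": "user", "content": "Summarize: {{ transcript }} and {{ transcript }}"},
    {"role": "user", "content": "{{ transcript }}"},
]
result = inject_transcript(msgs, "HELLO")
print(result[0]["content"])  # → Summarize: HELLO and {{ transcript }}
print(result[1]["content"])  # → {{ transcript }}
```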
Error response
If an error occurs, the API returns an error response:
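A minimal sketch of surfacing an error to the caller is shown below. The exact error schema is not reproduced here, so this only assumes a JSON body with some diagnostic detail (for example, the 404 returned when a transcript ID is unknown, as noted above).

```python
# Sketch of turning an error status and body into a readable message.
# The error body's schema is an assumption; this falls back to raw
# text if the body is not valid JSON.
import json

def describe_error(status_code: int, body: str) -> str:
    try:
        detail = json.loads(body)
    except json.JSONDecodeError:
        detail = body
    return f"LLM Gateway request failed ({status_code}): {detail}"

print(describe_error(404, '{"error": "Transcript not found"}'))
```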