llm field on the agent to your own OpenAI-compatible chat-completions endpoint. AssemblyAI calls that endpoint at runtime to generate every reply.
When to use this. Reach for a custom LLM when you need a specific model, your own fine-tune, or your own provider account and billing. If you just want a different frontier model without managing an endpoint, point llm at the LLM Gateway instead.
Connect a model
Add an llm array to a create or update request. Each entry needs a base_url, a model, and an api_key:
| Field | Type | Required | Notes |
|---|---|---|---|
| base_url | string | Yes | HTTPS base URL of the OpenAI-compatible endpoint. Must be https and a public host. The agent calls POST {base_url}/chat/completions. |
| model | string | Yes | Model name sent in the chat-completions request body. |
| api_key | string | Yes | Key for your endpoint. Write-only: encrypted at rest and never returned in any response. |
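As a rough sketch of the fields in the table, the Python below builds a single-entry llm array and wraps it in an update request. The API host (`api.example.com`), the `Authorization` header format, the agent ID, and all values are placeholders assumed for illustration; only the three field names and the `PUT /v1/agents/{id}` path come from this page. Whether other agent fields must accompany llm in the body is not stated here, so this sketch sends llm alone.

```python
import json
import urllib.request

# Placeholder host: the real API host is not given in this section.
API_BASE = "https://api.example.com"


def build_llm_config(base_url: str, model: str, api_key: str) -> list:
    """Return the llm array: a list holding exactly one entry."""
    return [{"base_url": base_url, "model": model, "api_key": api_key}]


def build_update_request(agent_id: str, llm: list, assemblyai_key: str) -> urllib.request.Request:
    """Build (but do not send) a PUT /v1/agents/{id} request carrying the llm array."""
    body = json.dumps({"llm": llm}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/v1/agents/{agent_id}",
        data=body,
        method="PUT",
        headers={
            # Assumed auth scheme; check the API reference for the real header.
            "Authorization": assemblyai_key,
            "Content-Type": "application/json",
        },
    )


# Hypothetical values, for illustration only.
llm = build_llm_config(
    base_url="https://llm.example.com/v1",
    model="my-fine-tune",
    api_key="sk-your-endpoint-key",
)
request = build_update_request("agent_123", llm, "YOUR_ASSEMBLYAI_API_KEY")
print(request.get_method(), request.full_url)
print(json.dumps(json.loads(request.data), indent=2))

# To actually send it:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it makes the payload easy to inspect before any key leaves your machine.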
Update or rotate the model
Send a new llm array on PUT /v1/agents/{id}. Include api_key to rotate the key; the whole llm entry is replaced.
To remove the custom model, send "llm": [].
Use the LLM Gateway
You don’t need your own provider account to use a frontier model. Point base_url at AssemblyAI’s LLM Gateway and pass your AssemblyAI API key. You get Claude, GPT, Gemini, and more through one endpoint, billed on your AssemblyAI account.
Use https://llm-gateway.eu.assemblyai.com/v1 for EU workloads.
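A Gateway configuration might look like the following sketch. The EU URL is the one given above; the default URL is an assumption inferred from it, and the model name is a hypothetical placeholder, so check both against the LLM Gateway documentation.

```python
import json

GATEWAY_URLS = {
    # Assumed default URL (inferred from the EU one); verify before use.
    "default": "https://llm-gateway.assemblyai.com/v1",
    # EU URL as given in this page.
    "eu": "https://llm-gateway.eu.assemblyai.com/v1",
}


def gateway_llm_config(assemblyai_api_key: str, model: str, region: str = "default") -> list:
    """Return a single-entry llm array that points at the LLM Gateway."""
    return [
        {
            "base_url": GATEWAY_URLS[region],
            "model": model,
            # The Gateway is billed on your AssemblyAI account, so the key
            # here is your AssemblyAI API key, not a provider key.
            "api_key": assemblyai_api_key,
        }
    ]


# "your-model-name" is a placeholder, not a real model identifier.
eu_config = gateway_llm_config("YOUR_ASSEMBLYAI_API_KEY", "your-model-name", region="eu")
print(json.dumps({"llm": eu_config}, indent=2))
```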
Requirements & behavior
- OpenAI-compatible. The endpoint must accept POST /chat/completions in the OpenAI schema.
- Streaming. Realtime voice needs token streaming, so the model must support streamed chat completions.
- One config. llm is a list, but only a single entry is accepted today (fallbacks aren’t supported yet).
- HTTPS + public host. Non-https URLs and private/loopback hosts are rejected.
- Reads mask the key. GET/list responses return only base_url and model, never api_key.
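Some of these rules can be checked locally before a new config is sent. The sketch below is a client-side approximation only, not the service's actual validation: the function name and messages are hypothetical, and it inspects literal IP addresses and `localhost` without resolving DNS names, so a hostname that resolves to a private address would still pass here.

```python
import ipaddress
from urllib.parse import urlparse


def check_llm_config(llm: list) -> list:
    """Return a list of problems; an empty list means these local checks passed."""
    if len(llm) != 1:
        # Only a single entry is accepted today.
        return ["llm must contain exactly one entry"]

    problems = []
    entry = llm[0]
    for field in ("base_url", "model", "api_key"):
        if not entry.get(field):
            problems.append(f"missing required field: {field}")

    url = urlparse(entry.get("base_url") or "")
    if url.scheme != "https":
        problems.append("base_url must use https")

    host = url.hostname or ""
    if host == "localhost":
        problems.append("base_url host must be public")
    else:
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            ip = None  # a DNS name (or empty); not resolved in this sketch
        if ip is not None and (ip.is_private or ip.is_loopback):
            problems.append("base_url host must be public")

    return problems


good = [{"base_url": "https://llm.example.com/v1", "model": "m", "api_key": "k"}]
bad = [{"base_url": "http://127.0.0.1:8000/v1", "model": "m", "api_key": "k"}]
print(check_llm_config(good))
print(check_llm_config(bad))
```

Streaming support and OpenAI schema compatibility cannot be verified this way; those need a real request against the endpoint.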
Latency and reliability now depend on your endpoint. A slow or rate-limited model shows up directly as reply latency in the conversation. See Best practices for tuning.