Voice Agent API

Session configuration

Set the system prompt, greeting, and turn detection behavior for your voice agent.

Send a session.update as your first WebSocket message — and any time after — to control how the agent speaks, listens, and responds. Here’s a typical configuration:

{
  "type": "session.update",
  "session": {
    "system_prompt": "You are a concise support agent. Max 2 sentences per turn.",
    "greeting": "Hi! How can I help you today?",
    "output": { "voice": "claire" },
    "input": {
      "turn_detection": { "type": "server_vad", "vad_threshold": 0.5 }
    }
  }
}

Every field is optional — include only what you want to set or change. Jump to any section below for details.
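
As a rough sketch of how a client might assemble this message, the Python helper below builds a session.update payload that contains only the fields passed in. The name build_session_update is hypothetical and not part of the API. Sending the resulting string over the WebSocket is left to whichever client library you use.

```python
import json


def build_session_update(**session_fields):
    """Return a session.update message as a JSON string.

    Hypothetical helper, not part of the API. Fields set to None are
    dropped, since every session field is optional.
    """
    session = {k: v for k, v in session_fields.items() if v is not None}
    return json.dumps({"type": "session.update", "session": session})


# Example: set only the prompt and greeting; other fields are not sent.
message = build_session_update(
    system_prompt="You are a concise support agent. Max 2 sentences per turn.",
    greeting="Hi! How can I help you today?",
)
```

The same helper can be reused for mid-session updates, since those use the same message shape.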

System prompt

Set the agent’s personality and behavior. You can update it mid-session by sending another session.update.

{
  "type": "session.update",
  "session": {
    "system_prompt": "You are a friendly support agent. Keep responses under 2 sentences. Never make up information."
  }
}

Tips for voice-first prompts:

  • Ban specific phrases: "Never say 'Certainly' or 'Absolutely'"
  • Enforce brevity: "Max 2 sentences per turn"
  • Tell the agent when to use each tool
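
One way to apply these tips consistently is to assemble the prompt from its parts. The sketch below is illustrative only: build_voice_prompt and the tool name lookup_order are hypothetical, and the exact wording of each rule is an assumption modeled on the tips above.

```python
def build_voice_prompt(role, banned_phrases=(), max_sentences=None, tool_rules=None):
    """Compose a voice-first system prompt from a role and optional rules.

    Hypothetical helper: the phrasing of each rule is an assumption,
    not something the API requires.
    """
    parts = [role.strip()]
    if max_sentences is not None:
        # Enforce brevity.
        parts.append(f"Max {max_sentences} sentences per turn.")
    if banned_phrases:
        # Ban specific phrases.
        quoted = " or ".join(f"'{p}'" for p in banned_phrases)
        parts.append(f"Never say {quoted}.")
    for tool, when in (tool_rules or {}).items():
        # Tell the agent when to use each tool.
        parts.append(f"Use the {tool} tool {when}.")
    return " ".join(parts)


prompt = build_voice_prompt(
    "You are a friendly support agent.",
    banned_phrases=["Certainly", "Absolutely"],
    max_sentences=2,
    tool_rules={"lookup_order": "when the user asks about an order"},  # hypothetical tool
)
```

The resulting string can be sent as the system_prompt field of a session.update.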

Greeting

What the agent says at the start of the conversation, spoken aloud. If omitted, the agent waits silently for the user to speak first.

{
  "type": "session.update",
  "session": {
    "system_prompt": "You are a helpful assistant.",
    "greeting": "Hi there! How can I help you today?"
  }
}

Voice and audio format

Choose a voice and configure the input/output audio format under session.output and session.input. Only PCM16 at 24 kHz is supported today, so the format blocks are optional.

{
  "type": "session.update",
  "session": {
    "input": {
      "format": { "encoding": "audio/pcm", "sample_rate": 24000 }
    },
    "output": {
      "voice": "claire",
      "format": { "encoding": "audio/pcm", "sample_rate": 24000 }
    }
  }
}

See Voices for the voice catalog and Audio format for playback details.
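
For sizing audio buffers, the arithmetic behind PCM16 at 24 kHz can be sketched as follows. PCM16 means 2 bytes per sample. The channel count is not stated in this section, so the sketch assumes mono audio; adjust CHANNELS if that assumption does not hold. The function name pcm16_chunk_bytes is hypothetical.

```python
SAMPLE_RATE_HZ = 24_000  # the only supported sample rate today
BYTES_PER_SAMPLE = 2     # PCM16 = 16 bits per sample
CHANNELS = 1             # ASSUMPTION: mono audio; not confirmed by this section


def pcm16_chunk_bytes(duration_ms, sample_rate=SAMPLE_RATE_HZ, channels=CHANNELS):
    """Return the byte length of a PCM16 chunk of the given duration."""
    samples = sample_rate * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * channels


twenty_ms = pcm16_chunk_bytes(20)     # 480 samples * 2 bytes = 960 bytes
one_second = pcm16_chunk_bytes(1000)  # 24000 samples * 2 bytes = 48000 bytes
```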

Turn detection

Customize turn detection sensitivity and barge-in behavior under session.input.turn_detection. All fields are optional — only include the ones you want to change. Settings can be updated mid-session.

{
  "type": "session.update",
  "session": {
    "input": {
      "turn_detection": {
        "type": "server_vad",
        "vad_threshold": 0.5
      }
    }
  }
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| type | string | server_vad | Turn detection algorithm. Currently only server_vad is supported. |
| vad_threshold | float | 0.5 | Turn detection sensitivity (0.0–1.0). Lower = more sensitive to speech. |
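
Because vad_threshold is documented as a value between 0.0 and 1.0, a client may want to check it before sending. The sketch below assumes client-side validation is desirable; how the server handles out-of-range values is not covered here. The helper build_turn_detection_update is hypothetical.

```python
import json


def build_turn_detection_update(vad_threshold=None):
    """Return a session.update that changes only turn detection settings.

    Hypothetical helper. The type field is omitted because server_vad is
    the documented default and all fields are optional.
    """
    turn_detection = {}
    if vad_threshold is not None:
        # Documented range is 0.0-1.0; lower is more sensitive to speech.
        if not 0.0 <= vad_threshold <= 1.0:
            raise ValueError("vad_threshold must be between 0.0 and 1.0")
        turn_detection["vad_threshold"] = vad_threshold
    return json.dumps(
        {
            "type": "session.update",
            "session": {"input": {"turn_detection": turn_detection}},
        }
    )


# Example: make the agent more sensitive to speech mid-session.
update = build_turn_detection_update(vad_threshold=0.3)
```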