Session configuration
Send a session.update as your first WebSocket message, and again at any point afterward, to control how the agent speaks, listens, and responds.
Every field is optional — include only what you want to set or change. Jump to any section below for details.
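As a rough sketch of what a first message could look like, the snippet below builds a session.update event and leaves out any setting that was not provided. It assumes the event is a JSON object with a "type" field and a "session" payload. The key names instructions and greeting, and the helper build_session_update, are hypothetical placeholders rather than confirmed field names, so check the API reference before relying on them.

```python
import json


def build_session_update(**settings):
    """Wrap only the settings that were actually provided in a session.update event."""
    # Assumption: events are JSON objects shaped like {"type": ..., "session": {...}}.
    session = {key: value for key, value in settings.items() if value is not None}
    return json.dumps({"type": "session.update", "session": session})


first_message = build_session_update(
    instructions="You are a concise, friendly support agent.",  # hypothetical key name
    greeting="Hi, how can I help today?",                       # hypothetical key name
    output=None,  # not set, so it is left out of the message entirely
)
```

Because every field is optional, the same helper can be reused later in the session with only the one setting you want to change.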
System prompt
Set the agent’s personality and behavior. The prompt can be updated mid-session with another session.update.
Tips for voice-first prompts:
- Ban specific phrases: "Never say 'Certainly' or 'Absolutely'"
- Enforce brevity: "Max 2 sentences per turn"
- Tell the agent when to use each tool
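One way to apply the tips above is to assemble the prompt from small parts, so the banned phrases, the length limit, and the tool guidance stay easy to adjust. This is only a sketch: the helper build_voice_prompt, the tool name lookup_booking, and the instructions key are all hypothetical and are not taken from the API.

```python
import json


def build_voice_prompt(persona, banned_phrases, max_sentences, tool_rules):
    """Compose a voice-first system prompt from a persona and a few hard rules."""
    lines = [persona]
    quoted = " or ".join(f"'{phrase}'" for phrase in banned_phrases)
    lines.append(f"Never say {quoted}.")
    lines.append(f"Max {max_sentences} sentences per turn.")
    # One explicit rule per tool tells the agent when each tool applies.
    for tool, condition in tool_rules.items():
        lines.append(f"Use the {tool} tool when {condition}.")
    return "\n".join(lines)


prompt = build_voice_prompt(
    persona="You are a friendly booking assistant.",
    banned_phrases=["Certainly", "Absolutely"],
    max_sentences=2,
    tool_rules={"lookup_booking": "the caller gives a confirmation code"},  # hypothetical tool
)

# Assumption: the prompt travels in a key such as "instructions"; the real name may differ.
update = json.dumps({"type": "session.update", "session": {"instructions": prompt}})
```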
Greeting
What the agent says at the start of the conversation, spoken aloud. If omitted, the agent waits silently for the user to speak first.
Voice and audio format
Choose a voice and configure the input/output audio format under session.output and session.input. Only PCM16 at 24 kHz is supported today, so the format blocks are optional.
See Voices for the voice catalog and Audio format for playback details.
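For sizing buffers, it can help to work out what PCM16 at 24 kHz means in bytes. The sketch below assumes mono audio and little-endian sample order, neither of which is stated on this page. The helper names are illustrative only.

```python
import struct

SAMPLE_RATE_HZ = 24_000   # from the format described above
BYTES_PER_SAMPLE = 2      # PCM16 means 16-bit signed samples
CHANNELS = 1              # assumption: mono


def chunk_duration_ms(num_bytes):
    """Return how many milliseconds of audio a raw PCM16 chunk holds."""
    frames = num_bytes / (BYTES_PER_SAMPLE * CHANNELS)
    return 1000.0 * frames / SAMPLE_RATE_HZ


def floats_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to PCM16 bytes (assumed little-endian)."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)
```

Under these assumptions, one second of audio is 48,000 bytes, so a 4,800-byte chunk holds 100 ms.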
Turn detection
Customize turn detection sensitivity and barge-in behavior under session.input.turn_detection. All fields are optional — only include the ones you want to change. Settings can be updated mid-session.
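A mid-session change might then be a small, partial update that carries only the field being adjusted. In this sketch the nesting under session.input.turn_detection follows the path named above, while the JSON envelope and the field name silence_duration_ms are assumptions used purely for illustration.

```python
import json


def turn_detection_update(**changes):
    """Build a session.update that carries only the turn detection fields passed in."""
    # Assumption: events are JSON objects shaped like {"type": ..., "session": {...}}.
    return json.dumps({
        "type": "session.update",
        "session": {"input": {"turn_detection": dict(changes)}},
    })


# Hypothetical field name; substitute whichever turn detection fields the API documents.
message = turn_detection_update(silence_duration_ms=700)
```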