Prompting guide
Set your agent’s system_prompt via session.update. The patterns below are tested against real voice agent conversations and consistently improve quality.
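The exact message shape depends on your API version; as a rough sketch (every field name other than `system_prompt` is an assumption, and the prompt text is a hypothetical example):

```json
{
  "type": "session.update",
  "session": {
    "system_prompt": "You are Dana, a dispatcher for Acme Couriers. Keep answers short and conversational."
  }
}
```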
Copy this page into Claude, ChatGPT, or your preferred LLM and use it as a reference while iterating on your prompt. Having the LLM apply these patterns to your specific use case is the fastest way to get a good system prompt.
Make instructions stick
Front-load your most important rule
Put your most critical instruction at the top and reinforce it. Long prompts dilute attention. If you bury a key rule in the middle, the model deprioritizes it.
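As a hypothetical illustration, a support agent's prompt might state its critical rule first and repeat it at the end:

```text
You are a support agent for Acme. NEVER quote prices — route all pricing questions to sales.

[... rest of prompt ...]

Remember: never quote prices, even if the user pushes back.
```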
Use negative instructions with exact phrasings
Listing the exact phrases you don’t want works better than vague positive instructions like “be casual”. The model pattern-matches against concrete strings.
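For example, instead of "be casual", a prompt might ban the concrete strings (phrases here are illustrative):

```text
Never say "How may I assist you today?" or "Thank you for reaching out."
```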
Pair bad examples with good ones
Show what you don’t want next to what you do want. The contrast teaches the rule.
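A hypothetical contrast pair might look like:

```text
User: "Can I change my flight?"
Bad: "Certainly! I'd be delighted to assist you with modifying your reservation."
Good: "Sure — what's your new date?"
```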
Add self-check heuristics
Give the model a check it can run before responding. Abstract rules like “be brief” don’t land. A concrete heuristic does.
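One possible heuristic, phrased as an instruction the model can apply to its own draft:

```text
Before responding, check: could you say this in one breath? If not, cut it down.
```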
Match example length to desired output length
If your prompt examples are paragraphs, the model outputs paragraphs. Keep example outputs as terse as the real responses you want.
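If you want one-sentence answers, keep the example outputs in your prompt to one sentence, e.g.:

```text
User: "What time do you close?"
Agent: "Nine tonight."
```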
Iterate against real transcripts
Fix specific failures from real conversations. Quote the failing output in your prompt and show the correction.
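A hypothetical correction pulled from a transcript might read:

```text
In a real call you said: "I have successfully updated your account information in our system."
Say instead: "Done — your address is updated."
```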
Speculative rules (“what if the user asks about X”) add noise without improving quality. Iterate against logs, not imagination.
Sound human
Give the agent an identity
Identity statements shape tone better than behavioral lists. Tell the model who it is, not just what to do.
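A hypothetical identity statement:

```text
You are Dana, a veteran dispatcher who has answered these calls for ten years. You are warm, direct, and unhurried.
```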
Use permission language
Safety training makes models default to formal, cautious responses. Explicit permissions unlock natural behavior that “be friendly” never will.
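Permission language might look like this (an illustrative sketch):

```text
You can use contractions, casual phrasing, and mild filler like "sure thing" or "got it". You're allowed to disagree with the user and to admit when you don't know something.
```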
Mirror the user’s length and energy
The model’s default is to talk more than the user. Instruct it to match.
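For instance:

```text
Match the user's length and energy. If they give you three words, give them a short sentence back. Don't elaborate unless asked.
```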
Define engagement modes
A single behavioral playbook produces the same response shape regardless of context. Define separate modes so the agent reads the room.
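One possible set of modes (names and boundaries are illustrative, not prescribed):

```text
Task mode: the user wants something done. Be brief, confirm, execute.
Exploration mode: the user is deciding. Offer options; ask one question at a time.
Venting mode: the user is frustrated. Acknowledge first; don't problem-solve until they're ready.
```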
Ban bot tells
List the specific phrases that make agents sound like chatbots and ban them.
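A hypothetical ban list:

```text
Never say: "As an AI", "I understand your concern", "I apologize for any inconvenience", "Please note that".
```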
Ground temporal and situational context
Inject session-specific information into the prompt at runtime. Users ask “what time is it?” and the model should not hallucinate.
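A runtime-injected block might use template placeholders like these (the variable names are hypothetical; fill them from your session state before sending the prompt):

```text
Current date: {current_date}
Current time: {current_time} ({user_timezone})
Caller: {caller_name}, calling about order {order_id}
```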
Define capabilities explicitly
Without clear boundaries, the model invents capabilities to please users or denies things it can actually do.
List what the agent can and cannot do
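An illustrative capability boundary for a hypothetical support agent:

```text
You CAN: check order status, update shipping addresses, issue refunds under $50.
You CANNOT: change billing details, cancel subscriptions, or make exceptions to the refund policy. For those, transfer to a human agent.
```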
Pin verified facts
List every factual claim the model is allowed to make. For anything else, direct users to documentation or a human.
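A hypothetical pinned-facts section:

```text
Verified facts you may state:
- Support hours: 9am–6pm Eastern, Monday through Friday.
- Returns accepted within 30 days with a receipt.
For anything else, say you're not sure and point the user to the help center.
```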
Optimize for voice output
TTS engines read formatting characters literally. Formatting that works in chat sounds broken when spoken aloud.
Tell the model why formatting rules exist
Give a concrete example of the failure mode so it understands the constraint.
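For example:

```text
Your replies are read aloud by a text-to-speech engine. Never use markdown, bullets, or asterisks — "*important*" would be spoken as "asterisk important asterisk".
```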
Spell out how to read literals
URLs, field names, and code identifiers need explicit substitution rules.
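Illustrative substitution rules (the specific readings are examples, not a standard):

```text
Read "example.com/help" as "example dot com slash help".
Read "user_id" as "user I D".
Never spell out a URL character by character unless the user asks.
```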
Round numbers and times
Voice users don’t need precision. Use natural approximations.
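For example:

```text
Say "about four hundred dollars", not "three ninety-eight forty-seven".
Say "a little after two", not "two oh seven PM".
```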
General prompt hygiene
Write policies, not decision trees
A policy is a general rule the model applies across situations. A decision tree is a set of fragile conditionals the model will misinterpret.
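A hypothetical contrast between the two:

```text
Policy: If the user sounds frustrated, acknowledge it before continuing.
Decision tree: If the user says "this is ridiculous", apologize; if they sigh, slow down; if they ask for a manager twice, escalate.
```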
Trim aggressively when adding new sections
Long prompts drown their best rules. Every time you add a new instruction, look for something to remove.
Test by reading aloud
Read the agent’s outputs out loud. Visual scanning misses rhythm problems and unnatural phrasing that users notice immediately on a phone call.
Putting it together
A well-structured voice agent prompt typically follows this order:
- Identity and most important rule
- Tone and conversational style (permissions, mirroring, bot-tell bans)
- Capabilities and facts (what it can and cannot do, pinned facts)
- Tool usage instructions (when to call each tool, what to say while waiting)
- Voice formatting rules (no markdown, reading literals, rounding)
- Engagement modes (how to behave in different conversational contexts)
Your prompt works together with other session configuration. Use keyterms to boost recognition of domain-specific words, and configure turn detection thresholds to match the conversational pace your prompt encourages.