Voice Agent API

Voices

Pick a voice that matches your agent's personality and language.

Pick any voice ID from the tables below and set it on session.output.voice in a session.update before session.ready. session.output is immutable once the session is established, so the voice canโ€™t be changed mid-conversation.

1{
2 "type": "session.update",
3 "session": {
4 "output": { "voice": "ivy" }
5 }
6}

Language support

The voice agentโ€™s input (speech recognition) and output (speech synthesis) cover different sets of languages:

  • Input (understood): ๐Ÿ‡บ๐Ÿ‡ธ English, ๐Ÿ‡ซ๐Ÿ‡ท French, ๐Ÿ‡ฉ๐Ÿ‡ช German, ๐Ÿ‡ฎ๐Ÿ‡น Italian, ๐Ÿ‡ต๐Ÿ‡น Portuguese, and ๐Ÿ‡ช๐Ÿ‡ธ Spanish.
  • Output (spoken): those six, plus ๐Ÿ‡ฎ๐Ÿ‡ณ Hindi, ๐Ÿ‡ฏ๐Ÿ‡ต Japanese, ๐Ÿ‡ฐ๐Ÿ‡ท Korean, ๐Ÿ‡จ๐Ÿ‡ณ Mandarin, and ๐Ÿ‡ท๐Ÿ‡บ Russian.

The agent can speak a language it canโ€™t transcribe from user audio. This is useful for translation-style flows where the user speaks one of the recognized languages and the agent replies in another.

Choose a voice by language

Every voice supports every output language. The difference between the two tables is the voiceโ€™s primary accent:

  • For an English accent (American or British) carried into other languages, pick from Voices.
  • For a native accent in a specific non-English language, pick the matching language-specific voice.

Voices

These voices have an American or British English accent. They speak ๐Ÿ‡บ๐Ÿ‡ธ English, ๐Ÿ‡ซ๐Ÿ‡ท French, ๐Ÿ‡ฉ๐Ÿ‡ช German, ๐Ÿ‡ฎ๐Ÿ‡น Italian, ๐Ÿ‡ต๐Ÿ‡น Portuguese, ๐Ÿ‡ช๐Ÿ‡ธ Spanish, ๐Ÿ‡ฎ๐Ÿ‡ณ Hindi, ๐Ÿ‡จ๐Ÿ‡ณ Mandarin, ๐Ÿ‡ท๐Ÿ‡บ Russian, ๐Ÿ‡ฐ๐Ÿ‡ท Korean, and ๐Ÿ‡ฏ๐Ÿ‡ต Japanese. Their English accent carries over into the other languages.

VoiceAccentDescriptionSample
ivy๐Ÿ‡บ๐Ÿ‡ธProfessional, deliberate, smooth
james๐Ÿ‡บ๐Ÿ‡ธConversational, professional, male
tyler๐Ÿ‡บ๐Ÿ‡ธTheatrical, energetic, chatty, jagged
winter๐Ÿ‡บ๐Ÿ‡ธEmpathetic, aesthetic, conversational
sam๐Ÿ‡บ๐Ÿ‡ธSoft, conversational, young
mia๐Ÿ‡บ๐Ÿ‡ธSmooth, conversational, young
bella๐Ÿ‡บ๐Ÿ‡ธHigh-pitched, chatty
david๐Ÿ‡บ๐Ÿ‡ธDeep, calming, conversational
jack๐Ÿ‡บ๐Ÿ‡ธSmooth, direct, clear, fast-paced
kyle๐Ÿ‡บ๐Ÿ‡ธChatty, nasal, expressive
helen๐Ÿ‡บ๐Ÿ‡ธSoft, older, calming
martha๐Ÿ‡บ๐Ÿ‡ธSouthern, older, warm
river๐Ÿ‡บ๐Ÿ‡ธSlow, calming, ASMR
emma๐Ÿ‡บ๐Ÿ‡ธLively, young, conversational
victor๐Ÿ‡บ๐Ÿ‡ธDeep, older
eleanor๐Ÿ‡บ๐Ÿ‡ธDeeper, older, calming
sophie๐Ÿ‡ฌ๐Ÿ‡งClear, smooth, instructive, simple
oliver๐Ÿ‡ฌ๐Ÿ‡งNarrative, conversational

Language-specific voices

These voices have a native accent in a specific non-English language. They also speak ๐Ÿ‡บ๐Ÿ‡ธ English, ๐Ÿ‡ซ๐Ÿ‡ท French, ๐Ÿ‡ฉ๐Ÿ‡ช German, ๐Ÿ‡ฎ๐Ÿ‡น Italian, ๐Ÿ‡ต๐Ÿ‡น Portuguese, ๐Ÿ‡ช๐Ÿ‡ธ Spanish, ๐Ÿ‡ฎ๐Ÿ‡ณ Hindi, ๐Ÿ‡จ๐Ÿ‡ณ Mandarin, ๐Ÿ‡ท๐Ÿ‡บ Russian, ๐Ÿ‡ฐ๐Ÿ‡ท Korean, and ๐Ÿ‡ฏ๐Ÿ‡ต Japanese, and they code-switch naturally between their primary language and English.

VoiceNative accentDescriptionSample
arjun๐Ÿ‡ฎ๐Ÿ‡ณ Hindi/HinglishConversational
ethan๐Ÿ‡จ๐Ÿ‡ณ MandarinConversational
dmitri๐Ÿ‡ท๐Ÿ‡บ RussianConversational
lukas๐Ÿ‡ฉ๐Ÿ‡ช GermanBritish English accent, conversational, smooth
lena๐Ÿ‡ฉ๐Ÿ‡ช GermanConversational, soft
pierre๐Ÿ‡ซ๐Ÿ‡ท FrenchConversational
mina๐Ÿ‡ฐ๐Ÿ‡ท Korean
ren๐Ÿ‡ฏ๐Ÿ‡ต Japanese
mei๐Ÿ‡จ๐Ÿ‡ณ Mandarin
joon๐Ÿ‡ฐ๐Ÿ‡ท Korean
giulia๐Ÿ‡ฎ๐Ÿ‡น Italian
luca๐Ÿ‡ฎ๐Ÿ‡น Italian
lucia๐Ÿ‡ช๐Ÿ‡ธ Spanish
hana๐Ÿ‡ฏ๐Ÿ‡ต Japanese
mateo๐Ÿ‡ช๐Ÿ‡ธ Spanish
diego๐Ÿ‡จ๐Ÿ‡ด Spanish (Latin American)Colombian