Speaker Identification
Supported languages
en, en_au, en_uk, en_us, es, fr, de, it, pt, nl, hi, ja, zh, fi, ko, pl, ru, tr, uk, vi, af, sq, am, ar, hy, as, az, ba, eu, be, bn, bs, br, bg, ca, hr, cs, da, et, fo, gl, ka, el, gu, ht, ha, haw, he, hu, is, id, jw, kn, kk, lo, la, lv, ln, lt, lb, mk, mg, ms, ml, mt, mi, mr, mn, ne, no, nn, oc, pa, ps, fa, ro, sa, sr, sn, sd, si, sk, sl, so, su, sw, sv, tl, tg, ta, tt, te, tk, ur, uz, cy, yi, yo
Supported models
universal-3-pro, universal-2
Supported regions
US & EU
Overview
Replace generic “Speaker A” and “Speaker B” labels with real names or roles, no voice enrollment needed. Speaker Identification uses conversation content to infer who’s speaking and applies the identifiers you provide.
Example transformation:
Before:
After (by name):
After (by role):
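For illustration (dialogue and names invented), the relabeling might look like this:

```
Before:
  Speaker A: Thanks for joining us today.
  Speaker B: Happy to be here.

After (by name):
  Michel Martin: Thanks for joining us today.
  Sam Rivera: Happy to be here.

After (by role):
  Host: Thanks for joining us today.
  Guest: Happy to be here.
```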
Speaker Identification requires Speaker Diarization. You must set speaker_labels: true in your transcription request.
To reliably identify speakers, your audio should contain clear, distinguishable voices and sufficient spoken audio from each speaker. The accuracy of Speaker Diarization depends on the quality of the audio and the distinctiveness of each speaker’s voice, which will have a downstream effect on the quality of Speaker Identification.
Choosing how to identify speakers
You can identify speakers by name or by role:
- Know the speakers’ names? Use speaker_type: "name" with the names in known_values or speakers.
- Know their roles but not names? Use speaker_type: "role" with roles like "Interviewer" or "Agent" in known_values or speakers.
- Need better accuracy? Use speakers with description fields that provide context about what each speaker typically discusses.
How to use Speaker Identification
Include the speech_understanding parameter in your transcription request to identify speakers.
Already have a completed transcript? You can add Speaker Identification to an existing transcript in a separate request.
Identify by name
To identify speakers by name, use speaker_type: "name" with a list of speaker names in known_values. This is the most common approach when you know who is speaking in the audio.
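As a minimal sketch in Python, the request body nests the identification settings under speech_understanding; the payload shape follows the parameters described on this page, while the audio URL and the second name are illustrative placeholders:

```python
import json

# Sketch of a transcription request body with name-based Speaker
# Identification. audio_url and "Sam Rivera" are invented placeholders.
request_body = {
    "audio_url": "https://example.com/interview.mp3",
    "speaker_labels": True,  # Speaker Diarization is required
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "name",
                "known_values": ["Michel Martin", "Sam Rivera"],
            }
        }
    },
}

print(json.dumps(request_body, indent=2))
```

Send this body as JSON in your transcription request; note that speaker_labels must be enabled.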
Identify by role
To identify speakers by role instead of name, use speaker_type: "role" with role labels in known_values. This is useful for customer service calls, interviews, or any scenario where you know the roles but not the names.
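The role-based variant is a sketch of the same nesting, with speaker_type set to "role" and role labels in known_values (the audio URL is a placeholder):

```python
import json

# Role-based variant: identical structure, but known_values holds
# role labels instead of personal names.
request_body = {
    "audio_url": "https://example.com/support-call.mp3",  # placeholder
    "speaker_labels": True,  # Speaker Diarization is required
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "role",
                "known_values": ["Agent", "Customer"],
            }
        }
    },
}

print(json.dumps(request_body, indent=2))
```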
Common role combinations
- ["Agent", "Customer"] - Customer service calls
- ["AI Assistant", "User"] - AI chatbot interactions
- ["Support", "Customer"] - Technical support calls
- ["Interviewer", "Interviewee"] - Interview recordings
- ["Host", "Guest"] - Podcast or show recordings
- ["Moderator", "Panelist"] - Panel discussions
Adding speaker metadata
For more accurate speaker identification, you can use the speakers parameter instead of known_values. The speakers parameter lets you provide additional metadata about each speaker to help the model identify speakers based on conversational context.
This is particularly useful when:
- Speakers have similar voices but distinct roles or topics
- You want to provide contextual clues about what each speaker typically discusses
- You need more precise identification in complex multi-speaker scenarios
Each speaker object must include either a name or role (depending on speaker_type). Beyond that, you can add any additional properties you want. The name and role fields are reserved as strings, but all other properties are flexible and can be any structure.
Examples in this section are shown in Python for brevity. The same speaker_identification configuration works in any language.
At its simplest, you can provide a description alongside each speaker’s name or role:
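For example (a sketch; the second name and both descriptions are invented):

```python
# speakers replaces known_values; each object pairs a name with a
# free-text description the model can match against the conversation.
speaker_identification = {
    "speaker_type": "name",
    "speakers": [
        {
            "name": "Michel Martin",
            "description": "Host who introduces topics and asks questions",
        },
        {
            "name": "Sam Rivera",  # placeholder name
            "description": "Guest who answers questions about their work",
        },
    ],
}
```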
For even more fine-tuned identification, you can include any additional custom properties on each speaker object, such as company, title, department, or any other fields that help describe the speaker:
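As a sketch, extra properties sit alongside name and description on each speaker object (all values below are invented examples):

```python
# Beyond the reserved name/role strings, any additional properties are
# allowed; company, title, and department here are invented examples.
speaker_identification = {
    "speaker_type": "name",
    "speakers": [
        {
            "name": "Michel Martin",
            "description": "Leads the conversation",
            "company": "Example Media",
            "title": "Host",
            "department": "Editorial",
        },
    ],
}
```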
You can use the same custom properties with role-based identification by replacing name with role in each speaker object.
API reference
Request
Include the speech_understanding parameter directly in your transcription request (shown here with name-based identification):
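A standard-library Python sketch of the full flow, assuming the usual AssemblyAI transcription endpoint and authorization header; the payload shape follows the parameters documented on this page:

```python
import json
import urllib.request

API_URL = "https://api.assemblyai.com/v2/transcript"  # standard transcription endpoint


def build_request_body(audio_url, known_names):
    """Assemble a transcription request body with name-based
    Speaker Identification (shape based on this page's parameters)."""
    return {
        "audio_url": audio_url,
        "speaker_labels": True,  # required for Speaker Identification
        "speech_understanding": {
            "request": {
                "speaker_identification": {
                    "speaker_type": "name",
                    "known_values": known_names,
                }
            }
        },
    }


def submit_transcript(api_key, audio_url, known_names):
    """POST the request and return the created transcript as a dict."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request_body(audio_url, known_names)).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```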
Request parameters
The following parameters are nested under speech_understanding.request.speaker_identification:
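As a quick annotated sketch, the nested object accepts the fields covered on this page (exact types and defaults are in the full reference):

```python
speaker_identification = {
    # How speakers are identified: "name" or "role"
    "speaker_type": "name",
    # Option 1: a simple list of names (or roles)
    "known_values": ["Michel Martin"],
    # Option 2 (instead of known_values): richer speaker objects with a
    # description and any custom metadata, e.g.
    # "speakers": [{"name": "Michel Martin", "description": "Host"}],
}
print(speaker_identification)
```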
Response
The Speaker Identification API returns a modified version of your transcript with updated speaker labels in the utterances key.
Response fields
With Speaker Identification, the speaker field in utterances and words contains the identified name or role (e.g., "Michel Martin" or "Agent") instead of generic labels like "A", "B", "C". All other fields (text, start, end, confidence, words) remain unchanged from the standard transcription response.
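For example, you can print the relabeled conversation from the utterances key (the transcript dict below is an invented excerpt of a completed response):

```python
# Illustrative completed-transcript excerpt; real responses also include
# start, end, confidence, and words on each utterance.
transcript = {
    "utterances": [
        {"speaker": "Michel Martin", "text": "Thanks for joining us today."},
        {"speaker": "Sam Rivera", "text": "Happy to be here."},
    ]
}

# Each utterance's speaker field now holds the identified name or role.
lines = [f'{u["speaker"]}: {u["text"]}' for u in transcript["utterances"]]
print("\n".join(lines))
```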