Identifying speakers in audio recordings
When you enable the Speaker Diarization model, the transcript contains speaker labels in addition to the text, which makes the output easier to follow and to organize by speaker.
In this step-by-step guide, you'll learn how to apply the model. In short, you send the speaker_labels parameter in your request and then read the results from a field called utterances.
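As a rough sketch of what that request looks like without any SDK, the Python standard library can prepare a POST to the transcript endpoint with speaker_labels switched on. The endpoint URL, the authorization header, and the example audio URL are assumptions drawn from AssemblyAI's v2 REST API rather than from this guide, and build_diarization_request is a hypothetical name.

```python
import json
import urllib.request

# Assumed endpoint of AssemblyAI's v2 REST API.
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def build_diarization_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Prepare, but do not send, a transcription request with speaker labels enabled."""
    body = {"audio_url": audio_url, "speaker_labels": True}
    return urllib.request.Request(
        TRANSCRIPT_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        # Assumed auth scheme: the raw API key in the authorization header.
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


# Placeholder key and audio URL for illustration only.
req = build_diarization_request("YOUR_API_KEY", "https://example.com/meeting.mp3")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would submit the job. The JSON reply contains an id, and polling GET https://api.assemblyai.com/v2/transcript/<id> until its status is "completed" (or "error") returns the finished transcript, including utterances.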
Get started
Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard.
The complete source code for this guide can be viewed here.
Here is an audio example for this guide:
Step-by-step instructions
The following steps use the Python SDK.
Import the assemblyai package and set the API key.
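One way to keep the key out of source code is to read it from an environment variable. The helper below is hypothetical, and the variable name ASSEMBLYAI_API_KEY is an assumption rather than something this guide prescribes.

```python
import os


def load_api_key(env_var: str = "ASSEMBLYAI_API_KEY") -> str:
    """Read the AssemblyAI API key from an environment variable (hypothetical helper)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an unauthenticated request.
        raise RuntimeError(f"Set {env_var} to your AssemblyAI API key")
    return key
```

With the SDK installed (pip install assemblyai), the key would then be assigned with import assemblyai as aai followed by aai.settings.api_key = load_api_key().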
Create a TranscriptionConfig with speaker_labels set to True.
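In SDK code, this step could be wrapped in a small helper such as the hypothetical make_diarization_config below. The import sits inside the function only so the snippet can be loaded before the assemblyai package is installed; in a regular script it would normally go at the top of the file.

```python
def make_diarization_config():
    """Build an SDK TranscriptionConfig that requests speaker labels (hypothetical helper)."""
    # Requires the SDK: pip install assemblyai
    import assemblyai as aai

    # speaker_labels=True turns on Speaker Diarization for the transcription.
    return aai.TranscriptionConfig(speaker_labels=True)
```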
Create a Transcriber object and pass in the configuration.
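Putting the configuration and the Transcriber together, a minimal sketch might look like the following. It assumes the API key has already been set on aai.settings and that audio_url points to a file the service can reach; transcribe_with_speakers is a hypothetical name, and the import is kept inside the function for the same reason as above.

```python
def transcribe_with_speakers(audio_url: str):
    """Transcribe audio_url with speaker labels and return the SDK transcript (hypothetical helper)."""
    # Requires the SDK and aai.settings.api_key to be set beforehand.
    import assemblyai as aai

    config = aai.TranscriptionConfig(speaker_labels=True)
    transcriber = aai.Transcriber(config=config)

    # transcribe() waits until processing has finished or failed.
    transcript = transcriber.transcribe(audio_url)
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(f"Transcription failed: {transcript.error}")
    return transcript
```

Each item in transcript.utterances then exposes speaker and text attributes, so a loop such as for u in transcript.utterances: print(f"Speaker {u.speaker}: {u.text}") lists who said what.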
Understanding the response
The speaker label information is included in the utterances key of the response. Each utterance object in the list includes a speaker field, which contains a string identifier for the speaker (e.g., “A”, “B”, etc.). Each utterance also has a text field containing the spoken text, and confidence scores are provided both for the utterance as a whole and for each of its individual words.
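To make that structure concrete, the hypothetical helper below walks the utterances of a completed transcript, as returned in JSON by the REST API, and produces one line per speaker turn with its confidence, noting the least confident word. The sample response is made up and trimmed to the fields the helper reads; real word entries carry more fields, such as timestamps.

```python
from typing import Any


def describe_utterances(transcript: dict[str, Any]) -> list[str]:
    """Return one readable line per utterance of a completed transcript (hypothetical helper)."""
    lines = []
    for utterance in transcript.get("utterances") or []:
        words = utterance.get("words") or []
        # Word-level confidence: point out the least certain word, if any.
        weakest = min(words, key=lambda w: w["confidence"], default=None)
        note = f" (least confident word: {weakest['text']!r})" if weakest else ""
        lines.append(
            f"Speaker {utterance['speaker']} [{utterance['confidence']:.2f}]: "
            f"{utterance['text']}{note}"
        )
    return lines


# Made-up, trimmed response used only to illustrate the shape of the data.
sample_response = {
    "utterances": [
        {
            "speaker": "A",
            "text": "Thanks for joining the call.",
            "confidence": 0.96,
            "words": [
                {"text": "Thanks", "confidence": 0.99},
                {"text": "for", "confidence": 0.98},
                {"text": "joining", "confidence": 0.91},
                {"text": "the", "confidence": 0.97},
                {"text": "call.", "confidence": 0.95},
            ],
        },
        {"speaker": "B", "text": "Glad to be here.", "confidence": 0.93, "words": []},
    ]
}

for line in describe_utterances(sample_response):
    print(line)
```

For the sample, this prints "Speaker A [0.96]: Thanks for joining the call. (least confident word: 'joining')" followed by "Speaker B [0.93]: Glad to be here."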
For more information, see the Speaker Diarization model documentation or the API reference.
Specifying the number of speakers
You can set the optional speakers_expected parameter to specify the expected number of speakers in an audio file.
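As a sketch, the hint is one more field in the JSON request body, sent alongside speaker_labels. The helper name, the example URL, and the positive-integer check are assumptions for illustration; the check is a local sanity test, not a documented API constraint.

```python
import json
from typing import Optional


def build_request_body(audio_url: str, speakers_expected: Optional[int] = None) -> dict:
    """JSON body for a diarized transcription, optionally hinting the speaker count (hypothetical helper)."""
    body = {"audio_url": audio_url, "speaker_labels": True}
    if speakers_expected is not None:
        # Local sanity check only; the API may enforce its own limits.
        if isinstance(speakers_expected, bool) or not isinstance(speakers_expected, int) or speakers_expected < 1:
            raise ValueError("speakers_expected must be a positive integer")
        body["speakers_expected"] = speakers_expected
    return body


print(json.dumps(build_request_body("https://example.com/interview.mp3", speakers_expected=2)))
# prints {"audio_url": "https://example.com/interview.mp3", "speaker_labels": true, "speakers_expected": 2}
```

With the Python SDK, the equivalent would be aai.TranscriptionConfig(speaker_labels=True, speakers_expected=2).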
Conclusion
Automatically identifying the different speakers in an audio recording, also called speaker diarization, is a multi-step process. It adds value to many kinds of recordings, including conference call transcripts, broadcast media, and podcasts. You can learn more about use cases for speaker diarization and the underlying research on the AssemblyAI blog.