Generate Custom Speaker Labels with Pyannote
In this guide, we’ll show you how to generate Speaker Labels using Pyannote with an AssemblyAI transcript. This can be used to generate Speaker Labels for languages we currently do not support for speaker labelling.
Quickstart
Get Started
Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard.
You’ll also need a HuggingFace account and API key. You can sign up for a free account and get your API key here. Create a Read type API token to ensure the necessary permissions are enabled.
Browse to the speaker-diarization and segmentation model pages and accept the Gated Model Terms & Conditions by entering your Company/University, Website and Use Case details in order to gain access to the use of these models.
Step-by-Step Instructions
Install the necessary dependencies.
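As a rough sketch, you can confirm from Python that the dependencies are importable before going further. The module names below (assemblyai, pyannote.audio, torch, pandas) are an assumption based on the libraries the later steps rely on, and the install command in the comment is illustrative only.

```python
import importlib.util

# Assumed module names for this guide; adjust to match your environment.
REQUIRED_MODULES = ["assemblyai", "pyannote.audio", "torch", "pandas"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "pyannote") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(REQUIRED_MODULES)
    if absent:
        # Typical install command (assumed):
        #   pip install assemblyai pyannote.audio torch pandas
        print("Missing modules:", ", ".join(absent))
    else:
        print("All dependencies are importable.")
```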
Import the necessary dependencies, assign your API keys and authenticate with AssemblyAI.
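A minimal sketch of this step, assuming the keys are stored in environment variables. The variable names ASSEMBLYAI_API_KEY and HUGGINGFACE_TOKEN and the helper names are hypothetical; you could equally assign the keys as string literals. The AssemblyAI SDK is imported inside the function so the key-loading helper can be used on its own.

```python
import os


def load_api_keys(env=None):
    """Read both keys from a mapping (defaults to the process environment).

    The variable names used here are assumptions, not names required by
    either service.
    """
    env = os.environ if env is None else env
    keys = {
        "assemblyai": env.get("ASSEMBLYAI_API_KEY", ""),
        "huggingface": env.get("HUGGINGFACE_TOKEN", ""),
    }
    missing = [name for name, value in keys.items() if not value]
    if missing:
        raise RuntimeError("Missing API key(s): " + ", ".join(missing))
    return keys


def authenticate_assemblyai(api_key):
    """Set the API key on the AssemblyAI SDK's global settings object."""
    import assemblyai as aai  # imported lazily; requires the assemblyai package

    aai.settings.api_key = api_key
    return aai
```

Calling `authenticate_assemblyai(load_api_keys()["assemblyai"])` once at startup is enough; the HuggingFace token is only needed later, when the diarization pipeline is loaded.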
Create the transcribe_audio function, which handles the transcription process with AssemblyAI.
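One way this function might look, as a sketch rather than the guide's exact code. It assumes the AssemblyAI Python SDK is installed and already authenticated; the default language code "hr" (Croatian) is an assumption chosen to match the example output at the end of this guide.

```python
def transcribe_audio(file_path, language_code="hr"):
    """Transcribe a local audio file with AssemblyAI and return the transcript.

    `language_code` defaults to Croatian here purely as an illustration.
    """
    import assemblyai as aai  # imported lazily; requires the assemblyai package

    config = aai.TranscriptionConfig(language_code=language_code)
    transcriber = aai.Transcriber()

    # transcribe() uploads the file and blocks until processing finishes.
    transcript = transcriber.transcribe(file_path, config=config)

    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(f"Transcription failed: {transcript.error}")
    return transcript
```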
Create the get_speaker_labels function, which runs the speaker diarization model to generate the custom speaker labels for the transcript.
Firstly, it initializes and applies the pipeline to the audio file.
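A sketch of the pipeline setup, under a few stated assumptions: the checkpoint name "pyannote/speaker-diarization-3.1" is a guess at the gated model referred to above, the helper name run_diarization is hypothetical, and the use_auth_token keyword is the one used by pyannote.audio 3.x (later releases may name it differently).

```python
def run_diarization(file_path, hf_token):
    """Load a pretrained Pyannote diarization pipeline and apply it to a file."""
    import torch  # imported lazily; requires torch and pyannote.audio
    from pyannote.audio import Pipeline

    # Checkpoint name is an assumption; use the model whose terms you accepted.
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token=hf_token,
    )

    # Run on the GPU when one is available; otherwise stay on the CPU.
    if torch.cuda.is_available():
        pipeline.to(torch.device("cuda"))

    # The result is an annotation object holding the speaker turns.
    return pipeline(file_path)
```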
Secondly, it processes the diarization results and converts the speaker segments into a DataFrame so we can compare the results with the transcript.
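The conversion could be sketched as follows. The helper names and the column names (start, end, speaker) are hypothetical; the only Pyannote call relied on is itertracks(yield_label=True), which yields a segment, a track identifier, and a speaker label for each turn. Segment times are in seconds.

```python
def diarization_to_rows(diarization):
    """Flatten a Pyannote diarization result into a list of plain dicts."""
    rows = []
    for turn, _track, speaker in diarization.itertracks(yield_label=True):
        rows.append({"start": turn.start, "end": turn.end, "speaker": speaker})
    return rows


def diarization_to_dataframe(diarization):
    """Build a DataFrame with one row per speaker segment."""
    import pandas as pd  # imported lazily; requires pandas

    return pd.DataFrame(
        diarization_to_rows(diarization),
        columns=["start", "end", "speaker"],
    )
```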
Lastly, the speaker segments are compared and assigned to the words and sentences of the transcript to create the speaker labelled transcript.
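One possible matching rule, shown here as an assumption rather than the guide's exact logic, is to give each word or sentence the speaker whose segment overlaps it the most. The sketch below takes speaker segments as a list of dicts with start and end in seconds (for example, from `DataFrame.to_dict("records")`) and items with start and end in milliseconds, which is how AssemblyAI reports word and sentence timestamps. All function names are hypothetical.

```python
def overlap_seconds(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals, never negative."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speaker(start_ms, end_ms, segments):
    """Return the speaker whose segment overlaps the interval most, or None."""
    start, end = start_ms / 1000.0, end_ms / 1000.0  # milliseconds -> seconds
    best_speaker, best_overlap = None, 0.0
    for segment in segments:
        shared = overlap_seconds(start, end, segment["start"], segment["end"])
        if shared > best_overlap:
            best_speaker, best_overlap = segment["speaker"], shared
    return best_speaker


def label_items(items, segments):
    """Attach a speaker to each word or sentence (objects with text/start/end)."""
    return [
        {
            "text": item.text,
            "start": item.start,
            "end": item.end,
            "speaker": assign_speaker(item.start, item.end, segments),
        }
        for item in items
    ]
```

The same `label_items` helper works for `transcript.words` and for the sentences returned by `transcript.get_sentences()`, since both expose text, start, and end. Items that overlap no segment get a speaker of None, so downstream code should handle that case.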
How can I set the number of speakers?
If you know the number of speakers in advance, you can set it with the num_speakers parameter:
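For illustration, the parameter is passed as a keyword argument when the pipeline is applied to the file. The wrapper name below is hypothetical, and `pipeline` is assumed to be an already loaded Pyannote diarization pipeline.

```python
def diarize_with_speaker_count(pipeline, file_path, num_speakers):
    """Apply the pipeline while fixing the number of speakers."""
    if num_speakers < 1:
        raise ValueError("num_speakers must be at least 1")
    # Equivalent to calling pipeline("audio.wav", num_speakers=2) directly.
    return pipeline(file_path, num_speakers=num_speakers)
```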
You can also provide lower and upper bounds on the number of speakers using the min_speakers and max_speakers parameters:
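A short sketch of the bounded form, again with a hypothetical wrapper name and an already loaded `pipeline`. Either bound can be left out; only the ones you supply are forwarded.

```python
def diarize_with_speaker_bounds(pipeline, file_path,
                                min_speakers=None, max_speakers=None):
    """Apply the pipeline with optional lower/upper bounds on speaker count."""
    if (min_speakers is not None and max_speakers is not None
            and min_speakers > max_speakers):
        raise ValueError("min_speakers cannot exceed max_speakers")

    kwargs = {}
    if min_speakers is not None:
        kwargs["min_speakers"] = min_speakers
    if max_speakers is not None:
        kwargs["max_speakers"] = max_speakers

    # Equivalent to pipeline("audio.wav", min_speakers=2, max_speakers=5).
    return pipeline(file_path, **kwargs)
```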
Create the format_timestamp function, which converts timestamps into a more readable form for the final speaker labelled transcript.
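A minimal version might look like this. It assumes the input is in milliseconds, as AssemblyAI timestamps are, and the HH:MM:SS output format is an illustrative choice rather than a requirement.

```python
def format_timestamp(milliseconds):
    """Convert a millisecond offset into an HH:MM:SS string."""
    total_seconds = int(milliseconds // 1000)  # drop the sub-second part
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```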
Finally, select a local file and call the functions to generate and print your custom Speaker Labelled transcript.
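The printing step could be sketched as below. It assumes each labelled sentence is a dict with start (milliseconds), speaker, and text keys, which is a hypothetical shape; the function name is also hypothetical. The timestamp formatter is passed in as an argument so the example stays self-contained, and it defaults to printing the raw value.

```python
def render_labelled_transcript(labelled_sentences, format_timestamp=str):
    """Return one '[time] speaker: text' line per labelled sentence."""
    lines = []
    for sentence in labelled_sentences:
        # Sentences that matched no diarization segment have no speaker.
        speaker = sentence["speaker"] or "UNKNOWN"
        stamp = format_timestamp(sentence["start"])
        lines.append(f"[{stamp}] {speaker}: {sentence['text']}")
    return "\n".join(lines)
```

In a full script you would transcribe the file, run the speaker labelling step on the same file, and then `print(render_labelled_transcript(...))` with your own timestamp formatter passed as the second argument.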
Here’s an example speaker labelled output from a Croatian file: