Process speaker labels with LeMUR — AssemblyAI

In this guide, you’ll learn how to use AssemblyAI’s API to transcribe audio, identify speakers, and infer their names using LeMUR. We’ll walk through the process of configuring the transcriber, submitting a transcript to LeMUR with speaker labels, and generating a mapping of speaker names from the transcript.

This workflow replaces the generic speaker labels in your transcripts with the speakers' names:

Before:
Speaker A: G'day, bud.
Speaker B: How are you? Very good.

After:
Ben: G'day, bud.
Bryce: How are you? Very good.

Before you begin

To complete this tutorial, you need:

Python installed.
An upgraded AssemblyAI account.

For the entire source code of this guide, see Speaker Identification.

Step-by-step instructions

Install the Python SDK:

$ pip install assemblyai

Import the assemblyai package and set your API key:

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

Define a Transcriber and a TranscriptionConfig with speaker_labels set to True. Then, create a transcript.

transcriber = aai.Transcriber()
config = aai.TranscriptionConfig(speaker_labels=True)
audio_url = "https://www.listennotes.com/e/p/accd617c94a24787b2e0800f264b7a5e/"
transcript = transcriber.transcribe(audio_url, config)

Process the transcript with speaker labels:

# Build one string in which every utterance is prefixed with its speaker label.
text_with_speaker_labels = ""
for utt in transcript.utterances:
    text_with_speaker_labels += f"Speaker {utt.speaker}:\n{utt.text}\n"
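
To make the layout of text_with_speaker_labels concrete, here is a minimal offline sketch of the same formatting step. The helper name build_labelled_text and the sample_utterances objects are hypothetical stand-ins for transcript.utterances. They only provide the .speaker and .text attributes used above, so no API call is needed.

```python
from types import SimpleNamespace


def build_labelled_text(utterances):
    """Format utterances as a 'Speaker X:' line followed by the spoken text."""
    return "".join(f"Speaker {utt.speaker}:\n{utt.text}\n" for utt in utterances)


# Hypothetical stand-ins for transcript.utterances (assumption: only
# the .speaker and .text attributes are needed for this step).
sample_utterances = [
    SimpleNamespace(speaker="A", text="G'day, bud."),
    SimpleNamespace(speaker="B", text="How are you? Very good."),
]

labelled_text = build_labelled_text(sample_utterances)
print(labelled_text)
```

Each utterance becomes two lines, the speaker label and then the spoken text, which is the form LeMUR receives as input_text in the next step.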

Collect the unique speakers, then create a LemurQuestion for each one. Lastly, ask LeMUR the questions, specifying text_with_speaker_labels as the input_text.

# Collect each distinct speaker label found in the transcript.
unique_speakers = {utterance.speaker for utterance in transcript.utterances}

# Ask one question per speaker.
questions = []
for speaker in unique_speakers:
    questions.append(
        aai.LemurQuestion(
            question=f"Who is speaker {speaker}?",
            answer_format="<First Name> <Last Name (if applicable)>"
        )
    )

result = aai.Lemur().question(
    questions,
    input_text=text_with_speaker_labels,
    context="Your task is to infer the speaker's name from the speaker-labelled transcript"
)

Map each speaker label to the name returned by LeMUR:

import re

# Matches the question text sent to LeMUR and captures the speaker label.
pattern = re.compile(r"Who is speaker (\w)\?")

speaker_mapping = {}
for qa_response in result.response:
    match = pattern.search(qa_response.question)
    # Keep the first answer received for each speaker label.
    if match and match.group(1) not in speaker_mapping:
        speaker_mapping[match.group(1)] = qa_response.answer
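
The mapping step can be tried without calling LeMUR. The sketch below is illustrative only: FakeAnswer is a hypothetical stand-in for the items in result.response, assumed to expose just the question and answer attributes used above, and build_speaker_mapping is a hypothetical helper name. It also strips surrounding whitespace from each answer, which the step above does not do.

```python
import re
from dataclasses import dataclass


@dataclass
class FakeAnswer:
    """Hypothetical stand-in for one item of result.response."""
    question: str
    answer: str


# Same question wording as used when building the LemurQuestion objects.
SPEAKER_PATTERN = re.compile(r"Who is speaker (\w)\?")


def build_speaker_mapping(responses):
    """Return {speaker label: name}, keeping the first answer per label."""
    mapping = {}
    for item in responses:
        match = SPEAKER_PATTERN.search(item.question)
        if match:
            # setdefault leaves an existing entry untouched.
            mapping.setdefault(match.group(1), item.answer.strip())
    return mapping


sample_responses = [
    FakeAnswer("Who is speaker A?", "Ben Kingsley"),
    FakeAnswer("Who is speaker B?", " Bryce "),
    FakeAnswer("Who is speaker A?", "Someone else"),  # duplicate label, ignored
]

speaker_names = build_speaker_mapping(sample_responses)
print(speaker_names)
```

Because the pattern captures the label from the question text, the mapping does not depend on the order in which the answers come back.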

Print the transcript with speaker names:

for utterance in transcript.utterances:
    speaker_name = speaker_mapping[utterance.speaker]
    print(f"{speaker_name}: {utterance.text}")
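
The lookup speaker_mapping[utterance.speaker] raises a KeyError if a speaker label has no entry in the mapping. One possible guard, shown here as a sketch and not as part of the original guide, is to fall back to the generic label. The helper name render_transcript and the sample data are hypothetical.

```python
from types import SimpleNamespace


def render_transcript(utterances, speaker_mapping):
    """Return 'Name: text' lines, using 'Speaker X' when no name is known."""
    lines = []
    for utt in utterances:
        # Fall back to the generic label if the name is missing or empty.
        name = speaker_mapping.get(utt.speaker) or f"Speaker {utt.speaker}"
        lines.append(f"{name}: {utt.text}")
    return lines


# Hypothetical sample data: speaker C has no entry in the mapping.
sample_utterances = [
    SimpleNamespace(speaker="A", text="G'day, bud."),
    SimpleNamespace(speaker="B", text="How are you? Very good."),
    SimpleNamespace(speaker="C", text="Hello."),
]
sample_mapping = {"A": "Ben", "B": "Bryce"}

rendered = render_transcript(sample_utterances, sample_mapping)
for line in rendered:
    print(line)
```

With this fallback, an unnamed speaker keeps the generic label and the loop does not stop.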

Output:

Ben Kingsley: G'day, folks. Ben Kingsley here in this throwback Tuesday bonus episode, ...
Bryce: All right, folks, you're on the property couch, where each week, Ben and I give you the insider's guide to property investing. Hi, mate.
Ben Kingsley: G'day, bud.
Bryce: How are you? Very good. Hey, we should do a little sound check here, Ben...
