Using Speaker Identification on an existing transcript

Overview

If you already have a completed transcript, you can add Speaker Identification in a separate request to the Speech Understanding API. This is especially useful when you want to re-identify speakers with different parameters, or when your workflow separates transcription from post-processing.

Speaker Identification requires Speaker Diarization. Your original transcription request must have set speaker_labels: true.

To transcribe and identify speakers in a single request, see the main Speaker Identification page.

Choosing how to identify speakers

You can identify speakers by name or by role:

  • Know the speakers’ names? Use speaker_type: "name" with the names in known_values or speakers.
  • Know their roles but not their names? Use speaker_type: "role" with roles like "Interviewer" or "Agent" in known_values or speakers.
  • Need better accuracy? Use speakers with description fields that provide context about what each speaker typically discusses.
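As a quick orientation before the full examples, the two simplest speaker_identification configurations can be sketched as plain dictionaries (the names and roles shown are sample values to replace with your own):

```python
# Minimal sketch of the two speaker_identification shapes described above.
# Replace the sample values with the speakers in your own audio.

by_name = {
    "speaker_type": "name",
    "known_values": ["Michel Martin", "Peter DeCarlo"]
}

by_role = {
    "speaker_type": "role",
    "known_values": ["Interviewer", "Interviewee"]
}
```

Either dictionary slots into the request payload under speech_understanding.request.speaker_identification, as shown in the full examples on this page.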

How to use Speaker Identification on an existing transcript

First, transcribe your audio with speaker_labels: true. Once the transcription is complete, send the transcript_id along with your speaker identification configuration to the Speech Understanding API.

Identify by name

To identify speakers by name, use speaker_type: "name" with a list of speaker names in known_values. This is the most common approach when you know who is speaking in the audio.

import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
    "authorization": "<YOUR_API_KEY>"
}

# Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file

upload_url = "https://assembly.ai/wildfires.mp3"

data = {
    "audio_url": upload_url,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True
}

# Transcribe file
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll for transcription results
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

# Enable speaker identification
understanding_body = {
    "transcript_id": transcript_id,
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "name",
                "known_values": ["Michel Martin", "Peter DeCarlo"]  # Change these values to match the names of the speakers in your file
            }
        }
    }
}

# Send the transcript ID and speaker identification config to the Speech Understanding API
result = requests.post(
    "https://llm-gateway.assemblyai.com/v1/understanding",
    headers=headers,
    json=understanding_body
).json()

# Access the results and print utterances to the terminal
for utterance in result["utterances"]:
    print(f"{utterance['speaker']}: {utterance['text']}")

Identify by role

To identify speakers by role instead of name, use speaker_type: "role" with role labels in known_values. This is useful for customer service calls, interviews, or any scenario where you know the roles but not the names.

import requests
import time

base_url = "https://api.assemblyai.com"

headers = {
    "authorization": "<YOUR_API_KEY>"
}

# Need to transcribe a local file? Learn more here: https://www.assemblyai.com/docs/getting-started/transcribe-an-audio-file

upload_url = "https://assembly.ai/wildfires.mp3"

data = {
    "audio_url": upload_url,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True
}

# Transcribe file
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

transcript_id = response.json()["id"]
polling_endpoint = base_url + f"/v2/transcript/{transcript_id}"

# Poll for transcription results
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()

    if transcript["status"] == "completed":
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

# Enable role-based speaker identification
understanding_body = {
    "transcript_id": transcript_id,
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "role",
                "known_values": ["Interviewer", "Interviewee"]  # Change these values to match the roles of the speakers in your file
            }
        }
    }
}

# Send the transcript ID and speaker identification config to the Speech Understanding API
result = requests.post(
    "https://llm-gateway.assemblyai.com/v1/understanding",
    headers=headers,
    json=understanding_body
).json()

# Access the results and print utterances to the terminal
for utterance in result["utterances"]:
    print(f"{utterance['speaker']}: {utterance['text']}")

Common role combinations

  • ["Agent", "Customer"] - Customer service calls
  • ["AI Assistant", "User"] - AI chatbot interactions
  • ["Support", "Customer"] - Technical support calls
  • ["Interviewer", "Interviewee"] - Interview recordings
  • ["Host", "Guest"] - Podcast or show recordings
  • ["Moderator", "Panelist"] - Panel discussions
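If you handle several call types, you can parameterize the request payload over these role pairs. The helper below is a hypothetical convenience function (not part of any SDK); it only assembles the understanding_body shown in the example above:

```python
# Hypothetical helper (not part of any SDK): builds the understanding_body
# payload for a given transcript ID and one of the role pairs listed above.
def build_role_identification(transcript_id, roles):
    return {
        "transcript_id": transcript_id,
        "speech_understanding": {
            "request": {
                "speaker_identification": {
                    "speaker_type": "role",
                    "known_values": list(roles)
                }
            }
        }
    }

# Example: a customer service call
body = build_role_identification("<YOUR_TRANSCRIPT_ID>", ["Agent", "Customer"])
```

The resulting dictionary can be passed directly as the json argument of the POST request to the Speech Understanding API.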

Adding speaker metadata

For more accurate identification, use the speakers parameter instead of known_values to provide descriptions and metadata. The examples below show the understanding_body payload sent to the Speech Understanding API. For setup, transcription, and polling code, see the full examples above.

Examples in this section are shown in Python for brevity. The same speaker_identification configuration works in any language.

At its simplest, you can provide a description alongside each speaker’s name or role:

understanding_body = {
    "transcript_id": transcript_id,
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "role",
                "speakers": [
                    {
                        "role": "interviewer",
                        "description": "Hosts the program and interviews the guests"
                    },
                    {
                        "role": "guest",
                        "description": "Answers questions from the interviewer"
                    }
                ]
            }
        }
    }
}

# Send the transcript ID and speaker identification config to the Speech Understanding API
result = requests.post(
    "https://llm-gateway.assemblyai.com/v1/understanding",
    headers=headers,
    json=understanding_body
).json()

For even more fine-tuned identification, you can include any additional custom properties on each speaker object, such as company, title, department, or any other fields that help describe the speaker:

understanding_body = {
    "transcript_id": transcript_id,
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "name",
                "speakers": [
                    {
                        "name": "Michel Martin",
                        "description": "Hosts the program and interviews the guests",
                        "company": "NPR",
                        "title": "Host Morning Edition"
                    },
                    {
                        "name": "Peter DeCarlo",
                        "description": "Answers questions from the interview",
                        "company": "Johns Hopkins University",
                        "title": "Professor and Vice Chair of Environmental Health and Engineering"
                    }
                ]
            }
        }
    }
}

You can use the same custom properties with role-based identification by replacing name with role in each speaker object.
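As a sketch of that substitution, here is a role-based version of the payload above. The role labels and custom properties ("company", "title") are arbitrary sample values, as in the preceding example:

```python
# Role-based variant: each speaker object uses "role" instead of "name",
# keeping the same description and custom properties.
transcript_id = "<YOUR_TRANSCRIPT_ID>"  # from the transcription step

understanding_body = {
    "transcript_id": transcript_id,
    "speech_understanding": {
        "request": {
            "speaker_identification": {
                "speaker_type": "role",
                "speakers": [
                    {
                        "role": "host",
                        "description": "Hosts the program and interviews the guests",
                        "company": "NPR",
                        "title": "Host Morning Edition"
                    },
                    {
                        "role": "guest",
                        "description": "Answers questions from the interviewer",
                        "company": "Johns Hopkins University",
                        "title": "Professor and Vice Chair of Environmental Health and Engineering"
                    }
                ]
            }
        }
    }
}
```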

API reference

Request

Retrieve the completed transcript and send it to the Speech Understanding API:

# Step 1: Submit transcription job
curl -X POST "https://api.assemblyai.com/v2/transcript" \
  -H "authorization: <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://assembly.ai/wildfires.mp3",
    "speaker_labels": true
  }'

# Save the transcript_id from the response above, then use it in the following commands

# Step 2: Poll for transcription status (repeat until status is "completed")
curl -X GET "https://api.assemblyai.com/v2/transcript/{transcript_id}" \
  -H "authorization: <YOUR_API_KEY>"

# Step 3: Once transcription is completed, enable speaker identification
curl -X POST "https://llm-gateway.assemblyai.com/v1/understanding" \
  -H "authorization: <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "transcript_id": "{transcript_id}",
    "speech_understanding": {
      "request": {
        "speaker_identification": {
          "speaker_type": "name",
          "known_values": ["Michel Martin", "Peter DeCarlo"]
        }
      }
    }
  }'

Request parameters

For the full list of request parameters, see the Speaker Identification API reference.

Response

For the response format and fields, see the Speaker Identification response reference.