Extract Dialogue Data with LLM Gateway and JSON

In this guide, we’ll show you how to use AssemblyAI’s LLM Gateway framework to process several audio files, and then format your results in JSON (JavaScript Object Notation) format.

JSON allows you to programmatically format, parse, and transfer responses from LLM Gateway, which is useful for implementing LLM Gateway with a wide range of other applications.

In this example, we will leverage the JSON formatting to create a .csv file from a directory of files that must be transcribed and submitted to LLM Gateway. However, you can use the same concepts in this guide to generate a JSON-formatted response, which you can then use to update a database table or interact with other APIs.

Quickstart

1import requests
2import json
3import os
4import csv
5import time
6import re
7
# Configuration
api_key = "<YOUR_API_KEY>"  # from your AssemblyAI dashboard
base_url = "https://api.assemblyai.com"
headers = {"authorization": api_key}
output_filename = "profiles.csv"
13
def extract_json(text):
    """Parse a JSON object out of an LLM response.

    The model is asked to reply with bare JSON, but responses may still be
    wrapped in markdown code fences or surrounded by extra prose. Strip the
    fences, then try the span between the first '{' and the last '}'; if that
    span is not valid JSON, fall back to parsing the whole cleaned text.

    Raises json.JSONDecodeError if no parse attempt succeeds.
    """
    text = text.strip()

    # Remove ```json ... ``` markdown fences if present.
    if text.startswith("```"):
        text = re.sub(r'^```(?:json)?\s*', '', text)
        text = re.sub(r'\s*```$', '', text)

    # Prefer the outermost brace-delimited span, which skips any
    # leading/trailing prose the model added around the JSON object.
    first_brace = text.find('{')
    last_brace = text.rfind('}')

    if first_brace != -1 and last_brace != -1:
        try:
            return json.loads(text[first_brace:last_brace + 1])
        except json.JSONDecodeError:
            # The brace-delimited span wasn't valid JSON (e.g. braces inside
            # a quoted string); fall through and try the whole text instead
            # of raising here, which made the fallback below unreachable.
            pass

    # If that didn't work, try parsing the whole thing
    return json.loads(text)
34
def upload_file(file_path):
    """Stream a local audio file to AssemblyAI and return its upload URL."""
    with open(file_path, "rb") as audio:
        response = requests.post(f"{base_url}/v2/upload", headers=headers, data=audio)
    if response.status_code != 200:
        print(f"Error uploading {file_path}: {response.status_code}, {response.text}")
        response.raise_for_status()
    return response.json()["upload_url"]
43
def transcribe_audio(audio_url):
    """Request a transcription and block until it finishes.

    Returns the transcript ID on success; raises RuntimeError if the
    transcription ends in an error state.
    """
    payload = {"audio_url": audio_url, "speech_models": ["universal-3-pro"]}
    response = requests.post(f"{base_url}/v2/transcript", headers=headers, json=payload)

    if response.status_code != 200:
        print(f"Error submitting transcription: {response.status_code}, {response.text}")
        response.raise_for_status()

    transcript_id = response.json()["id"]
    polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"

    # Poll every 3 seconds until the job leaves the queued/processing states.
    while True:
        status_payload = requests.get(polling_endpoint, headers=headers).json()
        status = status_payload["status"]
        if status == "completed":
            return transcript_id
        if status == "error":
            raise RuntimeError(f"Transcription failed: {status_payload['error']}")
        time.sleep(3)
66
def process_with_llm_gateway(transcript_id, prompt):
    """Send a completed transcript to LLM Gateway and return the model's reply.

    The "{{ transcript }}" placeholder in the message is presumably expanded
    by the gateway from the referenced transcript_id — only the ID travels
    with the request.

    Raises RuntimeError if the gateway reports an error.
    """
    llm_gateway_data = {
        "model": "claude-sonnet-4-5-20250929",
        "messages": [
            {
                "role": "user",
                "content": f"{prompt}\n\n{{{{ transcript }}}}"
            }
        ],
        "transcript_id": transcript_id,
        "max_tokens": 1500
    }

    response = requests.post(
        "https://llm-gateway.assemblyai.com/v1/chat/completions",
        headers=headers,
        json=llm_gateway_data
    )

    # Match the error handling in upload_file/transcribe_audio: surface HTTP
    # failures before attempting to decode the body, so a non-200 error page
    # doesn't turn into an opaque JSON decode exception.
    if response.status_code != 200:
        print(f"Error from LLM Gateway: {response.status_code}, {response.text}")
        response.raise_for_status()

    result = response.json()

    if "error" in result:
        raise RuntimeError(f"LLM Gateway error: {result['error']}")

    return result['choices'][0]['message']['content']
93
# Main execution
prompt = """
    You are an HR executive scanning through an interview transcript to extract information about a candidate.
    You are required to create a JSON response with key information about the candidate.
    You will use this template for your answer:
    {
        "Name": "<candidate-name>",
        "Position": "<job position that candidate is applying for>",
        "Past experience": "<A short phrase describing the candidate's relevant past experience for the role>"
    }
    Do not include any other text in your response. Only respond in JSON format that is not surrounded by markdown code, as your response will be parsed programmatically as JSON.
    """

# Columns are filled by explicit key lookup (not dict iteration order) so a
# model response with extra or reordered keys still lands in the right
# columns. NOTE: these keys must match the JSON template in the prompt above.
csv_columns = ["Name", "Position", "Past experience"]

# Get all files from the interviews directory
interview_files = [os.path.join("interviews", entry) for entry in os.listdir("interviews")]

with open(output_filename, "w", newline="") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["Name", "Position", "Past Experience"])

    print(f"Processing {len(interview_files)} interview files...")

    for interview_file in interview_files:
        print(f"\nProcessing: {interview_file}")

        # Upload file and get URL
        print(" Uploading file...")
        audio_url = upload_file(interview_file)

        # Transcribe audio
        print(" Transcribing...")
        transcript_id = transcribe_audio(audio_url)

        # Process with LLM Gateway
        print(" Analyzing with LLM Gateway...")
        llm_response = process_with_llm_gateway(transcript_id, prompt)

        # Parse the JSON response and write one row per interview, selecting
        # values by key so column order is guaranteed and a missing key
        # yields an empty cell instead of a misaligned row or a KeyError.
        interviewee_data = extract_json(llm_response)
        writer.writerow([interviewee_data.get(column, "") for column in csv_columns])
        print(f" Completed: {interviewee_data.get('Name', 'unknown')}")

print(f"\nCreated .csv file {output_filename}")

Get Started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for an AssemblyAI account and get your API key from your dashboard.

Step-by-Step Instructions

In this guide, we will ask the same questions to LLM Gateway about multiple files. Then, we will collate the answers in a .csv file.

Install the required packages:

$pip install requests

Import the necessary libraries, then set your API key, request headers, and base URL.

1import requests
2import json
3import os
4import csv
5import time
6import re
7
# Configuration
api_key = "<YOUR_API_KEY>"  # from your AssemblyAI dashboard
base_url = "https://api.assemblyai.com"
headers = {"authorization": api_key}
output_filename = "profiles.csv"

Define a function to extract the JSON text from the response from LLM Gateway.

def extract_json(text):
    """Parse a JSON object out of an LLM response.

    The model is asked to reply with bare JSON, but responses may still be
    wrapped in markdown code fences or surrounded by extra prose. Strip the
    fences, then try the span between the first '{' and the last '}'; if that
    span is not valid JSON, fall back to parsing the whole cleaned text.

    Raises json.JSONDecodeError if no parse attempt succeeds.
    """
    text = text.strip()

    # Remove ```json ... ``` markdown fences if present.
    if text.startswith("```"):
        text = re.sub(r'^```(?:json)?\s*', '', text)
        text = re.sub(r'\s*```$', '', text)

    # Prefer the outermost brace-delimited span, which skips any
    # leading/trailing prose the model added around the JSON object.
    first_brace = text.find('{')
    last_brace = text.rfind('}')

    if first_brace != -1 and last_brace != -1:
        try:
            return json.loads(text[first_brace:last_brace + 1])
        except json.JSONDecodeError:
            # The brace-delimited span wasn't valid JSON (e.g. braces inside
            # a quoted string); fall through and try the whole text instead
            # of raising here, which made the fallback below unreachable.
            pass

    # If that didn't work, try parsing the whole thing
    return json.loads(text)

Define functions to upload and transcribe each file using AssemblyAI’s Async API.

def upload_file(file_path):
    """Stream a local audio file to AssemblyAI and return its upload URL."""
    with open(file_path, "rb") as audio:
        response = requests.post(f"{base_url}/v2/upload", headers=headers, data=audio)
    if response.status_code != 200:
        print(f"Error uploading {file_path}: {response.status_code}, {response.text}")
        response.raise_for_status()
    return response.json()["upload_url"]
9
def transcribe_audio(audio_url):
    """Request a transcription and block until it finishes.

    Returns the transcript ID on success; raises RuntimeError if the
    transcription ends in an error state.
    """
    payload = {"audio_url": audio_url, "speech_models": ["universal-3-pro"]}
    response = requests.post(f"{base_url}/v2/transcript", headers=headers, json=payload)

    if response.status_code != 200:
        print(f"Error submitting transcription: {response.status_code}, {response.text}")
        response.raise_for_status()

    transcript_id = response.json()["id"]
    polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"

    # Poll every 3 seconds until the job leaves the queued/processing states.
    while True:
        status_payload = requests.get(polling_endpoint, headers=headers).json()
        status = status_payload["status"]
        if status == "completed":
            return transcript_id
        if status == "error":
            raise RuntimeError(f"Transcription failed: {status_payload['error']}")
        time.sleep(3)

Define a function to process each transcript with LLM Gateway (the request carries only the transcript ID).

def process_with_llm_gateway(transcript_id, prompt):
    """Send a completed transcript to LLM Gateway and return the model's reply.

    The "{{ transcript }}" placeholder in the message is presumably expanded
    by the gateway from the referenced transcript_id — only the ID travels
    with the request.

    Raises RuntimeError if the gateway reports an error.
    """
    llm_gateway_data = {
        "model": "claude-sonnet-4-5-20250929",
        "messages": [
            {
                "role": "user",
                "content": f"{prompt}\n\n{{{{ transcript }}}}"
            }
        ],
        "transcript_id": transcript_id,
        "max_tokens": 1500
    }

    response = requests.post(
        "https://llm-gateway.assemblyai.com/v1/chat/completions",
        headers=headers,
        json=llm_gateway_data
    )

    # Match the error handling in upload_file/transcribe_audio: surface HTTP
    # failures before attempting to decode the body, so a non-200 error page
    # doesn't turn into an opaque JSON decode exception.
    if response.status_code != 200:
        print(f"Error from LLM Gateway: {response.status_code}, {response.text}")
        response.raise_for_status()

    result = response.json()

    if "error" in result:
        raise RuntimeError(f"LLM Gateway error: {result['error']}")

    return result['choices'][0]['message']['content']

Define your LLM Gateway request prompt.

# Prompt sent with every transcript: asks the model for a bare-JSON candidate
# profile matching the template below, with no surrounding markdown or prose.
prompt = """
    You are an HR executive scanning through an interview transcript to extract information about a candidate.
    You are required to create a JSON response with key information about the candidate.
    You will use this template for your answer:
    {
        "Name": "<candidate-name>",
        "Position": "<job position that candidate is applying for>",
        "Past experience": "<A short phrase describing the candidate's relevant past experience for the role>"
    }
    Do not include any other text in your response. Only respond in JSON format that is not surrounded by markdown code, as your response will be parsed programmatically as JSON.
    """

Retrieve and process each file in the interviews folder and create a .csv file with the results.

# Columns are filled by explicit key lookup (not dict iteration order) so a
# model response with extra or reordered keys still lands in the right
# columns. NOTE: these keys must match the JSON template in the prompt.
csv_columns = ["Name", "Position", "Past experience"]

# Get all files from the interviews directory
interview_files = [os.path.join("interviews", entry) for entry in os.listdir("interviews")]

with open(output_filename, "w", newline="") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["Name", "Position", "Past Experience"])

    print(f"Processing {len(interview_files)} interview files...")

    for interview_file in interview_files:
        print(f"\nProcessing: {interview_file}")

        # Upload file and get URL
        print(" Uploading file...")
        audio_url = upload_file(interview_file)

        # Transcribe audio
        print(" Transcribing...")
        transcript_id = transcribe_audio(audio_url)

        # Process with LLM Gateway
        print(" Analyzing with LLM Gateway...")
        llm_response = process_with_llm_gateway(transcript_id, prompt)

        # Parse the JSON response and write one row per interview, selecting
        # values by key so column order is guaranteed and a missing key
        # yields an empty cell instead of a misaligned row or a KeyError.
        interviewee_data = extract_json(llm_response)
        writer.writerow([interviewee_data.get(column, "") for column in csv_columns])
        print(f" Completed: {interviewee_data.get('Name', 'unknown')}")

print(f"\nCreated .csv file {output_filename}")

For context, here is an example response from LLM Gateway to our prompt.

1{
2 "Name": "John Smith",
3 "Position": "software engineer",
4 "Past experience": "three years of experience at Google"
5}

You can now run your Python script and you should see that a profiles.csv file is generated.