Apply LLMs to audio files
Learn how to leverage LLMs for speech using LeMUR.
Overview
A Large Language Model (LLM) is a machine learning model that uses natural language processing (NLP) to generate text. LeMUR is a framework that lets you apply LLMs to audio transcripts, for example to ask questions about a call, or to summarize a meeting.
By the end of this tutorial, you’ll be able to use LeMUR to summarize an audio file.
Here’s the full sample code for what you’ll build in this tutorial:
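In Python, the complete example might look like the following sketch, which uses the AssemblyAI Python SDK. The API key placeholder, the sample audio URL, and the model choice are illustrative; substitute your own values.

```python
import assemblyai as aai

# Replace with your AssemblyAI API key.
aai.settings.api_key = "YOUR_API_KEY"

# Any publicly accessible audio URL (or a local file path) works here.
audio_url = "https://assembly.ai/sports_injuries.mp3"

# Step 1: Transcribe the audio file.
transcript = aai.Transcriber().transcribe(audio_url)

# Step 2: Prompt LeMUR with an instruction for the transcript.
prompt = "Provide a brief summary of the transcript."

result = transcript.lemur.task(
    prompt,
    final_model=aai.LemurModel.claude3_5_sonnet,  # illustrative model choice
)

print(result.response)
```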
If you run the code above, LeMUR's summary of the transcript is printed to the console.
Before you begin
To complete this tutorial, you need:
- Python, TypeScript, Go, Java, .NET, or Ruby installed.
- An AssemblyAI account with a credit card set up.
- Basic understanding of how to Transcribe an audio file.
Step 1: Install the SDK
Install the package via pip:
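For the Python SDK, for example:

```shell
pip install assemblyai
```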
Step 2: Transcribe an audio file
LeMUR uses one or more transcripts as input to generate text output. In this step, you'll transcribe an audio file that you'll later prompt LeMUR with.
For more information about transcribing audio, see Transcribe an audio file.
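In Python, this step might look like the following sketch (the sample audio URL is illustrative; replace it with your own audio):

```python
import assemblyai as aai

# Replace with your AssemblyAI API key.
aai.settings.api_key = "YOUR_API_KEY"

transcriber = aai.Transcriber()

# Transcribe a publicly accessible audio file by URL (or pass a local path).
transcript = transcriber.transcribe("https://assembly.ai/sports_injuries.mp3")
```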
Use existing transcript
If you’ve already transcribed an audio file you want to use, you can get an existing transcript using its ID. You can find the ID for previously transcribed audio files in the Processing queue.
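With the Python SDK, loading an existing transcript by ID might look like this (the ID shown is a placeholder):

```python
import assemblyai as aai

# Replace with your AssemblyAI API key.
aai.settings.api_key = "YOUR_API_KEY"

# Fetch a previously created transcript by its ID.
transcript = aai.Transcript.get_by_id("YOUR_TRANSCRIPT_ID")
```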
Step 3: Prompt LeMUR to generate text output
In this step, you’ll create a Custom task with LeMUR and use the transcript you created in the previous step as input.
The input to a custom task is called a prompt. A prompt is a text string that provides LeMUR with instructions on how to generate the text output.
For more techniques on how to build prompts, see Improving your prompt.
Write a prompt with instructions on how LeMUR should generate the text output.
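In Python, the prompt is just a string; the wording below is illustrative:

```python
# Instruct LeMUR to summarize; any instruction string works as a prompt.
prompt = "Provide a brief summary of the transcript."
```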
Create a custom task with LeMUR, using the transcript and prompt as input. The `final_model` parameter defines which LLM to use to process the task. For available models to choose from, see Change the model type.
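In Python, creating the task might look like the following sketch (the audio URL and model choice are illustrative):

```python
import assemblyai as aai

# Replace with your AssemblyAI API key.
aai.settings.api_key = "YOUR_API_KEY"

# The transcript created in the previous step.
transcript = aai.Transcriber().transcribe("https://assembly.ai/sports_injuries.mp3")

prompt = "Provide a brief summary of the transcript."

result = transcript.lemur.task(
    prompt,
    final_model=aai.LemurModel.claude3_5_sonnet,  # illustrative model choice
)

# The generated text is available on the response object.
print(result.response)
```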
Next steps
In this tutorial, you've learned how to generate LLM output based on your audio transcripts. The type of output depends on your prompt, so try exploring different prompts to see how they affect the output. Here are a few more prompts to try:
- "Provide an analysis of the transcript and offer areas to improve with exact quotes."
- "What's the main takeaway from the transcript?"
- "Generate a set of action items from this transcript."
To learn more about how to apply LLMs to your transcripts, see the following resources:
Need some help?
If you get stuck, or have any other questions, we’d love to help you out. Contact our support team at support@assemblyai.com or create a support ticket.