Get started using Claude 3.5 Sonnet with audio data

Learn how to use the Claude 3 models with audio and video data in Python.

Claude 3.5 Sonnet, recently announced by Anthropic, sets new industry benchmarks for many LLM tasks. It excels in tasks ranging from complex coding to nuanced literary analysis, showcasing exceptional context awareness and creativity.

In this tutorial, you'll learn how to use Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku with audio or video files in Python.

Pipeline for applying Claude 3 models to audio data

Here are a few example use cases you can use this pipeline for:

  • Creating summaries of long podcasts or YouTube videos
  • Asking any questions about the audio content
  • Generating action items from meetings
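
Each of these use cases comes down to a different prompt. As a rough sketch, they could be kept as reusable templates; the names `PROMPTS` and `build_prompt` and the prompt wording are illustrative assumptions, not part of the AssemblyAI SDK:

```python
# Hypothetical prompt templates, one per use case listed above.
# The wording is illustrative; adjust it to your own content.
PROMPTS = {
    "summary": "Provide a brief summary of the transcript.",
    "question": "Answer the following question using only the transcript: {question}",
    "action_items": "List the action items from this meeting as bullet points.",
}


def build_prompt(use_case: str, **fields: str) -> str:
    """Return the prompt for a use case, filling in any placeholders."""
    return PROMPTS[use_case].format(**fields)


print(build_prompt("question", question="What are tar pit ideas?"))
```

The resulting string is what you would later pass as the prompt to the model.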

How does it work?

Because most language models accept only text, you first have to transcribe the audio data. Multimodal models can work with audio directly, but they are still in the early stages of development.

To do this, we use LeMUR, AssemblyAI's framework for applying LLMs to speech data. With LeMUR, you don't need to stitch together several different services: you can combine industry-leading Speech AI models and LLMs in just a few lines of code.

This is made possible through a collaboration between AssemblyAI and Anthropic. You can access all Claude 3 models through the AssemblyAI platform at no additional cost.

Set up the SDK

To get started, install the AssemblyAI Python SDK, which includes all LeMUR functionality.

pip install assemblyai

Then, import the package and set your API key. You can get one for free here.

import assemblyai as aai
aai.settings.api_key = "YOUR_API_KEY"
💡 Want to try out the code immediately? Use this Google Colab.
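
If you'd rather not hard-code the key in your script, one common option is to read it from an environment variable. This is a minimal sketch; the helper `load_api_key` and the variable name `ASSEMBLYAI_API_KEY` are assumptions made for this example, not something the SDK requires:

```python
import os


def load_api_key(env_var: str = "ASSEMBLYAI_API_KEY") -> str:
    """Read the API key from an environment variable, failing loudly if unset.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage with the SDK (requires `pip install assemblyai`):
#   import assemblyai as aai
#   aai.settings.api_key = load_api_key()
```

Failing early with a clear message tends to be easier to debug than an authentication error later in the pipeline.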

Transcribe an audio or video file

Next, transcribe an audio or video file by setting up a Transcriber and calling its transcribe() method. You can pass in any local file or publicly accessible URL.

Here we use an episode of Lenny's Podcast featuring Dalton Caldwell from Y Combinator.

audio_url = "https://storage.googleapis.com/aai-web-samples/lennyspodcast-daltoncaldwell-ycstartups.m4a"

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(audio_url)

print(transcript.text)

Seeing everything people apply to YC, with people all kind of have the same idea...
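
Transcription can fail, for example when the URL is not reachable. Here is a hedged sketch of a wrapper that stops early in that case. The helper name `transcribe_or_raise` is hypothetical, and the code assumes the returned transcript exposes an `error` attribute that is `None` on success; check the SDK reference for the exact error-handling interface:

```python
def transcribe_or_raise(transcriber, source: str):
    """Transcribe `source` and raise if the transcript reports an error.

    `transcriber` is expected to be an `aai.Transcriber()`. This sketch
    assumes the returned transcript exposes an `error` attribute that is
    None on success.
    """
    transcript = transcriber.transcribe(source)
    error = getattr(transcript, "error", None)
    if error:
        raise RuntimeError(f"Transcription failed: {error}")
    return transcript


# Usage:
#   transcript = transcribe_or_raise(aai.Transcriber(), audio_url)
#   print(transcript.text)
```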

Use Claude 3.5 Sonnet with audio data

Claude 3.5 Sonnet is Anthropic's most intelligent model to date, outperforming Claude 3 Opus on a wide range of evaluations while remaining cheaper.

To use Claude 3.5 Sonnet, call transcript.lemur.task(), a flexible endpoint that accepts any prompt and automatically adds the transcript as additional context for the model.

Pass aai.LemurModel.claude3_5_sonnet as the final_model argument to select the model. Here's an example of a simple summarization prompt:

prompt = "Provide a brief summary of the transcript."

result = transcript.lemur.task(
    prompt, final_model=aai.LemurModel.claude3_5_sonnet
)

print(result.response)

Here's a brief summary of the transcript:

The transcript covers two main topics:

1. Advice for startup founders:
Dalton and Lenny discuss the importance of giving simple, pragmatic advice to founders. They talk about perseverance, knowing when to pivot or give up, and avoiding "tar pit ideas" that seem appealing but are consistently unsuccessful.

2. Dalton's early experiences in Silicon Valley:
Dalton shares his experiences from the early 2000s, including interactions with notable figures like Reid Hoffman, Sam Altman, and Sean Parker before they became famous. He discusses his own startup journey, including selling a company to MySpace and starting Pick Please, which competed with Instagram in the photo-sharing space.

The conversation provides insights into the startup world, both from an advisory perspective and through personal anecdotes from Silicon Valley's early days.

Use Claude 3 Opus with audio data

Claude 3 Opus is good at handling complex analysis, longer tasks with many steps, and higher-order math and coding tasks.

To use Opus, specify aai.LemurModel.claude3_opus for the model when calling the LLM. Here's an example of a prompt to extract certain information from the transcript:

prompt = "Extract all advice Dalton gives in this podcast episode. Use bullet points."

result = transcript.lemur.task(
    prompt, final_model=aai.LemurModel.claude3_opus
)

print(result.response)

Based on the transcript summaries, here are the main pieces of advice Dalton gives:

- Give founders simple, pragmatic advice like "sell shit, make money" and "don't die." Even elite athletes need reminders of fundamentals from their coaches.
- When advising struggling founders, consider if it is still fun for them and if the problem is fixable. Persevering founders often truly love their customers and products. 
- Discourage founders from continuing solely to avoid failure. 
- When discussing pivots, emphasize moving closer to the founder's personal expertise and experience.
- Be aware of "tar pit ideas" that attract many founders but are consistently unsuccessful, like social coordination apps. 
- Signs it may be time for founders to consider giving up include being out of growth ideas or disliking the work. However, most founders feel hopeless at some point but continue through willpower alone. Numerous success stories nearly failed but the founders refused to accept it.
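
Because the prompt asks for bullet points, the response can be split into a Python list for further processing. This is a small sketch that assumes each item sits on its own line and starts with "- " (as in the output above) or "• "; the function name `parse_bullets` is made up for this example, and model output is not guaranteed to follow this format:

```python
def parse_bullets(response: str) -> list[str]:
    """Collect lines that start with '- ' or '• ' into a list of strings."""
    items = []
    for line in response.splitlines():
        stripped = line.strip()
        if stripped.startswith(("- ", "• ")):
            # Drop the two-character bullet marker and surrounding whitespace.
            items.append(stripped[2:].strip())
    return items


sample = """Here are the main pieces of advice:

- Give founders simple, pragmatic advice.
- Discourage founders from continuing solely to avoid failure.
"""
print(parse_bullets(sample))
```

With the real pipeline you would call `parse_bullets(result.response)` instead of using the sample string.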

Use Claude 3 Haiku with audio data

Claude 3 Haiku is the fastest and cheapest of the three models, which makes it a good fit for lightweight tasks.

To use Haiku, specify aai.LemurModel.claude3_haiku for the model when calling the LLM. Here's an example of a simple prompt that asks a question about the content:

prompt = "What are tar pit ideas?"

result = transcript.lemur.task(
    prompt, final_model=aai.LemurModel.claude3_haiku
)

print(result.response)

Based on the transcript summary, "tar pit ideas" refer to ideas that attract many founders but are consistently unsuccessful. The example given is social coordination apps, which the summary states "seem appealing but people have worked on them for decades without success."

The key points about "tar pit ideas" from the summary are:

1. They are ideas that seem appealing to many founders, but are consistently unsuccessful over time.
2. The example provided is social coordination apps, which have been worked on for decades without success.
3. The implication is that these types of ideas, despite their initial appeal, end up being traps or "tar pits" that founders get stuck in without achieving success.

So in essence, "tar pit ideas" are concepts or business ideas that appear promising on the surface but have proven to be very difficult to execute successfully over the long term, trapping founders who pursue them.
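
Since all three models are selected through the same final_model argument, you can send one prompt to each of them and compare the answers. The sketch below uses only the transcript.lemur.task(prompt, final_model=...) call and the response attribute shown in this tutorial; the helper name `ask_all_models` is hypothetical:

```python
def ask_all_models(transcript, prompt: str, models: dict) -> dict:
    """Run one prompt against several models and collect the responses.

    `models` maps a label of your choice to a model identifier, for example
    {"haiku": aai.LemurModel.claude3_haiku}.
    """
    answers = {}
    for label, model in models.items():
        # Each iteration sends a separate LeMUR request.
        result = transcript.lemur.task(prompt, final_model=model)
        answers[label] = result.response
    return answers


# Usage:
#   models = {
#       "sonnet": aai.LemurModel.claude3_5_sonnet,
#       "opus": aai.LemurModel.claude3_opus,
#       "haiku": aai.LemurModel.claude3_haiku,
#   }
#   for label, answer in ask_all_models(transcript, "What are tar pit ideas?", models).items():
#       print(label, answer, sep="\n", end="\n\n")
```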

Learn more about prompt engineering

And that's how easily you can apply Claude 3 models to audio data with AssemblyAI and the LeMUR framework! I hope you enjoyed the quick guide! To get the most out of LeMUR and the Claude 3 models, see the following resources: