Skip to main content

LlamaIndex Python Integration with AssemblyAI

You can use the AssemblyAI Audio Transcript Loader from LlamaHub to transcribe audio files inside your LlamaIndex applications.

Looking for the JavaScript integration? Check out the LlamaIndex.TS integration.

Quickstart

First, install the assemblyai python package.

pip install assemblyai

Set your AssemblyAI API key as an environment variable named ASSEMBLYAI_API_KEY. You can get a free AssemblyAI API key from the AssemblyAI dashboard.

# Mac/Linux:
export ASSEMBLYAI_API_KEY=YOUR_API_KEY

# Windows:
set ASSEMBLYAI_API_KEY=YOUR_API_KEY
  1. To load and transcribe audio data into documents, import the AssemblyAIAudioTranscriptReader. The AssemblyAIAudioTranscriptReader needs at least the file_path argument.
  2. Configure the file_path argument with a URL or a local file path to an audio or video file.
from llama_hub.assemblyai import AssemblyAIAudioTranscriptReader

audio_file = "https://assembly.ai/nbc.mp3"
# or a local file path: audio_file = "./nbc.mp3"

reader = AssemblyAIAudioTranscriptReader(file_path=audio_file)

docs = reader.load_data()
note

reader.load_data() waits until the transcription is ready.

The reader.load_data() method returns an array of documents, but by default, there's only one document in the array with the full transcript. The transcribed text is available in the text attribute:

docs[0].text
# "Load time, a new president and new congressional makeup. Same old ..."

The metadata contains the full transcript object with more meta information:

docs[0].metadata
# {'language_code': <LanguageCode.en_us: 'en_us'>,
# 'audio_url': 'https://assembly.ai/nbc.mp3',
# 'punctuate': True,
# 'format_text': True,
# ...
# }

Transcript formats

You can specify the transcript_format argument to load the transcript in different formats.

Depending on the format, load_data() returns either one or more documents. These are the different TranscriptFormat options:

  • TEXT: One document with the transcription text
  • SENTENCES: Multiple documents, splits the transcription by each sentence
  • PARAGRAPHS: Multiple documents, splits the transcription by each paragraph
  • SUBTITLES_SRT: One document with the transcript exported in SRT subtitles format
  • SUBTITLES_VTT: One document with the transcript exported in VTT subtitles format
from llama_hub.assemblyai import TranscriptFormat

reader = AssemblyAIAudioTranscripReader(
file_path="./your_file.mp3",
transcript_format=TranscriptFormat.SENTENCES,
)

docs = reader.load_data()

Transcription config

You can also specify the config argument to use different audio intelligence models.

import assemblyai as aai

config = aai.TranscriptionConfig(speaker_labels=True,
auto_chapters=True,
entity_detection=True
)

reader = AssemblyAIAudioTranscriptReader(
file_path="./your_file.mp3",
config=config
)

Pass the API key as argument

You can also pass the AssemblyAI API key as an argument instead of an environment variable.

reader = AssemblyAIAudioTranscriptReader(
file_path="./your_file.mp3",
api_key="YOUR_API_KEY"
)