LlamaIndex is a flexible data framework for connecting custom data sources to Large Language Models (LLMs). With LlamaIndex, you can easily store and index your data and then apply LLMs.
LLMs only work with textual data, so to process audio files with LLMs we first need to transcribe them into text.
Luckily, LlamaIndex provides an AssemblyAI integration through Llama Hub that lets you load audio data with just a few lines of code:
from llama_hub.assemblyai.base import AssemblyAIAudioTranscriptReader
reader = AssemblyAIAudioTranscriptReader(file_path="./my_file.mp3")
docs = reader.load_data()
Let's learn how to use this data reader step-by-step. For this, we create a small demo application with an LLM-powered query engine that lets you load audio data and ask questions about your data.
Getting Started
Create a new virtual environment:
# Mac/Linux:
python3 -m venv venv
. venv/bin/activate
# Windows:
python -m venv venv
.\venv\Scripts\activate.bat
Install LlamaIndex, Llama Hub, and the AssemblyAI Python package:
pip install llama-index llama-hub assemblyai
Set your AssemblyAI API key as an environment variable named ASSEMBLYAI_API_KEY. You can get a free API key here.
# Mac/Linux:
export ASSEMBLYAI_API_KEY=<YOUR_KEY>
# Windows:
set ASSEMBLYAI_API_KEY=<YOUR_KEY>
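If you prefer to configure the key inside your script instead of the shell, here is a minimal sketch using Python's os module (the placeholder value is hypothetical, use your actual key):
import os

# Set the key programmatically instead of exporting it in the shell.
# Replace the placeholder with your actual AssemblyAI API key.
os.environ["ASSEMBLYAI_API_KEY"] = "<YOUR_KEY>"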
Use the AssemblyAIAudioTranscriptReader
To load and transcribe audio data into documents, import the AssemblyAIAudioTranscriptReader. It needs at least the file_path argument with an audio file specified as a URL or a local file path. You can read more about the integration in the official Llama Hub docs.
from llama_hub.assemblyai.base import AssemblyAIAudioTranscriptReader
audio_file = "https://storage.googleapis.com/aai-docs-samples/sports_injuries.mp3"
# or a local file path: audio_file = "./sports_injuries.mp3"
reader = AssemblyAIAudioTranscriptReader(file_path=audio_file)
docs = reader.load_data()
After loading the data, the transcribed text is stored in the text attribute.
print(docs[0].text)
# Runner's knee. Runner's knee is a condition ...
The metadata attribute contains the full JSON response of our API with more meta information:
print(docs[0].metadata)
# {'language_code': <LanguageCode.en_us: 'en_us'>,
# 'punctuate': True,
# 'format_text': True,
# …
# }
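Since the metadata is a regular Python dictionary, you can pull out individual fields directly. A small sketch, assuming the API response includes fields such as audio_duration and audio_url (the exact keys depend on the response for your transcript):
# Access individual fields of the metadata dictionary.
# audio_duration and audio_url are example keys from the API response.
print(docs[0].metadata.get("audio_duration"))
print(docs[0].metadata.get("audio_url"))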
Tip: The default configuration of the document loader returns a list with only one document, which is why we access the first document in the list with docs[0]. But you can use a different TranscriptFormat that splits the text, for example by sentences or paragraphs, and returns multiple documents. You can read more about the TranscriptFormat options here.
from llama_hub.assemblyai.base import TranscriptFormat
reader = AssemblyAIAudioTranscriptReader(
file_path="./your_file.mp3",
transcript_format=TranscriptFormat.SENTENCES,
)
docs = reader.load_data()
# Now it returns a list with multiple documents
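With TranscriptFormat.SENTENCES, each document in the returned list holds one sentence of the transcript, so you can iterate over the list directly. A quick sketch:
# Each document now contains a single sentence of the transcript
for doc in docs[:5]:
    print(doc.text)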
Apply a Vector Store Index and a Query Engine
Now that you have loaded the transcribed text into LlamaIndex documents, you can easily ask questions about the spoken data. For example, you can use an OpenAI model together with a query engine.
For this, you also need to set your OpenAI API key as an environment variable:
# Mac/Linux:
export OPENAI_API_KEY=<YOUR_OPENAI_KEY>
# Windows:
set OPENAI_API_KEY=<YOUR_OPENAI_KEY>
Now, you can create a VectorStoreIndex and a query engine from the documents you loaded in the first step.
The metadata needs to be smaller than the text chunk size, and since it contains the full JSON response with extra information, it is quite large. For simplicity, we just remove it here:
from llama_index import VectorStoreIndex
# Metadata needs to be smaller than chunk size
# For simplicity we just get rid of it
docs[0].metadata = {}
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
response = query_engine.query("What is a runner's knee?")
print(response)
# Runner's knee is a condition characterized by ...
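If you want to reuse the index later without re-transcribing and re-embedding the audio, you can persist it to disk and load it again. A minimal sketch, assuming a local ./storage directory (the directory name is arbitrary):
from llama_index import StorageContext, load_index_from_storage

# Persist the index to a local directory
index.storage_context.persist(persist_dir="./storage")

# Later, rebuild the index from the persisted files and query it again
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
print(index.as_query_engine().query("Which injuries are mentioned?"))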
Conclusion
This tutorial explained how to use the AssemblyAI data reader for LlamaIndex. You learned how to transcribe audio files and load the transcribed text into LlamaIndex documents, and how to create a Query Engine to ask questions about your spoken data.
Below is the complete code:
from llama_index import VectorStoreIndex
from llama_hub.assemblyai.base import AssemblyAIAudioTranscriptReader
audio_file = "https://storage.googleapis.com/aai-docs-samples/sports_injuries.mp3"
reader = AssemblyAIAudioTranscriptReader(file_path=audio_file)
docs = reader.load_data()
docs[0].metadata = {}
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
response = query_engine.query("What is a runner's knee?")
print(response)
If you enjoyed this article, feel free to check out some others on our blog, like
- How to use audio data in LangChain with Python
- Convert Speech to Text in Python in 5 Minutes
- How to get Zoom Transcripts with the Zoom API
Alternatively, check out our YouTube channel for learning resources on AI, like our Machine Learning from Scratch series.