LangChain is a framework for developing applications powered by Large Language Models (LLMs). With LangChain, you can easily apply LLMs to your own data and, for example, ask questions about its contents. LLMs only work with textual data, so to process audio files with LLMs we first need to transcribe them into text.
Luckily, LangChain provides an AssemblyAI integration that lets you load audio data with just a few lines of code:
from langchain.document_loaders import AssemblyAIAudioTranscriptLoader
loader = AssemblyAIAudioTranscriptLoader("./my_file.mp3")
docs = loader.load()
Let's learn how to use this integration step-by-step. For this, we create a small demo application that lets you load audio data and apply an LLM that can answer questions about your spoken data.
Getting Started
Create a new virtual environment and activate it:
# Mac/Linux:
python3 -m venv venv
. venv/bin/activate
# Windows:
python -m venv venv
.\venv\Scripts\activate.bat
Install LangChain and the AssemblyAI Python package:
pip install langchain
pip install assemblyai
Set your AssemblyAI API key as an environment variable named ASSEMBLYAI_API_KEY. You can get a free API key here.
# Mac/Linux:
export ASSEMBLYAI_API_KEY=<YOUR_KEY>
# Windows:
set ASSEMBLYAI_API_KEY=<YOUR_KEY>
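Alternatively, the assemblyai Python package lets you set the key directly in code via aai.settings.api_key, which the document loader should pick up as well since it uses the SDK under the hood. A minimal sketch:
import assemblyai as aai

# Alternative to the environment variable: set the key on the SDK directly
aai.settings.api_key = "<YOUR_KEY>"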
Use the AssemblyAIAudioTranscriptLoader
To load and transcribe audio data into documents, import the AssemblyAIAudioTranscriptLoader. It needs at least the file_path argument, with an audio file specified as a URL or a local file path. You can read more about the integration in the official LangChain docs.
from langchain.document_loaders import AssemblyAIAudioTranscriptLoader
audio_file = "https://storage.googleapis.com/aai-docs-samples/sports_injuries.mp3"
# or a local file path: audio_file = "./sports_injuries.mp3"
loader = AssemblyAIAudioTranscriptLoader(file_path=audio_file)
docs = loader.load()
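If you want more control over the transcription, the loader also accepts an optional config argument that takes an assemblyai.TranscriptionConfig. As a small sketch (the exact options depend on your SDK version, so treat the parameters below as examples), you could enable speaker labels like this:
import assemblyai as aai
from langchain.document_loaders import AssemblyAIAudioTranscriptLoader

# Example config: enable speaker labels for the transcription
config = aai.TranscriptionConfig(speaker_labels=True)

loader = AssemblyAIAudioTranscriptLoader(
    file_path=audio_file,
    config=config,
)
docs = loader.load()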
What's going on behind the scenes?
This document loader transcribes the given audio file and loads the transcribed text into LangChain documents. If a local file is given, it also uploads the file first.
Note: Calling loader.load() blocks until the transcription is finished.
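Conceptually, it behaves roughly like the following sketch built on the assemblyai SDK (a simplified illustration, not the integration's actual source; the json_response attribute is an assumption based on the SDK):
import assemblyai as aai
from langchain.schema import Document

# Simplified idea of what the loader does internally
transcriber = aai.Transcriber()
transcript = transcriber.transcribe("./my_file.mp3")  # uploads local files and waits for the result

# The transcribed text becomes page_content, the full API response becomes metadata
doc = Document(page_content=transcript.text, metadata=transcript.json_response)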
After loading the data, the transcribed text is stored in the page_content attribute:
print(docs[0].page_content)
# Runner's knee. Runner's knee is a condition ...
The metadata attribute contains the full JSON response of our API with more meta information:
print(docs[0].metadata)
# {'language_code': <LanguageCode.en_us: 'en_us'>,
# 'punctuate': True,
# 'format_text': True,
# ...
# }
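Because the metadata mirrors the API response, you can read individual fields from it directly. For example (the field names below, such as id and audio_duration, come from the transcript response):
# Access individual fields of the transcript response stored in the metadata
print(docs[0].metadata["id"])              # the transcript ID
print(docs[0].metadata["audio_duration"])  # audio length in seconds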
Tip: The default configuration of the document loader returns a list with only one document, which is why we access the first document in the list with docs[0]. But you can use a different TranscriptFormat that splits the text, for example by sentences or paragraphs, and returns multiple documents. You can read more about the TranscriptFormat options here.
from langchain.document_loaders.assemblyai import TranscriptFormat
loader = AssemblyAIAudioTranscriptLoader(
    file_path="./your_file.mp3",
    transcript_format=TranscriptFormat.SENTENCES,
)
docs = loader.load() # Now it returns a list with multiple documents
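With TranscriptFormat.SENTENCES, each document holds one sentence, so you can loop over the list, for example to inspect the first few entries:
# Each document now contains one sentence of the transcript
for doc in docs[:3]:
    print(doc.page_content)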
Apply a Question Answering Chain
Now that you have loaded the transcribed text into LangChain documents, you can easily ask questions about the spoken data. For example, you can apply a model from OpenAI with a QA chain.
For this, you also need to install the OpenAI Python package and set your OpenAI API key as an environment variable:
pip install openai
# Mac/Linux:
export OPENAI_API_KEY=<YOUR_OPENAI_KEY>
# Windows:
set OPENAI_API_KEY=<YOUR_OPENAI_KEY>
Now, you can apply the load_qa_chain function and pass in the documents from the first step as the input_documents argument:
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain
llm = OpenAI()
qa_chain = load_qa_chain(llm, chain_type="stuff")
answer = qa_chain.run(input_documents=docs,
                      question="What is a runner's knee?")
print(answer)
# Runner's knee is a condition characterized by ...
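The same chain can be reused for further questions about the documents. A quick sketch (the questions here are just made-up examples):
# Reuse the chain for additional questions about the same documents
questions = [
    "Which sports injuries are mentioned?",
    "How can a runner's knee be treated?",
]
for question in questions:
    print(qa_chain.run(input_documents=docs, question=question))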
Conclusion
This tutorial explained how to use the AssemblyAI integration that was added to the LangChain Python framework in version 0.0.272. You learned how to transcribe audio files and load the transcribed text into LangChain documents, and how to create a Q&A chain to ask questions about your spoken data.
Below is the complete code:
from langchain.document_loaders import AssemblyAIAudioTranscriptLoader
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain
file_path = "https://storage.googleapis.com/aai-docs-samples/sports_injuries.mp3"
loader = AssemblyAIAudioTranscriptLoader(file_path)
docs = loader.load()
llm = OpenAI()
qa_chain = load_qa_chain(llm, chain_type="stuff")
answer = qa_chain.run(input_documents=docs,
                      question="What is a runner's knee?")
print(answer)
If you enjoyed this article, feel free to check out some others on our blog, like
- How to integrate spoken audio into LangChain.js using AssemblyAI
- Automatic summarization with LLMs in Python
- Recent developments in Generative AI for Audio
Alternatively, check out our YouTube channel for learning resources on AI, like our Machine Learning from Scratch series.