🦜️🔗 LangChain Python Integration with AssemblyAI
To apply LLMs to speech, you first need to transcribe the audio to text, which is what the AssemblyAI integration for LangChain helps you with.
Looking for the LangChain JavaScript/TypeScript integration?
Go to the LangChain.JS integration.
Quickstart
Install the LangChain package and the AssemblyAI Python SDK:
pip install langchain
pip install assemblyai
Set your AssemblyAI API key as an environment variable named ASSEMBLYAI_API_KEY. You can get a free AssemblyAI API key from the AssemblyAI dashboard.
# Mac/Linux:
export ASSEMBLYAI_API_KEY=YOUR_API_KEY
# Windows:
set ASSEMBLYAI_API_KEY=YOUR_API_KEY
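If you want to double-check that the key is visible to your Python process before continuing, a minimal sanity check looks like this (not part of the integration itself):

import os

# Raises if the key is missing from the environment.
assert os.environ.get("ASSEMBLYAI_API_KEY"), "ASSEMBLYAI_API_KEY is not set"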
Import the AssemblyAIAudioTranscriptLoader from langchain.document_loaders.
from langchain.document_loaders import AssemblyAIAudioTranscriptLoader
- Pass the local file path or URL as the file_path argument of the AssemblyAIAudioTranscriptLoader.
- Call the load method to get the transcript as LangChain documents.
audio_file = "https://assembly.ai/sports_injuries.mp3"
# or a local file path: audio_file = "./sports_injuries.mp3"
loader = AssemblyAIAudioTranscriptLoader(file_path=audio_file)
docs = loader.load()
The load method returns a list of documents, but by default, there's only one document in the list with the full transcript.
The transcribed text is available in the page_content attribute:
docs[0].page_content
# Load time, a new president and new congressional makeup. Same old ...
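Because the transcript comes back as regular LangChain documents, you can pass it straight into a chain. Here's a minimal sketch that summarizes the transcript, assuming an OpenAI API key is configured in your environment (the model and chain type are illustrative):

from langchain.chains.summarize import load_summarize_chain
from langchain.llms import OpenAI

# Assumes OPENAI_API_KEY is set in your environment.
llm = OpenAI()
chain = load_summarize_chain(llm, chain_type="stuff")

# The documents returned by the loader can be passed directly to the chain.
print(chain.run(docs))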
The metadata attribute contains the full JSON response with more meta information:
{
'language_code': <LanguageCode.en_us: 'en_us'>,
'audio_url': 'https://assembly.ai/nbc.mp3',
'punctuate': True,
'format_text': True,
...
}
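Since the metadata is a plain dictionary, you can read individual fields directly. For example, using the fields shown in the response above:

# Access individual fields from the transcript response.
print(docs[0].metadata["audio_url"])
print(docs[0].metadata["language_code"])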
Transcript formats
You can specify the transcript_format argument to load the transcript in different formats.
Depending on the format, load() returns either one or more documents. These are the different TranscriptFormat options:
- TEXT: One document with the transcription text
- SENTENCES: Multiple documents, splits the transcription by each sentence
- PARAGRAPHS: Multiple documents, splits the transcription by each paragraph
- SUBTITLES_SRT: One document with the transcript exported in SRT subtitles format
- SUBTITLES_VTT: One document with the transcript exported in VTT subtitles format
from langchain.document_loaders.assemblyai import TranscriptFormat
loader = AssemblyAIAudioTranscriptLoader(
file_path="./your_file.mp3",
transcript_format=TranscriptFormat.SENTENCES,
)
docs = loader.load()
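As another example, since SUBTITLES_SRT returns a single document whose page_content is the SRT text, you could write it straight to a subtitle file. A minimal sketch (the file names are placeholders):

from langchain.document_loaders import AssemblyAIAudioTranscriptLoader
from langchain.document_loaders.assemblyai import TranscriptFormat

loader = AssemblyAIAudioTranscriptLoader(
    file_path="./your_file.mp3",
    transcript_format=TranscriptFormat.SUBTITLES_SRT,
)
docs = loader.load()

# One document containing the transcript in SRT subtitles format.
with open("your_file.srt", "w") as f:
    f.write(docs[0].page_content)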
Transcription config
You can also specify the config argument to use different transcript features and audio intelligence models.
Here's an example of using the config argument to enable speaker labels, auto chapters, and entity detection:
import assemblyai as aai
config = aai.TranscriptionConfig(
speaker_labels=True, auto_chapters=True, entity_detection=True
)
loader = AssemblyAIAudioTranscriptLoader(file_path="./your_file.mp3", config=config)
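With these models enabled, their results appear in the document metadata alongside the rest of the JSON response. As a sketch, assuming the response includes a chapters list when auto_chapters is enabled (per the AssemblyAI API response shape):

docs = loader.load()

# Each chapter summary produced by the Auto Chapters model.
for chapter in docs[0].metadata.get("chapters", []):
    print(chapter["headline"])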
For the full list of options, see the Transcript API reference.
Pass the AssemblyAI API key as an argument
Instead of configuring the AssemblyAI API key as the ASSEMBLYAI_API_KEY environment variable, you can also pass it as the api_key argument.
loader = AssemblyAIAudioTranscriptLoader(
file_path="./your_file.mp3", api_key="YOUR_API_KEY"
)
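A third option, assuming the loader uses the assemblyai Python SDK under the hood (which the config example above suggests), is to set the key once via the SDK's module-level settings:

import assemblyai as aai

# Set the key once for the whole process via the SDK's global settings.
aai.settings.api_key = "YOUR_API_KEY"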
Additional resources
You can learn more about using LangChain with AssemblyAI in these resources.