Generate Transcript Citations using LeMUR
This guide walks through generating transcript citations using OpenAI embeddings and the LeMUR API.
Overview
Extracting exact quotes from transcripts can be difficult for Large Language Models, which makes it challenging to cite sources or identify timestamps for generated text.
Embeddings are powerful representations of text that capture its semantic and contextual meaning. By leveraging embeddings, we can transform raw text data, such as transcripts, into dense numerical vectors that encode its underlying information. These embeddings enable us to perform sophisticated tasks such as similarity comparison and contextual searching.
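As a rough illustration of the similarity comparison mentioned above, the sketch below computes cosine similarity between vectors in plain Python. The three-dimensional vectors are made-up stand-ins for real embeddings, and `cosine_similarity` is a helper name chosen for this example rather than part of any library.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors invented for illustration; real embeddings have far more dimensions.
budget_question = [0.9, 0.1, 0.0]
budget_paragraph = [0.8, 0.2, 0.1]
weather_paragraph = [0.0, 0.1, 0.9]

# The related paragraph should score closer to 1.0 than the unrelated one.
print(cosine_similarity(budget_question, budget_paragraph))
print(cosine_similarity(budget_question, weather_paragraph))
```

Texts with similar meaning should produce vectors that point in similar directions, so a higher score suggests a closer semantic match.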
In this guide, we demonstrate how to use OpenAI embeddings to retrieve transcript citations that corroborate the results from the LeMUR API. LeMUR is proficient at providing the ‘what’ and ‘why’; embeddings provide the ‘where’ and ‘when’.
We’ll walk through three use cases: verifying sources for specific answers, timestamping action items, and generating customer quotes.
Get Started
Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for an account and get your API key from your dashboard. You will also need an OpenAI API token.
LeMUR features are currently only available to paid users. See pricing for more details.
Instructions
Install the libraries required for transcription and embedding creation.
Submitting a File for Transcription
Create Transcript Embeddings
We are using the text-embedding-ada-002 model to generate our embeddings.
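One possible way to organize this step is to keep each transcript paragraph together with its timestamps and its embedding vector, so a later match can be traced back to a place in the audio. The sketch below assumes paragraphs arrive as dictionaries with `text`, `start`, and `end` keys (timestamps in milliseconds). The `embed` argument is a hypothetical callable that would wrap a request to the embedding model; `fake_embed` is a placeholder so the example runs offline and does not produce meaningful embeddings.

```python
def build_embedding_index(paragraphs, embed):
    """Pair each non-empty paragraph with its timestamps and embedding vector.

    `paragraphs` is assumed to be a list of dicts with "text", "start", "end".
    `embed` is any callable mapping a string to a sequence of floats.
    """
    index = []
    for paragraph in paragraphs:
        text = paragraph["text"].strip()
        if not text:
            continue  # skip empty paragraphs rather than embedding them
        index.append({
            "text": text,
            "start": paragraph["start"],
            "end": paragraph["end"],
            "vector": list(embed(text)),
        })
    return index

def fake_embed(text):
    # Placeholder only: NOT a real embedding, just enough to exercise the code.
    return [float(len(text)), float(text.count(" "))]

# Made-up paragraphs for illustration.
sample_paragraphs = [
    {"text": "Let's move the launch to March.", "start": 12000, "end": 19500},
    {"text": "   ", "start": 19500, "end": 19900},
    {"text": "I'll send the updated plan by Friday.", "start": 20000, "end": 26000},
]

index = build_embedding_index(sample_paragraphs, fake_embed)
print(len(index), index[0]["start"], index[0]["end"])
```

Passing the embedding function in as a parameter keeps the indexing logic independent of any particular client library.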
The pricing for this model is $0.0001 / 1k tokens, which equates to roughly $0.0015 to embed one hour of audio.
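The per-hour figure can be sanity-checked with a back-of-the-envelope calculation. The speaking rate (150 words per minute) and the tokens-per-word ratio (about 4/3) used below are assumptions for illustration, not values from this guide; only the price per 1k tokens comes from the text above.

```python
PRICE_PER_1K_TOKENS_USD = 0.0001  # from the pricing quoted above

def embedding_cost_usd(audio_minutes, words_per_minute=150, tokens_per_word=4 / 3):
    """Estimate the embedding cost for a recording of the given length.

    words_per_minute and tokens_per_word are rough assumptions; real
    transcripts vary with speaker pace and vocabulary.
    """
    tokens = audio_minutes * words_per_minute * tokens_per_word
    return tokens / 1000 * PRICE_PER_1K_TOKENS_USD

# One hour at the assumed defaults is about 12,000 tokens.
print(f"${embedding_cost_usd(60):.4f}")
```

Under these assumptions one hour costs about $0.0012. A denser conversation of around 15,000 tokens per hour lands on the $0.0015 estimate, so the quoted figure is consistent with typical speech.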
Examples
Cite Answers to Specific Questions
Cite sources for specific answers returned by the LeMUR Q&A API.
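A minimal sketch of the lookup step, assuming the answer text and each transcript paragraph have already been embedded: rank the paragraphs by cosine similarity to the answer and return the closest ones with their timestamps. The vectors and sentences below are invented for illustration, and `top_citations` is a hypothetical helper name, not part of the LeMUR or OpenAI APIs.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_citations(answer_vector, entries, k=1):
    """Return the k entries whose vectors are most similar to answer_vector."""
    return heapq.nlargest(k, entries, key=lambda e: cosine(answer_vector, e["vector"]))

# Made-up paragraph entries; "start" and "end" are in milliseconds.
entries = [
    {"text": "We agreed to move the launch to March.",
     "start": 12000, "end": 19500, "vector": [0.9, 0.1, 0.0]},
    {"text": "Lunch was great today.",
     "start": 20000, "end": 23000, "vector": [0.0, 0.2, 0.9]},
]

answer_vector = [0.8, 0.2, 0.1]  # stand-in for the embedded LeMUR answer
best = top_citations(answer_vector, entries, k=1)[0]
print(best["text"], best["start"], best["end"])
```

A linear scan like this is adequate for a handful of transcripts; a dedicated nearest-neighbor index may be preferable for larger collections.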
Example output:
Provide References to Multiple Transcripts
When analyzing multiple transcripts, it can be helpful to have references to know which transcript the answer came from.
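One way to keep track of where a match came from is to tag every paragraph entry with its transcript's identifier before searching across all of them. The sketch below assumes each transcript's entries are stored in a dictionary keyed by transcript ID; the IDs, field names, and `merge_indexes` helper are hypothetical and chosen for this example.

```python
def merge_indexes(indexes_by_id):
    """Flatten per-transcript entries into one list, tagging each with its source ID.

    Entries are copied, so the input dictionaries are left unmodified.
    """
    merged = []
    for transcript_id, entries in indexes_by_id.items():
        for entry in entries:
            merged.append({**entry, "transcript_id": transcript_id})
    return merged

# Made-up entries for two hypothetical transcripts.
indexes_by_id = {
    "transcript_a": [{"text": "Pricing felt too high.", "start": 4000, "end": 7000}],
    "transcript_b": [{"text": "Onboarding was smooth.", "start": 1000, "end": 3500},
                     {"text": "We need SSO support.", "start": 9000, "end": 12000}],
}

merged = merge_indexes(indexes_by_id)
for entry in merged:
    print(entry["transcript_id"], entry["start"], entry["text"])
```

Any entry returned by a similarity search over the merged list then carries both its source transcript and its timestamps.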
Example output:
Identify Timestamps For Action Items
Quickly jump to the part of the meeting where the action item was discussed.
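AssemblyAI timestamps are expressed in milliseconds, so a small conversion helps when presenting a jump-to point for an action item. The helper below is a sketch; `format_timestamp` is a name chosen for this example.

```python
def format_timestamp(ms):
    """Convert a millisecond offset into an HH:MM:SS string (fractions truncated)."""
    if ms < 0:
        raise ValueError("timestamp must be non-negative")
    total_seconds = ms // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

print(format_timestamp(754_000))    # 00:12:34
print(format_timestamp(3_725_500))  # 01:02:05
```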
Example output: