Detect Low Confidence Words in a Transcript
In this guide, we’ll show you how to detect sentences that contain words with low confidence scores. A confidence score represents how confident the model was in predicting a transcribed word, and each word carries a score between 0.0 (low confidence) and 1.0 (high confidence). Flagging low-scoring words is useful when manually editing transcripts. You can choose whatever confidence threshold suits your application; this guide uses 0.4.
Getting Started
Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for an account and get your API key from your dashboard. This guide uses AssemblyAI’s Node SDK. If you haven’t already, install the SDK by following these instructions.
Step-by-Step Instructions
Import the AssemblyAI package and create an AssemblyAI object with your API key:
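A minimal sketch of this step, assuming the SDK is installed under the npm package name assemblyai and exports an AssemblyAI class that accepts an apiKey option. The helper name createClient and the ASSEMBLYAI_API_KEY environment variable are illustrative choices, not part of the SDK. The dynamic import() keeps the snippet loadable from both CommonJS and ES modules; in an ES module, a static import of AssemblyAI from "assemblyai" does the same job.

```javascript
// Hypothetical helper: builds an SDK client from an API key.
// The environment variable name is an assumption made for this example.
async function createClient(apiKey = process.env.ASSEMBLYAI_API_KEY) {
  if (!apiKey) {
    throw new Error("Set ASSEMBLYAI_API_KEY or pass an API key explicitly.");
  }
  // Assumes `npm install assemblyai` has already been run.
  const { AssemblyAI } = await import("assemblyai");
  return new AssemblyAI({ apiKey });
}
```

Reading the key from the environment keeps it out of source control; passing it as an argument works just as well for quick experiments.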
Next, create the transcript from your audio file, supplied either as a local file or as a URL. If you use a URL, AssemblyAI’s servers must be able to access it, so make sure it links to a downloadable file.
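The transcription step might look like the sketch below. It assumes client is an AssemblyAI SDK client and that client.transcripts.transcribe accepts an audio field holding either a local path or a URL, and resolves once processing has finished. The wrapper name transcribeAudio is hypothetical.

```javascript
// `client` is assumed to be an AssemblyAI SDK client.
// `audio` may be a local file path or a publicly downloadable URL.
async function transcribeAudio(client, audio) {
  const transcript = await client.transcripts.transcribe({ audio });
  // A failed transcription is reported through the status field,
  // so check it before using the result.
  if (transcript.status === "error") {
    throw new Error(`Transcription failed: ${transcript.error}`);
  }
  return transcript;
}
```

Turning the error status into an exception means later steps never run against a transcript that has no words.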
From there, use the id from the transcript to request the transcript broken down into sentences.
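As a sketch, the sentence request can be wrapped as follows. It assumes client.transcripts.sentences takes a transcript id and resolves to an object with a sentences array, where each sentence has text, start, and a words array whose entries carry text and confidence. The wrapper name getSentences is hypothetical.

```javascript
// `client` is assumed to be an AssemblyAI SDK client;
// `transcriptId` is the id of a completed transcript.
async function getSentences(client, transcriptId) {
  const { sentences } = await client.transcripts.sentences(transcriptId);
  return sentences;
}
```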
Set the confidence score threshold to a value of your choice (0.5 or less is a good start). In this guide, we’ll use 0.4.
Next, we will filter the sentences array down to just the sentences that contain at least one word with a confidence score below 0.4.
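One way to express this filter is sketched below. It assumes each sentence has a words array and each word a numeric confidence field; the names CONFIDENCE_THRESHOLD and filterLowConfidenceSentences are illustrative.

```javascript
// Threshold used throughout this guide.
const CONFIDENCE_THRESHOLD = 0.4;

// Keep only the sentences that contain at least one word scoring
// strictly below the threshold.
function filterLowConfidenceSentences(sentences, threshold = CONFIDENCE_THRESHOLD) {
  return sentences.filter((sentence) =>
    sentence.words.some((word) => word.confidence < threshold)
  );
}
```

Using some stops scanning a sentence as soon as the first low-confidence word is found.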
Next, we’ll alter the filteredSentences array so that the words array for each sentence contains only the words with confidence scores under 0.4.
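A minimal sketch of this narrowing step, assuming the same sentence and word shape as above (a words array whose entries have a confidence field). The function name keepLowConfidenceWords is hypothetical, and this version returns new sentence objects rather than altering the originals in place.

```javascript
// For each sentence, keep only the words scoring strictly below the threshold.
// Sentences are shallow-copied so the input array is left unmodified.
function keepLowConfidenceWords(filteredSentences, threshold = 0.4) {
  return filteredSentences.map((sentence) => ({
    ...sentence,
    words: sentence.words.filter((word) => word.confidence < threshold),
  }));
}
```

Copying instead of mutating keeps the full word list available if it is needed later, for example to show the low-confidence words in context.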
Finally, we’ll display the results. For each sentence that contains low confidence words, the results include the sentence’s timestamp, the sentence itself, the words that scored poorly, and their scores.
The output will look something like this:
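As an illustration, the sketch below formats the results and shows, in trailing comments, what it prints for one made-up sentence. The helper names formatTimestamp and formatResults, the layout, and the sample data are all hypothetical; the sketch also assumes that sentence start times are given in milliseconds.

```javascript
// Convert milliseconds to an m:ss string (start times assumed to be in ms).
function formatTimestamp(ms) {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = String(totalSeconds % 60).padStart(2, "0");
  return `${minutes}:${seconds}`;
}

// Build a printable report: timestamp, sentence, then each low-confidence
// word with its score rounded to two decimals.
function formatResults(sentences) {
  return sentences
    .map((sentence) => {
      const words = sentence.words
        .map((word) => `  ${word.text} (${word.confidence.toFixed(2)})`)
        .join("\n");
      return `[${formatTimestamp(sentence.start)}] ${sentence.text}\nLow-confidence words:\n${words}`;
    })
    .join("\n\n");
}

// Hypothetical data, invented purely for illustration.
const sample = [
  {
    text: "The quick brown fox.",
    start: 65250,
    words: [{ text: "fox.", confidence: 0.31 }],
  },
];

console.log(formatResults(sample));
// [1:05] The quick brown fox.
// Low-confidence words:
//   fox. (0.31)
```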