Correct Audio Duration Discrepancies with Multi-Tool Validation and Transcoding
In this guide, you’ll learn how to check the audio duration of a file using three different tools: ffprobe, SoX, and MediaInfo. This guide was created in response to customer feedback about transcription results showing incorrect audio durations. The issue was traced to audio files with corrupted metadata or problematic headers, leading to inaccurate duration data. If these tools report differing durations for the same file, transcription inconsistencies can arise. We will programmatically detect any duration mismatches and transcode the file to resolve them, typically resulting in a more accurate transcription.
Quickstart
Get Started
Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for an AssemblyAI account and get your API key from your dashboard.
Step-by-Step Instructions
Install the SDK:
Import the assemblyai package along with subprocess, set your AssemblyAI API key, and instantiate the transcriber.
For this cookbook you will need ffmpeg (which ships with ffprobe), sox, and MediaInfo installed. We will use these tools to pull the duration from the audio. Matching audio duration is crucial because discrepancies may indicate issues with the audio file’s metadata or headers. Such inconsistencies can lead to inaccurate transcription results, playback issues, or unexpected behaviour in media applications. By verifying that the duration is consistent across all three tools, we can detect potential problems early and correct any corrupted metadata or faulty headers before processing the audio further.
First, we will get the audio duration using ffprobe.
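A minimal sketch of the ffprobe step, assuming ffprobe is on your PATH. The function name get_duration_ffprobe is illustrative and not necessarily the one used in the original cookbook; it returns the duration in seconds, or None if ffprobe is missing, fails, or reports a non-numeric value such as N/A.

```python
import subprocess
from typing import Optional


def get_duration_ffprobe(path: str) -> Optional[float]:
    """Return the container duration in seconds as reported by ffprobe."""
    cmd = [
        "ffprobe",
        "-v", "error",                       # only print errors
        "-show_entries", "format=duration",  # ask for the duration field only
        "-of", "default=noprint_wrappers=1:nokey=1",  # print the bare value
        path,
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        # ffprobe is not installed, or it could not read the file
        return None
    try:
        return float(result.stdout.strip())
    except ValueError:
        # e.g. "N/A" when the container carries no duration
        return None
```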
Next, we will get the audio duration for the same file using sox.
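The SoX step could look something like the sketch below, assuming the sox binary is on your PATH. It uses the info mode (`sox --i -D`), which prints the duration in seconds. The function name get_duration_sox is a placeholder chosen for this example.

```python
import subprocess
from typing import Optional


def get_duration_sox(path: str) -> Optional[float]:
    """Return the duration in seconds as reported by SoX's info mode."""
    cmd = ["sox", "--i", "-D", path]  # -D prints the duration in seconds
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        # sox is not installed, or it does not support / cannot read the file
        return None
    try:
        return float(result.stdout.strip())
    except ValueError:
        return None
```

Note that SoX may need an extra format handler installed to read some compressed formats, in which case this sketch returns None.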
Finally, we will get the audio duration for the same file using MediaInfo.
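A possible MediaInfo version, assuming the mediainfo command-line tool is on your PATH. MediaInfo reports the General duration in milliseconds, so the sketch divides by 1000 to get seconds. The name get_duration_mediainfo is illustrative.

```python
import subprocess
from typing import Optional


def get_duration_mediainfo(path: str) -> Optional[float]:
    """Return the duration in seconds as reported by the MediaInfo CLI."""
    # The Inform template asks for the General stream duration (milliseconds).
    cmd = ["mediainfo", "--Inform=General;%Duration%", path]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        # mediainfo is not installed
        return None
    try:
        return float(result.stdout.strip()) / 1000.0  # ms -> seconds
    except ValueError:
        # empty or non-numeric output: no duration could be read
        return None
```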
The following function will return the durations from the three tools and convert them to the same format.
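As a rough illustration of the unit conversion, the sketch below takes the raw text printed by each tool and converts it to seconds. It assumes ffprobe and sox print seconds and MediaInfo prints milliseconds; the function name normalize_durations and the dictionary keys are choices made for this example, not names from the original cookbook.

```python
from typing import Dict, Optional

# Divisor that converts each tool's raw unit to seconds (assumed units).
UNIT_DIVISOR = {"ffprobe": 1.0, "sox": 1.0, "mediainfo": 1000.0}


def normalize_durations(raw: Dict[str, Optional[str]]) -> Dict[str, Optional[float]]:
    """Map tool name -> duration in seconds, or None if the value is unusable."""
    durations: Dict[str, Optional[float]] = {}
    for tool, text in raw.items():
        try:
            seconds = float(text.strip()) / UNIT_DIVISOR.get(tool, 1.0)
            durations[tool] = round(seconds, 3)
        except (AttributeError, ValueError):
            # None (tool failed) or non-numeric output such as "N/A"
            durations[tool] = None
    return durations
```

Keeping a None entry for a failed tool, rather than dropping it, lets a later check treat "could not read the duration" as an inconsistency.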
Define the transcribe function. This will run only when the duration is consistent among the three tools.
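One way to write this step is sketched below. To keep the example self-contained, the transcriber is passed in as an argument; in practice it would be an `aai.Transcriber()` created after `import assemblyai as aai` and setting `aai.settings.api_key`. The sketch relies only on the transcript object exposing `error` and `text`, and the choice to raise on failure is an assumption of this example.

```python
def transcribe(path, transcriber):
    """Transcribe the file and return its text, raising if the job failed.

    `transcriber` is expected to behave like assemblyai.Transcriber:
    its transcribe() method returns an object with `error` and `text`.
    """
    transcript = transcriber.transcribe(path)
    if transcript.error:
        raise RuntimeError(f"Transcription failed: {transcript.error}")
    return transcript.text
```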
Define the transcode function. We will run this if one or more durations differ. The output file will be a 16kHz WAV file as that is the format AssemblyAI models are trained on. When running the ffmpeg command, the transcode may fail or return warnings if there are issues with the input file’s format, corrupted metadata, or unsupported codecs. These warnings tend to be verbose but you can print them for troubleshooting.
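A minimal sketch of the transcode step, assuming ffmpeg is on your PATH. It resamples to 16 kHz with `-ar 16000` and lets ffmpeg pick the WAV container from the output extension. The helper names and the verbose flag are assumptions made for this example.

```python
import subprocess
from typing import List


def build_transcode_command(src: str, dst: str) -> List[str]:
    """Build the ffmpeg command that writes a 16 kHz copy of src to dst."""
    # -y overwrites dst if it already exists; -ar sets the output sample rate.
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", dst]


def transcode(src: str, dst: str, verbose: bool = False) -> bool:
    """Run ffmpeg and return True if it exited successfully."""
    try:
        result = subprocess.run(
            build_transcode_command(src, dst), capture_output=True, text=True
        )
    except FileNotFoundError:
        print("ffmpeg was not found on PATH")
        return False
    if verbose and result.stderr:
        # ffmpeg writes its (often lengthy) log and warnings to stderr
        print(result.stderr)
    return result.returncode == 0
```

Calling `transcode("input.mp3", "input_transcoded.wav", verbose=True)` prints ffmpeg's log, which is where warnings about the input's format, metadata, or codec would appear.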
Define a function that will check if the durations are consistent. There may be small differences so it’s best to allow a small tolerance. In this example the tolerance value will be 0.01 seconds.
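The tolerance check can be sketched as a comparison of the largest and smallest reported durations. The function name durations_consistent is illustrative; it assumes the durations arrive as a dictionary of tool name to seconds, with None marking a tool that failed to report a value.

```python
from typing import Dict, Optional


def durations_consistent(
    durations: Dict[str, Optional[float]], tolerance: float = 0.01
) -> bool:
    """Return True if every tool reported a duration within `tolerance` seconds."""
    values = list(durations.values())
    if not values or any(v is None for v in values):
        # A missing duration is treated as an inconsistency.
        return False
    return max(values) - min(values) <= tolerance
```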
Finally, here is the order of operations for this program. This program will first check the duration of an audio file across different tools to ensure consistency. If any tool fails to retrieve a duration or if the durations differ, it transcodes the audio to a new 16kHz WAV file and checks the duration of the WAV file. If the durations are consistent in the transcoded file, the program proceeds to transcribe it. If inconsistencies remain after transcoding, it logs a warning to highlight the issue and will not transcribe the file.
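The order of operations described above can be sketched as a single function. To keep the example self-contained, the four steps are passed in as callables; their names and signatures (get_durations, is_consistent, transcode, transcribe) and the "_transcoded.wav" suffix are assumptions made for illustration.

```python
import logging
import os
from typing import Callable, Dict, Optional


def process_file(
    path: str,
    get_durations: Callable[[str], Dict[str, Optional[float]]],
    is_consistent: Callable[[Dict[str, Optional[float]]], bool],
    transcode: Callable[[str, str], bool],
    transcribe: Callable[[str], str],
) -> Optional[str]:
    """Transcribe `path`, transcoding first if the tools disagree on duration."""
    # 1. Durations already agree: transcribe the original file.
    if is_consistent(get_durations(path)):
        return transcribe(path)

    # 2. Otherwise transcode to a new 16 kHz WAV file.
    wav_path = os.path.splitext(path)[0] + "_transcoded.wav"
    if not transcode(path, wav_path):
        logging.warning("Transcoding failed for %s", path)
        return None

    # 3. Re-check the transcoded file before transcribing it.
    if is_consistent(get_durations(wav_path)):
        return transcribe(wav_path)

    logging.warning("Durations still differ after transcoding %s", path)
    return None
```

Returning None in the failure cases means the caller can tell that no transcription was attempted.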
If you continue to experience unexpected behaviour with your file, please contact our support team at support@assemblyai.com for assistance in diagnosing the issue.