Transcribe Multiple Files Simultaneously Using the Python SDK

In this guide, we’ll show you how to use the AssemblyAI API to transcribe multiple audio files at once. This guide focuses on using the AssemblyAI Python SDK to achieve this.

You can also look at an alternative method that achieves this with webhooks and a server API here.

Quickstart

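```python
import assemblyai as aai
import threading
import os

aai.settings.api_key = "YOUR_API_KEY"
batch_folder = "audio"
transcription_result_folder = "transcripts"

# Make sure the output folder exists before any thread tries to write to it.
os.makedirs(transcription_result_folder, exist_ok=True)

def transcribe_audio(audio_file):
    # Each thread creates its own Transcriber; transcribe() blocks until the transcript finishes.
    transcriber = aai.Transcriber()
    transcript = transcriber.transcribe(os.path.join(batch_folder, audio_file))
    if transcript.status == aai.TranscriptStatus.completed:
        output_path = os.path.join(transcription_result_folder, f"{audio_file}.txt")
        with open(output_path, "w", encoding="utf-8") as f:
            f.write(transcript.text)
    elif transcript.status == aai.TranscriptStatus.error:
        print(f"Error transcribing {audio_file}: {transcript.error}")

# Start one thread per regular file in the batch folder (subdirectories are skipped).
threads = []
for filename in os.listdir(batch_folder):
    if not os.path.isfile(os.path.join(batch_folder, filename)):
        continue
    thread = threading.Thread(target=transcribe_audio, args=(filename,))
    threads.append(thread)
    thread.start()

# Wait for every thread to finish.
for thread in threads:
    thread.join()

print("All transcriptions are complete.")
```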

Get Started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard.

Step-by-Step Guide

Install the SDK.

```bash
pip install -U assemblyai
```

Import the assemblyai package and set your API key. Also import the standard-library threading and os modules, which are used for running transcriptions concurrently and for working with file paths, respectively.

```python
import assemblyai as aai
import threading
import os

aai.settings.api_key = "YOUR_API_KEY"
```
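Hardcoding the key is fine for a quick test. If you would rather keep it out of your source code, a minimal sketch like the following reads it from an environment variable instead. The variable name ASSEMBLYAI_API_KEY and the helper load_api_key are hypothetical choices for this example, not part of the SDK.

```python
import os

def load_api_key(var_name: str = "ASSEMBLYAI_API_KEY") -> str:
    """Read the API key from an environment variable (the variable name is a hypothetical choice)."""
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a clear message instead of sending unauthenticated requests.
        raise RuntimeError(f"Set the {var_name} environment variable to your AssemblyAI API key.")
    return key
```

You would then write aai.settings.api_key = load_api_key() in place of the hardcoded string.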

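Set the folders. batch_folder contains the audio files you want to transcribe, and transcription_result_folder stores the resulting .txt transcript files. Create the results folder up front, because open() cannot write a file into a folder that does not exist.

```python
batch_folder = "audio"
transcription_result_folder = "transcripts"

# Create the output folder if it does not already exist.
os.makedirs(transcription_result_folder, exist_ok=True)
```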
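os.listdir returns every entry in batch_folder, including subdirectories and hidden files such as .DS_Store, which would be sent for transcription and fail. If your folder may contain such entries, a minimal filtering sketch looks like this. The helper name list_audio_files and the extension set AUDIO_EXTENSIONS are hypothetical; adjust the extensions to the formats you actually have.

```python
import os

# Hypothetical set of extensions treated as audio; adjust to your files.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}

def list_audio_files(folder: str) -> list[str]:
    """Return sorted names of regular, non-hidden files in `folder` with an audio extension."""
    names = []
    for name in os.listdir(folder):
        if name.startswith("."):
            continue  # skip hidden files such as .DS_Store
        if not os.path.isfile(os.path.join(folder, name)):
            continue  # skip subdirectories
        if os.path.splitext(name)[1].lower() in AUDIO_EXTENSIONS:
            names.append(name)
    return sorted(names)
```

You could then loop over list_audio_files(batch_folder) instead of os.listdir(batch_folder) when starting threads.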

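You do not need a module-level Transcriber object: transcribe_audio, defined in the next step, creates its own Transcriber each time it runs, so every thread works with a separate instance.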

Define a function that transcribes a single audio file. When the transcript completes, its text is written to <audio_file>.txt in transcription_result_folder (for example, call.mp3 produces call.mp3.txt). If the transcription fails, the error is printed and no file is written to the results folder.

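```python
def transcribe_audio(audio_file):
    # Each call creates its own Transcriber; transcribe() blocks until the transcript finishes.
    transcriber = aai.Transcriber()
    transcript = transcriber.transcribe(os.path.join(batch_folder, audio_file))
    if transcript.status == aai.TranscriptStatus.completed:
        output_path = os.path.join(transcription_result_folder, f"{audio_file}.txt")
        with open(output_path, "w", encoding="utf-8") as f:
            f.write(transcript.text)
    elif transcript.status == aai.TranscriptStatus.error:
        # Failed transcripts are reported but not written to the results folder.
        print(f"Error transcribing {audio_file}: {transcript.error}")
```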
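Because transcribe_audio calls the API, its save-or-skip branch is hard to check on its own. One option, sketched here under the assumption that you factor that branch into a separate helper, is to exercise it against a stand-in object. FakeTranscript and save_if_completed are hypothetical names: FakeTranscript only mimics the status, text, and error attributes that transcribe_audio reads, and the helper compares against the same "completed" and "error" strings the original code used.

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass
class FakeTranscript:
    """Hypothetical stand-in for the SDK's transcript, with only the fields read below."""
    status: str
    text: Optional[str] = None
    error: Optional[str] = None

def save_if_completed(transcript, audio_file: str, out_folder: str) -> Optional[str]:
    """Write transcript.text to <out_folder>/<audio_file>.txt if completed; return the path or None."""
    if transcript.status == "completed":
        out_path = os.path.join(out_folder, f"{audio_file}.txt")
        with open(out_path, "w", encoding="utf-8") as f:
            f.write(transcript.text or "")
        return out_path
    if transcript.status == "error":
        # Report the failure; nothing is written for this file.
        print(f"Error transcribing {audio_file}: {transcript.error}")
    return None
```

Inside transcribe_audio you would call save_if_completed(transcript, audio_file, transcription_result_folder) once transcriber.transcribe(...) returns.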

Start one thread per file so that the files are transcribed concurrently. Each call to join() blocks until that thread finishes, so the “All transcriptions are complete.” message prints in your terminal only after every file has either been transcribed or failed.

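```python
threads = []
for filename in os.listdir(batch_folder):
    # Skip subdirectories; only regular files are sent for transcription.
    if not os.path.isfile(os.path.join(batch_folder, filename)):
        continue
    thread = threading.Thread(target=transcribe_audio, args=(filename,))
    threads.append(thread)
    thread.start()

# Block until every thread has finished.
for thread in threads:
    thread.join()

print("All transcriptions are complete.")
```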
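Starting one thread per file means a folder of 500 files opens 500 threads at once. If you want to cap concurrency, a minimal alternative sketch uses concurrent.futures.ThreadPoolExecutor from the standard library. fake_transcribe is a hypothetical stand-in for transcribe_audio so the example runs without an API key, and max_workers=4 is an arbitrary illustrative value.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_transcribe(filename: str) -> str:
    """Hypothetical stand-in for transcribe_audio; fails for names containing 'bad'."""
    time.sleep(0.01)  # simulate waiting on the API
    if "bad" in filename:
        raise RuntimeError(f"could not transcribe {filename}")
    return f"transcript of {filename}"

def transcribe_all(filenames, worker=fake_transcribe, max_workers=4):
    """Run `worker` on every file using at most `max_workers` threads; return (results, errors)."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the file it is processing.
        futures = {executor.submit(worker, name): name for name in filenames}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()  # re-raises any exception from the worker
            except Exception as exc:
                errors[name] = str(exc)
    return results, errors
```

To use it with the guide's code, pass worker=transcribe_audio (which returns None, so the result values will be None). A practical advantage over bare threading.Thread: an exception raised inside a Thread target is only reported by threading.excepthook and is invisible to join(), whereas future.result() re-raises it, so failures can be collected per file.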

Conclusion

This guide demonstrated how to use the AssemblyAI Python SDK to transcribe multiple audio files concurrently. The output is one transcript text file for each audio file in the specified folder.

You can build further integrations and features on top of transcribe_audio, including but not limited to exporting transcripts in other formats and enabling Core Transcription or Audio Intelligence features.

If you have any questions, please reach out to our Support team at support@assemblyai.com or in our Community Discord!