Auto-generate subtitles with Python and AssemblyAI
Stop manually creating subtitles for your videos, and learn how to auto-generate them with Python and AssemblyAI in this tutorial.



Providing subtitles is an excellent way for creators to make their videos accessible to a broader audience, such as non-native speakers and those who are hard of hearing. Unfortunately, creating subtitles yourself is both tedious and time-consuming. In this tutorial, you'll learn how to use AssemblyAI to transcribe videos and automatically generate accurate and high-quality subtitles.
Before you start
To finish this tutorial, you'll need:
- Python 3.8 or later installed.
- A free AssemblyAI account. Sign up if you haven't already.
Choosing a subtitle format
Two of the most commonly used file formats for subtitles are SRT and VTT. Both are plaintext formats that look almost identical when you compare them side-by-side in a text editor. Try to spot the differences in the examples below:
SRT:
1
00:00:00,280 --> 00:00:04,449
Assembly AI is building AI systems to help you build AI applications with

2
00:00:04,497 --> 00:00:08,105
spoken data. We create superhuman AI models for speech
VTT:
WEBVTT

00:00.280 --> 00:04.449
Assembly AI is building AI systems to help you build AI applications with

00:04.497 --> 00:08.105
spoken data. We create superhuman AI models for speech
You may notice a few differences, such as the WEBVTT header in the VTT example, the numbered captions in the SRT example, and how timestamps are formatted: SRT puts a comma before the milliseconds, while VTT uses a period. In fact, VTT is based on SRT, but also lets you customize the style of the subtitles, such as the font style, color, and positioning.
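To make the timestamp difference concrete, here is a minimal sketch that formats a millisecond offset in either style. The function name format_timestamp is made up for this illustration and is not part of the AssemblyAI SDK. It assumes, as in the VTT example above, that the hours field is omitted in VTT when it is zero.

```python
def format_timestamp(ms: int, fmt: str = "srt") -> str:
    """Format a millisecond offset as an SRT or VTT timestamp (illustrative helper)."""
    if ms < 0:
        raise ValueError("ms must be non-negative")
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    if fmt == "srt":
        # SRT: always hh:mm:ss, with a comma before the milliseconds.
        return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"
    if fmt == "vtt":
        # VTT: a period before the milliseconds; hours are left out when zero.
        base = f"{minutes:02d}:{seconds:02d}.{millis:03d}"
        return f"{hours:02d}:{base}" if hours else base
    raise ValueError(f"unknown format: {fmt!r}")


print(format_timestamp(280, "srt"))
print(format_timestamp(280, "vtt"))
```

The two print calls produce the start times of the first caption in the SRT and VTT examples above.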
So which format is the right one for you?
- If you'll mainly use the subtitles for traditional media players, like VLC, you may want to choose SRT.
- If you want to use them in the web browser, you may instead want to go with VTT.
While you could create and edit either file format manually, AssemblyAI lets you transcribe the video and automatically generate the timestamps in both SRT and VTT formats.
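Because the two formats are so close, a basic SRT file can be turned into VTT with a few lines of standard-library Python. The sketch below is only meant to illustrate the relationship between the formats. The helper name srt_to_vtt is hypothetical, it assumes well-formed SRT input, and it ignores VTT styling features. AssemblyAI can export both formats directly, so you normally won't need it.

```python
import re

# Matches the start of an SRT timing line such as "00:00:00,280 --> 00:00:04,449".
_TIMING = re.compile(r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT text to VTT (illustrative helper, not an SDK function)."""
    lines = ["WEBVTT", ""]  # VTT files start with a header followed by a blank line
    for line in srt_text.strip().splitlines():
        # On timing lines, swap the comma before the milliseconds for a period.
        # Other lines, including the caption numbers, pass through unchanged;
        # VTT accepts them as optional cue identifiers.
        lines.append(_TIMING.sub(r"\1.\2 --> \3.\4", line))
    return "\n".join(lines) + "\n"


sample = "1\n00:00:00,280 --> 00:00:04,449\nHello, world\n"
print(srt_to_vtt(sample))
```

Only timing lines are rewritten, so commas inside the caption text are left alone.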
Set up the environment
Let's first set up the files and dependencies needed for the app.
First, open a terminal if you haven't already, and create a new directory for your project:
mkdir generate-subtitles
cd generate-subtitles
Next, create and activate a Python virtual environment:
# macOS and Linux:
python3 -m venv env
source env/bin/activate

# Windows:
python -m venv env
.\env\Scripts\activate
With your virtual environment active, install the required packages:
pip install assemblyai
Finally, configure the AssemblyAI API key. You can find your personal key on the AssemblyAI dashboard, under Copy your API key.
# macOS/Linux:
export ASSEMBLYAI_API_KEY=<YOUR_KEY>

# Windows:
set ASSEMBLYAI_API_KEY=<YOUR_KEY>
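If the variable isn't set in the terminal you run the script from, the failure can be hard to interpret. As an optional safeguard, a small check like the following could fail early with a clear message. The helper name load_api_key is an assumption made for this sketch and is not part of the AssemblyAI SDK.

```python
import os


def load_api_key(env=None) -> str:
    """Return the API key from the environment, or raise a clear error (illustrative helper)."""
    env = os.environ if env is None else env
    key = env.get("ASSEMBLYAI_API_KEY", "").strip()
    # Treat a missing value, or the unedited placeholder, as "not configured".
    if not key or key == "<YOUR_KEY>":
        raise RuntimeError(
            "ASSEMBLYAI_API_KEY is not set. Export it in your terminal before running the script."
        )
    return key
```

You could call load_api_key() at the top of your script, before any transcription starts. Passing a plain dictionary as env makes the check easy to try without touching your real environment.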
Transcribe the video file
Before we can generate subtitles, we'll need to transcribe the video into text. The transcript returned from AssemblyAI contains both the text and timestamps for when each word appears in the video.
In the project directory, create a file called generate_subtitles.py with the following code:
import assemblyai as aai

VIDEO_URL = "https://storage.googleapis.com/aai-web-samples/aai-overview.mp4"

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(VIDEO_URL)
VIDEO_URL is a URL to a publicly available video file. Feel free to use your own video, or the one in the example. AssemblyAI supports the most common audio and video types. For the complete list of all the supported file types, see the AssemblyAI FAQ.
You can also pass a local file path. For example, if you already have the video file on your computer, you can instead write transcriber.transcribe("./video.mp4").
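Since the same call accepts either a URL or a local path, a typo in a local path only surfaces once the request is made. The sketch below shows one way to validate the input first. The helper resolve_media_source is hypothetical and uses only the standard library. It doesn't call AssemblyAI, and it assumes that anything that isn't an http(s) URL is meant to be a local file.

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_media_source(source: str) -> str:
    """Return `source` if it is an http(s) URL or an existing local file (illustrative helper)."""
    if urlparse(source).scheme in ("http", "https"):
        # Remote file: pass the URL through unchanged.
        return source
    path = Path(source).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No such media file: {path}")
    return str(path)
```

With this in place, you could write transcriber.transcribe(resolve_media_source("./video.mp4")) and get an immediate, readable error when the file is missing.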
Generate subtitles
With the video transcribed, we'll use the export_subtitles_srt() and export_subtitles_vtt() methods on the transcript to return the corresponding format.
In generate_subtitles.py, add the following code:
srt = transcript.export_subtitles_srt()

# Save it to a file
with open("aai-overview.srt", "w") as f:
    f.write(srt)
Or, to generate subtitles in VTT format instead:
vtt = transcript.export_subtitles_vtt()

# Save it to a file
with open("aai-overview.vtt", "w") as f:
    f.write(vtt)
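The examples above hard-code the output filename. If you process several videos, you may prefer to derive it from the video's name. The following is a minimal sketch under that assumption. The helpers subtitle_path and save_subtitles are made up for this illustration. The file is written as UTF-8 explicitly so that captions with non-ASCII characters are saved the same way on every platform.

```python
from pathlib import Path
from urllib.parse import urlparse


def subtitle_path(video_source: str, fmt: str = "srt") -> Path:
    """Derive a subtitle filename such as aai-overview.srt from a video URL or relative path."""
    if fmt not in ("srt", "vtt"):
        raise ValueError(f"unsupported subtitle format: {fmt!r}")
    # urlparse(...).path drops any query string from a URL; for a relative
    # local path such as ./video.mp4 it returns the path unchanged.
    stem = Path(urlparse(video_source).path).stem
    return Path(f"{stem}.{fmt}")


def save_subtitles(text: str, path: Path) -> None:
    """Write subtitle text to `path` using UTF-8."""
    path.write_text(text, encoding="utf-8")


print(subtitle_path("https://storage.googleapis.com/aai-web-samples/aai-overview.mp4"))
```

In the tutorial script, this could be used as save_subtitles(srt, subtitle_path(VIDEO_URL)).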
Finally, run the script in your terminal:
python3 generate_subtitles.py
To further tweak the output, you can also customize the maximum characters per caption:
srt = transcript.export_subtitles_srt(chars_per_caption=32)
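To build an intuition for what a per-caption character limit does, here is a simplified sketch that greedily packs timed words into captions. It is not AssemblyAI's actual algorithm. The Word class and the group_words function are assumptions made for this illustration, loosely modeled on the word-level timestamps mentioned earlier, with times in milliseconds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: int  # start time in milliseconds
    end: int    # end time in milliseconds


def group_words(words: List[Word], chars_per_caption: int = 32) -> List[List[Word]]:
    """Greedily pack consecutive words into captions of at most `chars_per_caption`
    characters. A single word longer than the limit still gets its own caption."""
    captions: List[List[Word]] = []
    current: List[Word] = []
    length = 0
    for word in words:
        # +1 accounts for the space joining this word to the caption so far.
        extra = len(word.text) + (1 if current else 0)
        if current and length + extra > chars_per_caption:
            # The word doesn't fit: close the current caption and start a new one.
            captions.append(current)
            current, length = [], 0
            extra = len(word.text)
        current.append(word)
        length += extra
    if current:
        captions.append(current)
    return captions
```

Each caption's time range would then run from the start of its first word to the end of its last word. A smaller limit yields more, shorter captions, which tends to suit small screens.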
Complete source code
Here's the complete source code for this tutorial:
import assemblyai as aai

VIDEO_URL = "https://storage.googleapis.com/aai-web-samples/aai-overview.mp4"

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(VIDEO_URL)

srt = transcript.export_subtitles_srt()

# Save it to a file
with open("aai-overview.srt", "w") as f:
    f.write(srt)
Learn more
In this tutorial, you learned how to generate high-quality subtitles for your videos. You also learned about the differences between SRT and VTT, two common file formats for subtitles, and when to pick one over the other. For more information on how to customize the transcript, see the docs for Speech Recognition.