November 5, 2024

Auto-generate subtitles with Python and AssemblyAI

Stop manually creating subtitles for your videos, and learn how to auto-generate them with Python and AssemblyAI in this tutorial.

Marcus Olsson

Senior Developer Educator

Providing subtitles is an excellent way for creators to make their videos accessible to a broader audience, such as non-native speakers and those who are hard of hearing. Unfortunately, creating subtitles yourself is both tedious and time-consuming. In this tutorial, you'll learn how to use AssemblyAI to transcribe videos and automatically generate accurate and high-quality subtitles.

Before you start

To finish this tutorial, you'll need:

Python 3.8 or later installed.
A free AssemblyAI account. Sign up if you haven't already.

Choosing a subtitle format

Two of the most commonly used file formats for subtitles are SRT and VTT. Both are plaintext formats that look almost identical when you compare them side-by-side in a text editor. Try to spot the differences in the examples below:

SRT:

1
00:00:00,280 --> 00:00:04,449
Assembly AI is building AI systems to help you build AI applications with

2
00:00:04,497 --> 00:00:08,105
spoken data. We create superhuman AI models for speech

VTT:

WEBVTT

00:00.280 --> 00:04.449
Assembly AI is building AI systems to help you build AI applications with

00:04.497 --> 00:08.105
spoken data. We create superhuman AI models for speech

You may notice a few differences, such as the WEBVTT header in the VTT example, the missing cue numbers, and the way timestamps are formatted (VTT uses a period before the milliseconds where SRT uses a comma). In fact, VTT is based on SRT, but also lets you customize the style of the subtitles, such as the font style, color, and positioning.
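Because the two formats are so close, a basic SRT file can be turned into a valid VTT file with very little work. The following is a minimal sketch of that idea, not something AssemblyAI requires you to do. The `srt_to_vtt` helper is a hypothetical name introduced here, and it assumes simple SRT input without styling tags. It keeps the cue numbers, which WebVTT accepts as optional cue identifiers.

```python
import re

# Matches an SRT timing line, e.g. "00:00:00,280 --> 00:00:04,449".
SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})$"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert basic SRT text to WebVTT (hypothetical helper, not part of any SDK)."""
    converted = []
    for line in srt_text.strip().splitlines():
        # Only timing lines change: the millisecond separator becomes a period.
        converted.append(SRT_TIMING.sub(r"\1.\2 --> \3.\4", line.strip()))
    # A WebVTT file starts with the WEBVTT header followed by a blank line.
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"
```

Caption text that happens to contain commas is left alone, because the pattern only matches whole timing lines.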

So which format is the right one for you?

If you'll mainly use the subtitles for traditional media players, like VLC, you may want to choose SRT.
If you want to use them in the web browser, you may instead want to go with VTT.

While you could create and edit either file format manually, AssemblyAI lets you transcribe the video and automatically generate the timestamps in both SRT and VTT formats.

Set up the environment

Let's first set up the files and dependencies needed for the app.

First, open a terminal if you haven't already, and create a new directory for your project:

mkdir generate-subtitles
cd generate-subtitles

Next, create and activate a Python virtual environment:

# macOS and Linux:
python3 -m venv env
source env/bin/activate

# Windows:
python -m venv env
.\env\Scripts\activate

With your virtual environment active, install the required packages:

pip install assemblyai

Finally, configure the AssemblyAI API key as an environment variable. You can find your personal key on the AssemblyAI dashboard, under Copy your API key.

# macOS/Linux:
export ASSEMBLYAI_API_KEY=<YOUR_KEY>

# Windows:
set ASSEMBLYAI_API_KEY=<YOUR_KEY>
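A forgotten or misspelled environment variable is an easy mistake to make, so it can help to check for the key before doing any work. The sketch below is one possible way to do that with the standard library only. `get_api_key` is a hypothetical helper name, not part of the AssemblyAI SDK.

```python
import os


def get_api_key(env=os.environ) -> str:
    """Return the AssemblyAI API key from the environment, or fail early.

    Hypothetical helper for illustration; `env` can be any mapping.
    """
    key = env.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ASSEMBLYAI_API_KEY is not set. Export it in this terminal "
            "session before running the script."
        )
    return key
```

Calling `get_api_key()` at the top of your script turns a confusing authentication failure into a clear error message. If you prefer to set the key in code, the SDK also accepts it through `aai.settings.api_key`.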

Transcribe the video file

Before we can generate subtitles, we'll need to transcribe the video into text. The transcript returned from AssemblyAI contains both the text and the timestamps for when each word appears in the video.
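To make the link between word timestamps and subtitle files concrete, here is a small sketch that formats a time offset the way SRT expects. It assumes the offsets are integers in milliseconds, and `format_srt_timestamp` is a hypothetical helper name. You won't need it for this tutorial, since the export methods used later produce the timestamps for you.

```python
def format_srt_timestamp(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, remainder = divmod(ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    seconds, millis = divmod(remainder, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


print(format_srt_timestamp(280))   # 00:00:00,280
print(format_srt_timestamp(4449))  # 00:00:04,449
```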

In the project directory, create a file called generate_subtitles.py with the following code:

import assemblyai as aai

VIDEO_URL = "https://storage.googleapis.com/aai-web-samples/aai-overview.mp4"

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(VIDEO_URL)

VIDEO_URL is a URL to a publicly available video file. Feel free to use your own video, or the one in the example. AssemblyAI supports the most common audio and video types. For the complete list of all the supported file types, see the AssemblyAI FAQ.

You can also pass a local file path. For example, if you already have the video file on your computer, you can instead write transcriber.transcribe("./video.mp4").
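Transcription can fail, for example when the URL is unreachable or the file type isn't supported, so you may want to check the result before exporting subtitles. The sketch below assumes the transcript object exposes a `status` attribute that equals "error" on failure and an `error` attribute with the message. Check this against the SDK version you have installed. `ensure_completed` is a hypothetical helper, and the `SimpleNamespace` object only stands in for a real transcript.

```python
from types import SimpleNamespace


def ensure_completed(transcript):
    """Raise if the transcript reports an error; otherwise return it unchanged.

    Assumes the transcript object exposes `status` and `error` attributes.
    """
    status = getattr(transcript, "status", None)
    # Accept either a plain string or an enum whose value is a string.
    status_value = getattr(status, "value", status)
    if status_value == "error":
        message = getattr(transcript, "error", None)
        raise RuntimeError(f"Transcription failed: {message}")
    return transcript


# Stand-in for illustration; a real transcript comes from transcriber.transcribe().
fake = SimpleNamespace(status="completed", error=None)
assert ensure_completed(fake) is fake
```

In the tutorial script, you could wrap the call as `transcript = ensure_completed(transcriber.transcribe(VIDEO_URL))`.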

Generate subtitles

With the video transcribed, we'll use the export_subtitles_srt() and export_subtitles_vtt() methods on the transcript to return the corresponding format.

In generate_subtitles.py, add the following code:

srt = transcript.export_subtitles_srt()

# Save it to a file
with open("aai-overview.srt", "w") as f:
    f.write(srt)

Or, to generate subtitles in VTT format instead:

vtt = transcript.export_subtitles_vtt()

# Save it to a file
with open("aai-overview.vtt", "w") as f:
    f.write(vtt)
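If you process more than one video, hardcoding the output filename gets tedious. As a rough sketch, you could derive it from the video URL or path instead. Both `subtitle_path` and `save_subtitles` are hypothetical helpers introduced here, and they assume a URL or POSIX-style path that ends in a filename. The file is written as UTF-8 so that non-ASCII characters in a transcript are saved the same way on every platform.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def subtitle_path(video_source: str, extension: str) -> Path:
    """Derive a subtitle filename from a video URL or POSIX-style path.

    A source ending in "demo.mp4" with extension "srt" becomes "demo.srt".
    """
    name = PurePosixPath(urlparse(video_source).path).name
    return Path(name).with_suffix("." + extension.lstrip("."))


def save_subtitles(text: str, video_source: str, extension: str) -> Path:
    """Write subtitle text as UTF-8 in the current directory and return the path."""
    path = subtitle_path(video_source, extension)
    path.write_text(text, encoding="utf-8")
    return path
```

With these in place, `save_subtitles(srt, VIDEO_URL, "srt")` would write aai-overview.srt for the sample video used in this tutorial.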

Finally, run the script in your terminal:

python3 generate_subtitles.py

To further tweak the output, you can also customize the maximum characters per caption:

srt = transcript.export_subtitles_srt(chars_per_caption=32)
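To get a feel for what a 32-character limit does to caption length, you can experiment with plain text first. The sketch below uses Python's `textwrap` module purely as an illustration. It is not how AssemblyAI splits captions, and the real output also depends on word timings.

```python
import textwrap

text = (
    "AssemblyAI is building AI systems to help you build AI applications "
    "with spoken data."
)

# Split the text into chunks of at most 32 characters, breaking on spaces.
captions = textwrap.wrap(text, width=32)
for caption in captions:
    print(f"{len(caption):2d} | {caption}")
```

Shorter captions are easier to read on small screens, but they also change more often, so it's worth trying a few values against your own video.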

Complete source code

Here's the complete source code for this tutorial:

import assemblyai as aai

VIDEO_URL = "https://storage.googleapis.com/aai-web-samples/aai-overview.mp4"

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(VIDEO_URL)

srt = transcript.export_subtitles_srt()

# Save it to a file
with open("aai-overview.srt", "w") as f:
    f.write(srt)

Learn more

In this tutorial, you learned how to generate high-quality subtitles for your videos. You also learned about the differences between SRT and VTT, two common file formats for subtitles, and when to pick one over the other. For more information on how to customize the transcript, see the docs for Speech Recognition.