Auto-generate subtitles with Python and AssemblyAI

Stop manually creating subtitles for your videos, and learn how to auto-generate them with Python and AssemblyAI in this tutorial.

Providing subtitles is an excellent way for creators to make their videos accessible to a broader audience, such as non-native speakers and those who are hard of hearing. Unfortunately, creating subtitles yourself is both tedious and time-consuming. In this tutorial, you'll learn how to use AssemblyAI to transcribe videos and automatically generate accurate and high-quality subtitles.

Before you start

To finish this tutorial, you'll need:

  • Python 3.8 or later installed.
  • A free AssemblyAI account. Sign up if you haven't already.

Choosing a subtitle format

Two of the most commonly used file formats for subtitles are SRT and VTT. Both are plaintext formats that look almost identical when you compare them side-by-side in a text editor. Try to spot the differences in the examples below:

SRT:

1
00:00:00,280 --> 00:00:04,449
Assembly AI is building AI systems to help you build AI applications with

2
00:00:04,497 --> 00:00:08,105
spoken data. We create superhuman AI models for speech

VTT:

WEBVTT

00:00.280 --> 00:04.449
Assembly AI is building AI systems to help you build AI applications with

00:04.497 --> 00:08.105
spoken data. We create superhuman AI models for speech

You may notice a few differences, such as the WEBVTT header in the VTT example and the way timestamps are formatted: SRT puts a comma before the milliseconds, while VTT uses a period and lets you leave out the hours. In fact, VTT is based on SRT, but also lets you customize the style of the subtitles, such as the font style, color, and positioning.
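
To make the differences concrete, here is a minimal sketch of converting SRT text to basic VTT using only the Python standard library. The srt_to_vtt helper is hypothetical and not part of the AssemblyAI SDK. It assumes well-formed SRT input, keeps the numeric cue identifiers (which VTT also allows), and does not add any VTT styling.

```python
import re

# Matches an SRT timestamp such as 00:00:04,449 (hours:minutes:seconds,milliseconds).
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to basic WebVTT (hypothetical helper for illustration)."""
    converted = []
    for line in srt_text.strip().splitlines():
        # Only touch timing lines, so commas in the caption text are left alone.
        if "-->" in line:
            # VTT uses a period, not a comma, before the milliseconds.
            line = SRT_TIME.sub(r"\1.\2", line)
        converted.append(line)
    # Every VTT file starts with the WEBVTT header followed by a blank line.
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"


sample_srt = """1
00:00:00,280 --> 00:00:04,449
Assembly AI is building AI systems to help you build AI applications with

2
00:00:04,497 --> 00:00:08,105
spoken data. We create superhuman AI models for speech
"""

print(srt_to_vtt(sample_srt))
```

In practice you won't need this conversion, because AssemblyAI can export both formats directly. The sketch only shows how small the structural gap between the two formats is.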

So which format is the right one for you?

  • If you'll mainly use the subtitles for traditional media players, like VLC, you may want to choose SRT.
  • If you want to use them in the web browser, you may instead want to go with VTT.

While you could create and edit either file format manually, AssemblyAI lets you transcribe the video and automatically generate timestamped subtitles in both SRT and VTT formats.

Set up the environment

Start by setting up the files and dependencies needed for the app.

First, open a terminal if you haven't already, and create a new directory for your project:

mkdir generate-subtitles
cd generate-subtitles

Next, create and activate a Python virtual environment:

# macOS and Linux:
python3 -m venv env
source env/bin/activate

# Windows:
python -m venv env
.\env\Scripts\activate

With your virtual environment active, install the AssemblyAI Python SDK:

pip install assemblyai

Finally, configure the AssemblyAI API key. You can find your personal key on the AssemblyAI dashboard, under Copy your API key. Set it as an environment variable:

# macOS/Linux:
export ASSEMBLYAI_API_KEY=<YOUR_KEY>

# Windows (Command Prompt):
set ASSEMBLYAI_API_KEY=<YOUR_KEY>
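
If the variable isn't set, the script is likely to fail only once it contacts the API. As an optional safeguard, a small check like the sketch below can report the problem up front. The get_api_key helper is hypothetical and not part of the AssemblyAI SDK; it only reads the ASSEMBLYAI_API_KEY variable configured above.

```python
import os


def get_api_key(env=os.environ) -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; `env` is a parameter so the
    function can be exercised with a plain dict.
    """
    key = env.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ASSEMBLYAI_API_KEY is not set. "
            "Set it in your shell before running the script."
        )
    return key


# Example with an explicit mapping instead of the real environment:
print(get_api_key({"ASSEMBLYAI_API_KEY": "example-key"}))  # prints: example-key
```

Calling get_api_key() with no arguments at the top of your script checks the real environment.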

Transcribe the video file

Before we can generate subtitles, we'll need to transcribe the video into text. The transcript returned from AssemblyAI contains both the text and the timestamps for when each word is spoken in the video.
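
To see how word timestamps relate to the subtitle formats shown earlier, the following sketch turns a few timestamped words into one SRT cue. It assumes timestamps are expressed in milliseconds. The words list is made-up sample data, and format_srt_timestamp is a hypothetical helper, not part of the AssemblyAI SDK.

```python
def format_srt_timestamp(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


# Made-up sample data: (text, start_ms, end_ms) for each word.
words = [
    ("Assembly", 280, 640),
    ("AI", 640, 900),
    ("is", 900, 1010),
    ("building", 1010, 1400),
]

# A cue starts when its first word starts and ends when its last word ends.
start = format_srt_timestamp(words[0][1])
end = format_srt_timestamp(words[-1][2])
text = " ".join(word for word, _, _ in words)

cue = f"1\n{start} --> {end}\n{text}"
print(cue)
```

You don't need to do this yourself in the tutorial, since the export methods used later produce the formatted cues for you.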

In the project directory, create a file called generate_subtitles.py with the following code:

import assemblyai as aai

VIDEO_URL = "https://storage.googleapis.com/aai-web-samples/aai-overview.mp4"

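# The SDK reads the API key from the ASSEMBLYAI_API_KEY environment variable.
transcriber = aai.Transcriber()

# Submit the video and wait for the transcription to finish.
transcript = transcriber.transcribe(VIDEO_URL)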

VIDEO_URL is a URL to a publicly available video file. Feel free to use your own video, or the one in the example. AssemblyAI supports the most common audio and video formats. For the complete list of supported file types, see the AssemblyAI FAQ.

You can also pass a local file path. For example, if you already have the video file on your computer, you can instead write transcriber.transcribe("./video.mp4").
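
Because the same argument can be a URL or a local path, a typo in a file path may only surface after the script has started. The sketch below is one way to validate the input first. The classify_source helper is hypothetical and uses only the standard library. Treating http and https as the only remote schemes is an assumption made for this example.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Return "url" or "file" for a video source, or raise if it looks wrong.

    Hypothetical helper for illustration; not part of the AssemblyAI SDK.
    """
    # Assumption: remote sources always use http or https.
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    # Anything else is treated as a local path and must exist on disk.
    if Path(source).is_file():
        return "file"
    raise FileNotFoundError(f"No such video file: {source}")


print(classify_source(
    "https://storage.googleapis.com/aai-web-samples/aai-overview.mp4"
))  # prints: url
```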

Generate subtitles

With the video transcribed, we'll use the export_subtitles_srt() and export_subtitles_vtt() methods on the transcript to get the subtitles in the corresponding format. Both methods return the subtitles as a string.

In generate_subtitles.py, add the following code:

srt = transcript.export_subtitles_srt()

# Save it to a file
with open("aai-overview.srt", "w") as f:
    f.write(srt)

Or, to generate subtitles in VTT format instead:

vtt = transcript.export_subtitles_vtt()

# Save it to a file
with open("aai-overview.vtt", "w") as f:
    f.write(vtt)
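
If you export both formats, or transcribe speech with non-ASCII characters, a small helper can keep the file handling in one place. The sketch below is a hypothetical save_subtitles function that derives the output name from the video name and writes the text as UTF-8. It writes next to the given path, and the demo uses a temporary directory so nothing is left on disk.

```python
import tempfile
from pathlib import Path


def save_subtitles(text: str, video_path, fmt: str) -> Path:
    """Write subtitle text next to `video_path`, using `fmt` as the extension.

    Hypothetical helper for illustration; `fmt` must be "srt" or "vtt".
    """
    if fmt not in ("srt", "vtt"):
        raise ValueError(f"Unsupported subtitle format: {fmt!r}")
    # aai-overview.mp4 -> aai-overview.srt (or .vtt)
    out_path = Path(video_path).with_suffix(f".{fmt}")
    # An explicit encoding avoids platform-dependent defaults.
    out_path.write_text(text, encoding="utf-8")
    return out_path


# Demo in a temporary directory, with placeholder subtitle text.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_subtitles("WEBVTT\n", Path(tmp) / "aai-overview.mp4", "vtt")
    print(saved.name)  # prints: aai-overview.vtt
```

In the tutorial script, you could pass the string returned by either export method as the text argument.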

Finally, run the script from your terminal:

python3 generate_subtitles.py
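
Once the script finishes, you may want a quick sanity check of the generated file. The following sketch parses SRT text into cues using only the standard library. The parse_srt helper is hypothetical, and it assumes each cue has an index line, a timing line, and at least one line of text, as in the SRT example earlier.

```python
import re

# Timing line such as: 00:00:00,280 --> 00:00:04,449
TIMING = re.compile(r"^(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})$")


def parse_srt(text: str) -> list:
    """Parse SRT text into (start, end, caption) tuples, skipping malformed cues."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        match = TIMING.match(lines[1].strip())
        if match is None:
            continue
        cues.append((match.group(1), match.group(2), "\n".join(lines[2:])))
    return cues


sample = """1
00:00:00,280 --> 00:00:04,449
Assembly AI is building AI systems to help you build AI applications with

2
00:00:04,497 --> 00:00:08,105
spoken data. We create superhuman AI models for speech
"""

cues = parse_srt(sample)
print(len(cues))  # prints: 2
```

To check your own output, read aai-overview.srt into a string and pass it to parse_srt instead of the sample.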

To further tweak the output, you can also set the maximum number of characters per caption:

srt = transcript.export_subtitles_srt(chars_per_caption=32)
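
To get a feel for what a 32-character limit does to a sentence, the sketch below wraps text with Python's textwrap module. This is only an illustration of a character limit, not AssemblyAI's actual splitting logic, which may break captions differently. The split_into_captions helper is hypothetical.

```python
import textwrap


def split_into_captions(text: str, chars_per_caption: int = 32) -> list:
    """Split text at word boundaries so no piece exceeds the character limit.

    Hypothetical helper for illustration; words longer than the limit
    would be broken by textwrap's default settings.
    """
    return textwrap.wrap(text, width=chars_per_caption)


sentence = (
    "Assembly AI is building AI systems to help you "
    "build AI applications with spoken data."
)

for caption in split_into_captions(sentence):
    print(f"{len(caption):2d} | {caption}")
```

A smaller limit produces more, shorter captions, which are usually easier to read on narrow screens but change more often.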

Complete source code

Here's the complete source code for this tutorial:

import assemblyai as aai

VIDEO_URL = "https://storage.googleapis.com/aai-web-samples/aai-overview.mp4"

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(VIDEO_URL)

srt = transcript.export_subtitles_srt()

# Save it to a file
with open("aai-overview.srt", "w") as f:
    f.write(srt)

Learn more

In this tutorial, you learned how to generate high-quality subtitles for your videos. You also learned about the differences between SRT and VTT, two common file formats for subtitles, and when to pick one over the other. For more information on how to customize the transcript, see the docs for Speech Recognition.