WebVTT (.vtt), or Web Video Text Tracks Format, is a widely used and supported format for video subtitles. This is what the first lines of the WebVTT file for this YouTube video look like:
WEBVTT

00:00.170 --> 00:04.234
AssemblyAI is building AI systems to help you build AI applications

00:04.282 --> 00:08.106
with spoken data. We create superhuman AI models for speech

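To make the cue format more concrete, here is a minimal sketch that reads a cue timing line such as the ones above and converts it to milliseconds. The helper names `vttTimestampToMs` and `parseCueTiming` are made up for this illustration and are not part of any SDK; the sketch assumes timestamps in the `mm:ss.ttt` or `hh:mm:ss.ttt` form.

```javascript
// Hypothetical helper: convert a WebVTT timestamp ("mm:ss.ttt" or "hh:mm:ss.ttt") to milliseconds.
function vttTimestampToMs(timestamp) {
  const match = /^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})$/.exec(timestamp.trim());
  if (!match) {
    throw new Error(`Not a WebVTT timestamp: ${timestamp}`);
  }
  // The hours group is optional, so fall back to "0" when it is missing.
  const [, hours = "0", minutes, seconds, millis] = match;
  return ((Number(hours) * 60 + Number(minutes)) * 60 + Number(seconds)) * 1000 + Number(millis);
}

// Hypothetical helper: split a cue timing line into start and end times.
function parseCueTiming(line) {
  const [start, rest] = line.split("-->");
  if (rest === undefined) {
    throw new Error(`Not a cue timing line: ${line}`);
  }
  // Anything after the end timestamp (optional cue settings) is ignored here.
  const end = rest.trim().split(/\s+/)[0];
  return { startMs: vttTimestampToMs(start), endMs: vttTimestampToMs(end) };
}

console.log(parseCueTiming("00:00.170 --> 00:04.234")); // { startMs: 170, endMs: 4234 }
```

Each cue is a timing line followed by the caption text shown during that interval.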
In this guide, you'll learn how to create WebVTT files for videos using Node.js and the AssemblyAI API.
1. Set up your development environment
First, install Node.js 18 or higher on your system.
Next, create a new project folder, change directories to it, and initialize a new Node.js project:
mkdir vtt-subtitles
cd vtt-subtitles
npm init -y
Open the package.json file and add "type": "module" to the list of properties.
{
  ...
  "type": "module",
  ...
}
This tells Node.js to use ES module syntax for importing and exporting modules instead of the older CommonJS syntax.
Then, install the AssemblyAI JavaScript SDK, which makes it easier to interact with the AssemblyAI API:
npm install --save assemblyai
Next, you need an AssemblyAI API key that you can find on your dashboard. If you don't have an AssemblyAI account, first sign up for free. Once you've copied your API key, configure it as the ASSEMBLYAI_API_KEY environment variable on your machine:
# Mac/Linux:
export ASSEMBLYAI_API_KEY=<YOUR_KEY>
# Windows:
set ASSEMBLYAI_API_KEY=<YOUR_KEY>
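If the variable is missing, the script will only fail later when it calls the API. As an optional safeguard, you could check for it up front. The sketch below uses a made-up helper called `requireEnv`; it is not part of the AssemblyAI SDK.

```javascript
// Hypothetical helper: read a required environment variable or fail with a clear message.
// The second parameter defaults to process.env and exists only to make the helper easy to test.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Environment variable ${name} is not set`);
  }
  return value;
}

// Possible usage when creating the client in the next step (assumption, shown as a comment):
// const client = new AssemblyAI({ apiKey: requireEnv("ASSEMBLYAI_API_KEY") });
```

With a check like this, a forgotten export shows up as a readable error message instead of a failed API request.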
2. Transcribe your video
Now that your development environment is ready, you can start transcribing your video files. In this tutorial, you'll use this video in MP4 format. The AssemblyAI SDK can transcribe any audio or video file that's publicly accessible via a URL, but you can also specify local files. Create a file called index.js and add the following code:
import { AssemblyAI } from 'assemblyai';
// create AssemblyAI API client
const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });
// transcribe audio or video file
const transcript = await client.transcripts.transcribe({
  audio: "https://storage.googleapis.com/aai-web-samples/aai-overview.mp4",
});
If the transcription is successful, the transcript object will be populated with the transcript text and many additional properties. However, you should check whether an error occurred and stop the script if it did.
Add the following lines of JavaScript:
// throw error if transcript status is error
if (transcript.status === "error") {
  throw new Error(transcript.error);
}
3. Generate WebVTT file
Now that you have a transcript, you can generate the subtitles in WebVTT format.
Add the following import at the top of index.js. You'll need it to save the WebVTT file to disk.
import { writeFile } from "fs/promises";
Then add the following code to generate the WebVTT subtitles from the transcript and save the VTT file to disk.
// generate WebVTT subtitles
const vtt = await client.transcripts.subtitles(transcript.id, "vtt");
await writeFile("./subtitles.vtt", vtt);
To customize the maximum number of characters per caption, pass a third argument (chars_per_caption) in place of the previous call:
// generate WebVTT subtitles
const vtt = await client.transcripts.subtitles(transcript.id, "vtt", 32);
await writeFile("./subtitles.vtt", vtt);
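To get a feel for what a character limit per caption does, here is an illustration that greedily packs words into lines of at most a given length. This is only a sketch with a made-up function name (`splitIntoCaptions`); it is not how the AssemblyAI API segments captions, and the API's actual output may differ.

```javascript
// Illustration only: greedily pack words into captions of at most maxChars characters.
function splitIntoCaptions(text, maxChars) {
  const captions = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length <= maxChars || current === "") {
      // The word fits, or it is a single word longer than the limit that cannot be split.
      current = candidate;
    } else {
      captions.push(current);
      current = word;
    }
  }
  if (current) captions.push(current);
  return captions;
}

console.log(
  splitIntoCaptions("AssemblyAI is building AI systems to help you build AI applications", 32)
);
// [ 'AssemblyAI is building AI', 'systems to help you build AI', 'applications' ]
```

A smaller limit produces more, shorter captions, which tends to suit narrow players and mobile screens.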
SRT is another popular and widely supported subtitle format. To generate SRT, replace `"vtt"` with `"srt"`, and save the file with the .srt extension.
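The two formats differ in a few details. For example, SRT timestamps always include hours and use a comma before the milliseconds, while WebVTT uses a period and may omit the hours. The sketch below shows that difference with a made-up helper, `vttToSrtTimestamp`. It is only meant to illustrate the formats; to get real SRT output, request `"srt"` from the API as described above.

```javascript
// Hypothetical helper: rewrite a WebVTT timestamp in SRT style ("hh:mm:ss,ttt").
function vttToSrtTimestamp(timestamp) {
  const parts = timestamp.trim().split(":");
  if (parts.length === 2) {
    parts.unshift("00"); // WebVTT may omit the hours; SRT always includes them
  }
  if (parts.length !== 3) {
    throw new Error(`Unexpected timestamp: ${timestamp}`);
  }
  // SRT separates seconds and milliseconds with a comma instead of a period.
  return parts.join(":").replace(".", ",");
}

console.log(vttToSrtTimestamp("00:04.462")); // 00:00:04,462
```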
4. Run the script
To run the script, go back to your shell and run:
node index.js
After a few seconds, you'll see a new file on disk, subtitles.vtt, which looks like this:
WEBVTT

00:00.200 --> 00:04.430
AssemblyAI is building AI systems to help you build AI applications with

00:04.462 --> 00:08.694
spoken data. We create superhuman AI models for speech recognition,

00:08.774 --> 00:13.062
summarization, knowledge, augmentation of large language models with spoken

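If you'd like a quick sanity check on the generated file, you could confirm that it starts with the WEBVTT header and count its cues. The following is a rough sketch with a made-up function name (`summarizeVtt`); it only looks for the header and for timing lines containing `-->`, and is not a full WebVTT validator.

```javascript
// Hypothetical check: confirm the text starts with the WEBVTT header and count its cues.
function summarizeVtt(vttText) {
  // Drop an optional byte order mark, then split on Unix or Windows line endings.
  const lines = vttText.replace(/^\uFEFF/, "").split(/\r?\n/);
  if (!lines[0].startsWith("WEBVTT")) {
    throw new Error("Missing WEBVTT header");
  }
  // Every cue has exactly one timing line, which contains the "-->" arrow.
  const cueCount = lines.filter((line) => line.includes("-->")).length;
  return { cueCount };
}

const sample = [
  "WEBVTT",
  "",
  "00:00.200 --> 00:04.430",
  "AssemblyAI is building AI systems to help you build AI applications with",
  "",
  "00:04.462 --> 00:08.694",
  "spoken data. We create superhuman AI models for speech recognition,",
].join("\n");

console.log(summarizeVtt(sample)); // { cueCount: 2 }
```

To run the check on the real output, read subtitles.vtt as UTF-8 text with readFile from "fs/promises" and pass the result to the function.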
Next steps
Now that you have your subtitle file, you can configure it in your video player, or if you're creating a YouTube video, upload it to YouTube Studio. You can also use other tools to bundle or even burn the subtitles into your video.
Check out our Audio Intelligence models and LeMUR to add even more capabilities to your audio and video applications.
You can also check out our blog or YouTube channel for educational content on AI and machine learning, or join us on Twitter or Discord to stay in the loop when we release new content.