Create Custom Length Subtitles | AssemblyAI

Our SRT/VTT endpoints let you set the maximum number of characters per caption with the chars_per_caption URL parameter in your API requests, but some use cases call for a custom number of words in each subtitle instead.

In this guide, we will demonstrate how to construct these subtitles yourself in Python!
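For comparison, the character-based limit is applied as a query parameter on the subtitle export request. The sketch below only builds such a URL and sends nothing. The base URL and path layout are assumptions written from memory and should be checked against the API reference, and `subtitle_export_url` is a hypothetical helper name, not part of the SDK.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed base URL; confirm against the API reference before relying on it.
API_BASE = "https://api.assemblyai.com/v2"


def subtitle_export_url(
    transcript_id: str,
    subtitle_format: str = "srt",
    chars_per_caption: Optional[int] = None,
) -> str:
    """Build a subtitle export URL, optionally limiting characters per caption."""
    if subtitle_format not in ("srt", "vtt"):
        raise ValueError("subtitle_format must be 'srt' or 'vtt'")
    url = f"{API_BASE}/transcript/{transcript_id}/{subtitle_format}"
    if chars_per_caption is not None:
        url += "?" + urlencode({"chars_per_caption": chars_per_caption})
    return url


url = subtitle_export_url("YOUR-TRANSCRIPT-ID", "srt", 32)
print(url)
```

This parameter caps characters, not words, which is why the rest of the guide builds the subtitles by hand.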

Quickstart

import assemblyai as aai

aai.settings.api_key = "YOUR-API-KEY"

transcriber = aai.Transcriber()

transcript = transcriber.transcribe("./my-audio.mp3")

def second_to_timecode(x: float) -> str:
    # Round to whole milliseconds first so float error cannot drop a millisecond
    total_ms = round(x * 1000)
    hour, total_ms = divmod(total_ms, 3_600_000)
    minute, total_ms = divmod(total_ms, 60_000)
    second, millisecond = divmod(total_ms, 1000)

    return '%.2d:%.2d:%.2d,%.3d' % (hour, minute, second, millisecond)

def generate_subtitles_by_word_count(transcript, words_per_line):
    output = []
    subtitle_index = 1  # Start subtitle index at 1
    current_words = []

    for sentence in transcript.get_sentences():
        last_index = len(sentence.words) - 1
        for i, word in enumerate(sentence.words):
            current_words.append(word)
            # Close the subtitle when it is full or the sentence ends
            if len(current_words) >= words_per_line or i == last_index:
                # Word timestamps are in milliseconds
                start_time = second_to_timecode(current_words[0].start / 1000)
                end_time = second_to_timecode(current_words[-1].end / 1000)
                subtitle_text = " ".join([w.text for w in current_words])
                output.append(str(subtitle_index))
                output.append("%s --> %s" % (start_time, end_time))
                output.append(subtitle_text)
                output.append("")  # Blank line separates subtitles
                current_words = []  # Reset for the next subtitle
                subtitle_index += 1

    return output

subs = generate_subtitles_by_word_count(transcript, 6)
with open(f"{transcript.id}.srt", 'w', encoding="utf-8") as o:
    final = '\n'.join(subs)
    o.write(final)

print("SRT file generated.")

Step-by-Step Instructions

Install the AssemblyAI Python SDK.

pip install -U assemblyai

Import the assemblyai package and set the API key.

import assemblyai as aai

aai.settings.api_key = "YOUR-API-KEY"

Create a Transcriber object.

transcriber = aai.Transcriber()

Use the Transcriber object’s transcribe method and pass in the audio file’s path as a parameter. The transcribe method returns a Transcript object, which holds the results of the transcription.

transcript = transcriber.transcribe("./my-audio.mp3")

Alternatively, you can pass in the URL of a publicly accessible audio file.

transcript = transcriber.transcribe("https://storage.googleapis.com/aai-docs-samples/espn.m4a")

Define a function that converts seconds to SRT timecodes.

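def second_to_timecode(x: float) -> str:
    # Round to whole milliseconds first so float error cannot drop a millisecond
    total_ms = round(x * 1000)
    hour, total_ms = divmod(total_ms, 3_600_000)
    minute, total_ms = divmod(total_ms, 60_000)
    second, millisecond = divmod(total_ms, 1000)

    return '%.2d:%.2d:%.2d,%.3d' % (hour, minute, second, millisecond)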
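To see the timecode format in action, here is a small self-contained sketch. It assumes the input is already in whole milliseconds (the unit the word timestamps in this guide are divided down from), which sidesteps floating-point rounding entirely. `ms_to_timecode` is a hypothetical helper name used only for illustration.

```python
def ms_to_timecode(total_ms: int) -> str:
    """Format a millisecond offset as an SRT timecode (HH:MM:SS,mmm)."""
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


print(ms_to_timecode(0))          # 00:00:00,000
print(ms_to_timecode(1_001))      # 00:00:01,001
print(ms_to_timecode(3_725_250))  # 01:02:05,250
```

Note the comma before the milliseconds, which is what the SRT format expects.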

Define a function that iterates through the transcript's sentences and builds a list of SRT lines, starting a new subtitle once the word limit is reached or the sentence ends.

def generate_subtitles_by_word_count(transcript, words_per_line):
    output = []
    subtitle_index = 1  # Start subtitle index at 1
    current_words = []

    for sentence in transcript.get_sentences():
        last_index = len(sentence.words) - 1
        for i, word in enumerate(sentence.words):
            current_words.append(word)
            # Close the subtitle when it is full or the sentence ends
            if len(current_words) >= words_per_line or i == last_index:
                # Word timestamps are in milliseconds
                start_time = second_to_timecode(current_words[0].start / 1000)
                end_time = second_to_timecode(current_words[-1].end / 1000)
                subtitle_text = " ".join([w.text for w in current_words])
                output.append(str(subtitle_index))
                output.append("%s --> %s" % (start_time, end_time))
                output.append(subtitle_text)
                output.append("")  # Blank line separates subtitles
                current_words = []  # Reset for the next subtitle
                subtitle_index += 1

    return output
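The grouping rule can be tried without an API key. The sketch below is an illustration, not the SDK: `Word` and `Sentence` are stand-in dataclasses with only the fields this guide uses, the timestamps are made up, and `group_words` restates the same rule with list slicing (a subtitle closes after `words_per_line` words or at the end of a sentence, whichever comes first).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Stand-in for a transcript word; start and end are in milliseconds."""
    text: str
    start: int
    end: int


@dataclass
class Sentence:
    """Stand-in for a transcript sentence."""
    words: List[Word]


def ms_to_timecode(total_ms: int) -> str:
    """Format a millisecond offset as an SRT timecode (HH:MM:SS,mmm)."""
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def group_words(sentences: List[Sentence], words_per_line: int) -> List[str]:
    """Build SRT lines; subtitles never span two sentences."""
    output = []
    cue_index = 1
    for sentence in sentences:
        words = sentence.words
        # Step through the sentence in chunks of at most words_per_line words
        for i in range(0, len(words), words_per_line):
            chunk = words[i:i + words_per_line]
            start = ms_to_timecode(chunk[0].start)
            end = ms_to_timecode(chunk[-1].end)
            output += [str(cue_index), f"{start} --> {end}",
                       " ".join(w.text for w in chunk), ""]
            cue_index += 1
    return output


def make_sentence(text: str, start_ms: int, ms_per_word: int = 400) -> Sentence:
    """Create a sentence with evenly spaced, made-up word timings."""
    words = [
        Word(token, start_ms + i * ms_per_word, start_ms + (i + 1) * ms_per_word)
        for i, token in enumerate(text.split())
    ]
    return Sentence(words)


sentences = [
    make_sentence("The quick brown fox jumps over the lazy dog.", 0),
    make_sentence("Then it rests.", 4000),
]
srt_lines = group_words(sentences, 4)
print("\n".join(srt_lines))
```

With a limit of four words, the nine-word sentence yields subtitles of four, four, and one word, and the second sentence starts a new subtitle instead of filling up the short one.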

Generate your subtitle file.

subs = generate_subtitles_by_word_count(transcript, 6)
with open(f"{transcript.id}.srt", 'w', encoding="utf-8") as o:
    final = '\n'.join(subs)
    o.write(final)

print("SRT file generated.")
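If you need WebVTT rather than SRT, the same list of lines can be adapted: a WebVTT file starts with a `WEBVTT` header line followed by a blank line, and its timestamps use a period instead of a comma before the milliseconds. The following is a minimal sketch under those two assumptions. `srt_lines_to_vtt` is a hypothetical helper, and the sample lines are made up.

```python
import re
from typing import List

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500"
SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})$"
)


def srt_lines_to_vtt(srt_lines: List[str]) -> str:
    """Convert a list of SRT lines into WebVTT text."""
    vtt = ["WEBVTT", ""]  # Required header, then a blank line
    for line in srt_lines:
        match = SRT_TIMING.match(line)
        if match:
            # Only timing lines are rewritten; subtitle text is left untouched
            line = f"{match[1]}.{match[2]} --> {match[3]}.{match[4]}"
        vtt.append(line)
    return "\n".join(vtt)


sample = [
    "1", "00:00:00,000 --> 00:00:01,600", "The quick brown fox", "",
    "2", "00:00:01,600 --> 00:00:03,200", "jumps over the lazy", "",
]
vtt_text = srt_lines_to_vtt(sample)
print(vtt_text)
```

Matching whole timing lines, rather than replacing every comma, keeps commas inside the spoken text intact.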