Create Custom Length Subtitles

Our SRT/VTT endpoints let you cap the number of characters per caption with the chars_per_caption URL parameter in your API requests, but some use cases call for a set number of words in each subtitle instead.

This guide shows how to construct those subtitles yourself in Python.

Quickstart

import assemblyai as aai

aai.settings.api_key = "YOUR-API-KEY"

config = aai.TranscriptionConfig(speech_models=["universal-3-pro", "universal-2"])
transcriber = aai.Transcriber()

transcript = transcriber.transcribe("./my-audio.mp3", config)

def second_to_timecode(x: float) -> str:
    # Convert to whole milliseconds first so float error cannot drop a millisecond.
    total_ms = round(x * 1000)
    hour, total_ms = divmod(total_ms, 3_600_000)
    minute, total_ms = divmod(total_ms, 60_000)
    second, millisecond = divmod(total_ms, 1000)

    return '%.2d:%.2d:%.2d,%.3d' % (hour, minute, second, millisecond)

def generate_subtitles_by_word_count(transcript, words_per_line):
    output = []
    subtitle_index = 1  # Start subtitle index at 1
    word_count = 0
    current_words = []

    for sentence in transcript.get_sentences():
        for word in sentence.words:
            current_words.append(word)
            word_count += 1
            # Emit a subtitle when the line is full or the sentence ends.
            if word_count >= words_per_line or word == sentence.words[-1]:
                # Word timestamps are in milliseconds; convert to seconds.
                start_time = second_to_timecode(current_words[0].start / 1000)
                end_time = second_to_timecode(current_words[-1].end / 1000)
                subtitle_text = " ".join([word.text for word in current_words])
                output.append(str(subtitle_index))
                output.append("%s --> %s" % (start_time, end_time))
                output.append(subtitle_text)
                output.append("")  # Blank line separates SRT entries
                current_words = []  # Reset for the next subtitle
                word_count = 0  # Reset word count
                subtitle_index += 1

    return output

subs = generate_subtitles_by_word_count(transcript, 6)
with open(f"{transcript.id}.srt", 'w', encoding='utf-8') as o:
    final = '\n'.join(subs)
    o.write(final)

print("SRT file generated.")

Step-by-Step Instructions

Install the AssemblyAI Python SDK.

$ pip install -U assemblyai

Create a main.py file, import the assemblyai package, and set your API key.

import assemblyai as aai

aai.settings.api_key = "YOUR-API-KEY"

Create a TranscriptionConfig that selects the speech models, then create a Transcriber object.

config = aai.TranscriptionConfig(speech_models=["universal-3-pro", "universal-2"])
transcriber = aai.Transcriber()

Call the Transcriber object's transcribe method with the path to the audio file and the config. The method returns the finished transcription, which the code stores in the transcript variable.

transcript = transcriber.transcribe("./my-audio.mp3", config)

Alternatively, you can pass in the URL of a publicly accessible audio file.

transcript = transcriber.transcribe("https://storage.googleapis.com/aai-docs-samples/espn.m4a", config)

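Define a function that converts a time in seconds to an SRT timecode (HH:MM:SS,mmm).

def second_to_timecode(x: float) -> str:
    # Convert to whole milliseconds first so float error cannot drop a millisecond.
    total_ms = round(x * 1000)
    hour, total_ms = divmod(total_ms, 3_600_000)
    minute, total_ms = divmod(total_ms, 60_000)
    second, millisecond = divmod(total_ms, 1000)

    return '%.2d:%.2d:%.2d,%.3d' % (hour, minute, second, millisecond)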
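To make the timecode layout concrete, here is a minimal standalone sketch that formats integer milliseconds directly. The helper name ms_to_srt_timecode is hypothetical and not part of the SDK or this guide's script; it assumes timestamps arrive as whole milliseconds, which is what the division by 1000 elsewhere in this guide implies.

```python
def ms_to_srt_timecode(ms: int) -> str:
    """Format whole milliseconds as an SRT timecode, HH:MM:SS,mmm.

    Hypothetical helper for illustration only.
    """
    hours, ms = divmod(ms, 3_600_000)   # 3,600,000 ms per hour
    minutes, ms = divmod(ms, 60_000)    # 60,000 ms per minute
    seconds, ms = divmod(ms, 1_000)     # 1,000 ms per second
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


print(ms_to_srt_timecode(1234))       # 00:00:01,234
print(ms_to_srt_timecode(3_723_004))  # 01:02:03,004
```

Staying in integers avoids floating-point rounding altogether, at the cost of a different argument unit than second_to_timecode uses.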

Define a function that iterates through the transcript's sentences and groups their words into subtitles of at most words_per_line words. A subtitle also ends at the end of each sentence, so no subtitle spans two sentences.

def generate_subtitles_by_word_count(transcript, words_per_line):
    output = []
    subtitle_index = 1  # Start subtitle index at 1
    word_count = 0
    current_words = []

    for sentence in transcript.get_sentences():
        for word in sentence.words:
            current_words.append(word)
            word_count += 1
            # Emit a subtitle when the line is full or the sentence ends.
            if word_count >= words_per_line or word == sentence.words[-1]:
                # Word timestamps are in milliseconds; convert to seconds.
                start_time = second_to_timecode(current_words[0].start / 1000)
                end_time = second_to_timecode(current_words[-1].end / 1000)
                subtitle_text = " ".join([word.text for word in current_words])
                output.append(str(subtitle_index))
                output.append("%s --> %s" % (start_time, end_time))
                output.append(subtitle_text)
                output.append("")  # Blank line separates SRT entries
                current_words = []  # Reset for the next subtitle
                word_count = 0  # Reset word count
                subtitle_index += 1

    return output
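The grouping rule can be tried without calling the API. The sketch below is an illustration under assumptions: FakeWord is a hypothetical stand-in for the SDK's word objects (only text, start, and end are modeled), group_words is a hypothetical helper, and the sample sentences are made up. It uses list slicing, which yields the same groups as the counter-based loop for non-empty sentences.

```python
from dataclasses import dataclass


@dataclass
class FakeWord:
    """Hypothetical stand-in for an SDK word: text plus start/end in ms."""
    text: str
    start: int
    end: int


def group_words(sentences, words_per_line):
    """Split each sentence into groups of at most words_per_line words.

    Groups never cross a sentence boundary.
    """
    groups = []
    for words in sentences:
        for i in range(0, len(words), words_per_line):
            groups.append(words[i:i + words_per_line])
    return groups


# Made-up sample data: one five-word sentence and one two-word sentence.
sentences = [
    [FakeWord("Subtitles", 0, 400), FakeWord("can", 450, 600),
     FakeWord("be", 620, 700), FakeWord("split", 720, 1000),
     FakeWord("evenly.", 1020, 1500)],
    [FakeWord("Like", 1800, 2000), FakeWord("this.", 2020, 2400)],
]

groups = group_words(sentences, 3)
for group in groups:
    text = " ".join(w.text for w in group)
    print(group[0].start, "->", group[-1].end, text)
```

With three words per line, the five-word sentence yields "Subtitles can be" and "split evenly.", and the second sentence starts a new subtitle, "Like this.", rather than filling up the previous one.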

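Generate your subtitle file. This example puts up to 6 words in each subtitle and names the file after the transcript ID.

subs = generate_subtitles_by_word_count(transcript, 6)
with open(f"{transcript.id}.srt", 'w', encoding='utf-8') as o:
    final = '\n'.join(subs)
    o.write(final)

print("SRT file generated.")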
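If you need WebVTT rather than SRT, the same list of lines can be adapted. The sketch below is one possible approach, not part of the original script: srt_lines_to_vtt is a hypothetical helper, and the sample lines are made up. It relies on two format differences: a WebVTT file starts with a WEBVTT header line, and its timecodes use a period instead of a comma before the milliseconds.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
SRT_TIMING = re.compile(r"\d{2,}:\d{2}:\d{2},\d{3} --> \d{2,}:\d{2}:\d{2},\d{3}")


def srt_lines_to_vtt(srt_lines):
    """Convert a list of SRT lines into WebVTT text (hypothetical helper)."""
    vtt_lines = ["WEBVTT", ""]  # Required header, then a blank line
    for line in srt_lines:
        if SRT_TIMING.fullmatch(line):
            # Only timing lines change; commas in subtitle text are kept.
            line = line.replace(",", ".")
        vtt_lines.append(line)
    return "\n".join(vtt_lines)


# Made-up sample in the same shape generate_subtitles_by_word_count returns.
sample = [
    "1", "00:00:00,000 --> 00:00:00,700", "Subtitles can be", "",
    "2", "00:00:00,720 --> 00:00:01,500", "split, evenly.", "",
]

vtt = srt_lines_to_vtt(sample)
print(vtt)
```

The numeric indexes are left in place because WebVTT accepts them as optional cue identifiers.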

Run your script.

$ python main.py