Custom vocabulary

Looking to improve transcription accuracy?

For significantly better results, try our new approach using LeMUR Custom Vocabulary. This advanced method provides more precise recognition for domain-specific terms and proper nouns without the formatting limitations of traditional word boosting.

To improve transcription accuracy, you can boost words or phrases that appear frequently in your audio file.

To boost words or phrases, include the word_boost parameter in the transcription config.

You can also control how much weight to apply to each keyword or phrase. Include boost_param in the transcription config with a value of low, default, or high.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# Use a local file path or a publicly accessible URL.
# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

# Boost the listed terms and apply the highest boost weight.
config = aai.TranscriptionConfig(
    word_boost=["aws", "azure", "google cloud"],
    boost_param="high"
)

transcript = aai.Transcriber(config=config).transcribe(audio_file)

# Stop early if the transcription failed.
if transcript.status == "error":
    raise RuntimeError(f"Transcription failed: {transcript.error}")

print(transcript.text)

Follow formatting guidelines for custom vocabulary to ensure the best results:

  • Remove all punctuation except apostrophes.
  • Make sure each word is in its spoken form. For example, iphone seven instead of iphone 7.
  • Remove spaces between letters in acronyms.

The model also accepts words with non-ASCII characters such as é, but converts them to their ASCII equivalents.
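
The formatting guidelines above can be applied before building the word_boost list. The following is a minimal sketch using only the Python standard library, not part of the SDK: the helper names normalize_term and needs_spoken_form are hypothetical, the acronym rule is a heuristic that joins runs of single letters, and the local ASCII folding only approximates the conversion the model performs. Terms containing digits are flagged rather than rewritten, because converting them to spoken form depends on how they are pronounced.

```python
import re
import string
import unicodedata

# Every ASCII punctuation character except the apostrophe maps to a space.
_PUNCT_TABLE = str.maketrans(
    {ch: " " for ch in string.punctuation.replace("'", "")}
)

# Heuristic (assumption): a run of single letters separated by spaces,
# such as "A W S", is treated as a spelled-out acronym.
_SPACED_ACRONYM = re.compile(r"\b(?:[A-Za-z] )+[A-Za-z]\b")


def normalize_term(term: str) -> str:
    """Hypothetical helper: apply the formatting guidelines to one term."""
    # Fold accented characters to their closest ASCII equivalents.
    folded = (
        unicodedata.normalize("NFKD", term)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace punctuation (except apostrophes) and collapse whitespace.
    collapsed = " ".join(folded.translate(_PUNCT_TABLE).split())
    # Remove the spaces between letters in acronyms.
    return _SPACED_ACRONYM.sub(
        lambda match: match.group(0).replace(" ", ""), collapsed
    )


def needs_spoken_form(term: str) -> bool:
    """Hypothetical helper: flag terms with digits for manual rewriting."""
    return any(ch.isdigit() for ch in term)
```

For example, normalize_term("A.W.S.") returns "AWS" and normalize_term("google-cloud") returns "google cloud", while needs_spoken_form("iphone 7") returns True to signal that the term should be rewritten by hand as iphone seven. The acronym heuristic can misfire on phrases that legitimately contain consecutive one-letter words, so review its output before sending it.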

You can boost a maximum of 1,000 unique keywords and phrases, each containing up to 6 words.
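
The limits above can be checked locally before submitting a request. This is a minimal sketch, not part of the SDK: check_word_boost is a hypothetical helper, and it assumes that words are separated by whitespace and that uniqueness means an exact string match, which may differ from how the API counts.

```python
from typing import List

MAX_TERMS = 1000          # maximum number of unique keywords and phrases
MAX_WORDS_PER_TERM = 6    # maximum number of words in one phrase


def check_word_boost(terms: List[str]) -> List[str]:
    """Hypothetical helper: de-duplicate terms and enforce the stated limits."""
    # dict.fromkeys drops exact duplicates while preserving order.
    unique = list(dict.fromkeys(terms))

    # Assumption: a "word" is any whitespace-separated token.
    too_long = [t for t in unique if len(t.split()) > MAX_WORDS_PER_TERM]
    if too_long:
        raise ValueError(
            f"Phrases longer than {MAX_WORDS_PER_TERM} words: {too_long}"
        )

    if len(unique) > MAX_TERMS:
        raise ValueError(
            f"{len(unique)} unique terms exceeds the limit of {MAX_TERMS}"
        )
    return unique
```

The returned list can be passed as word_boost in the transcription config, for example word_boost=check_word_boost(["aws", "azure", "aws"]), which sends only aws and azure.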