Improving Transcript Accuracy

For optimal transcription accuracy, we recommend using our Slam-1 model, which offers superior performance and fine-tuning capabilities. Here’s how to get the best results:

Fine-tuning with keyterms_prompt

Improve transcription accuracy by leveraging Slam-1’s contextual understanding: prompt the model with words or phrases that are likely to appear frequently in your audio file.

Rather than simply increasing the likelihood of detecting specific words, Slam-1’s multi-modal architecture understands the semantic meaning and context of the terminology you provide. This enhances transcription quality not just for the exact terms you specify, but also for related terminology, variations, and contextually similar phrases.

Using the optional keyterms_prompt parameter, provide up to 1000 domain-specific words or phrases (maximum 6 words per phrase) that may appear in your audio:

This parameter is only supported when the speech_model is set to "slam-1".

import requests
import time

base_url = "https://api.assemblyai.com"
headers = {"authorization": "<YOUR_API_KEY>"}

data = {
    "audio_url": "https://assembly.ai/sports_injuries.mp3",
    "speech_model": "slam-1",
    "keyterms_prompt": ["differential diagnosis", "hypertension", "Wellbutrin XL 150mg"]
}

# Submit the transcription request
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

if response.status_code != 200:
    print(f"Error: {response.status_code}, Response: {response.text}")
    response.raise_for_status()

transcript_response = response.json()
transcript_id = transcript_response["id"]
polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"

# Poll until the transcript completes or errors out
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()
    if transcript["status"] == "completed":
        print(transcript["text"])
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

Keyword count limits

While we support up to 1000 keywords and phrases, the actual capacity may be lower due to internal tokenization and implementation constraints. Key points to remember:

  • Each word in a multi-word phrase counts towards the 1000 keyword limit
  • Capitalization affects capacity (uppercase tokens consume more than lowercase)
  • Longer words consume more capacity than shorter words

For optimal results, use shorter phrases when possible and be mindful of your total token count when approaching the keyword limit.
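
Because each word in a phrase counts toward the limit, it can help to sanity-check your list before submitting a request. The snippet below is a minimal sketch (the count_keyterm_words helper is illustrative, not part of the API): it counts whitespace-separated words across your phrases, so it only approximates the limit and does not model the tokenization or capitalization effects described above.

def count_keyterm_words(keyterms, max_words=1000, max_phrase_words=6):
    """Roughly estimate usage against the 1000-word keyterm limit.

    Approximation only: counts whitespace-separated words and ignores the
    tokenization and capitalization effects that can reduce actual capacity.
    """
    total = 0
    for phrase in keyterms:
        words = phrase.split()
        if len(words) > max_phrase_words:
            raise ValueError(f"Phrase exceeds {max_phrase_words} words: {phrase!r}")
        total += len(words)
    if total > max_words:
        raise ValueError(f"Key terms use roughly {total} words, above the {max_words}-word limit")
    return total

keyterms = ["differential diagnosis", "hypertension", "Wellbutrin XL 150mg"]
print(count_keyterm_words(keyterms))  # 6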

Using Universal or Nano Models

If you’re currently using our universal or nano models and experiencing accuracy issues:

  1. Consider upgrading to Slam-1: This is the recommended solution for better accuracy, especially with domain-specific content.

  2. Alternative approach (if Slam-1 isn’t an option): If you must continue using universal or nano models, you can try the LeMUR Custom Vocabulary approach as a workaround (see the sketch after this list), though this may not provide the same level of accuracy as Slam-1 with keyterms_prompt.

    LeMUR Custom Vocabulary

    Learn more about using LeMUR Custom Vocabulary if you need to improve accuracy while using universal or nano models.
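
For reference, here is a minimal sketch of that workaround, assuming you already have a completed transcript ID from a universal or nano transcription job. It calls LeMUR’s Task endpoint (/lemur/v3/generate/task) with a prompt listing your custom vocabulary; the prompt wording and the custom_vocabulary list are illustrative assumptions rather than a prescribed format, and depending on your setup you may also need to specify a final_model. See the LeMUR Custom Vocabulary guide above for the recommended approach.

import requests

base_url = "https://api.assemblyai.com"
headers = {"authorization": "<YOUR_API_KEY>"}

# Illustrative list of domain terms the transcript may have gotten wrong
custom_vocabulary = ["differential diagnosis", "hypertension", "Wellbutrin XL 150mg"]

prompt = (
    "You are correcting an automatic speech transcript. "
    "The following domain-specific terms may have been transcribed incorrectly: "
    + ", ".join(custom_vocabulary)
    + ". Return the corrected transcript text only, with no extra commentary."
)

# Assumes <YOUR_TRANSCRIPT_ID> comes from a completed universal or nano job
lemur_data = {
    "prompt": prompt,
    "transcript_ids": ["<YOUR_TRANSCRIPT_ID>"],
}

response = requests.post(base_url + "/lemur/v3/generate/task", headers=headers, json=lemur_data)
response.raise_for_status()

print(response.json()["response"])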