Introducing Slam-1

Learn how to transcribe pre-recorded audio using Slam-1.

Overview

Slam-1 is our new Speech Language Model, which combines an LLM architecture with ASR encoders for speech-to-text transcription. The model draws on context and semantic meaning to improve transcription accuracy. See our Slam-1 blog post to learn more about the model.

Slam-1 is currently only supported for English.

Quick Start

Slam-1 is available in beta through our standard API endpoint. To use it:

  1. Make requests to https://api.assemblyai.com/v2/transcript with your API key
  2. Add the speech_model parameter with the value "slam-1"

import requests
import time

base_url = "https://api.assemblyai.com"
headers = {"authorization": "<YOUR_API_KEY>"}

# Request body: the audio to transcribe and the model to use
data = {
    "audio_url": "https://assembly.ai/sports_injuries.mp3",
    "speech_model": "slam-1"
}

# Submit the transcription request
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

if response.status_code != 200:
    print(f"Error: {response.status_code}, Response: {response.text}")
    response.raise_for_status()

transcript_response = response.json()
transcript_id = transcript_response["id"]
polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"

# Poll every 3 seconds until the transcript completes or fails
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()
    if transcript["status"] == "completed":
        print(transcript["text"])
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

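
The polling loop in the example runs until the transcript completes or fails. If you would rather give up after a fixed wait, the loop can be wrapped in a helper along the following lines. This is a minimal sketch, not part of the API: the function name wait_for_transcript, its parameters, and the 600-second default timeout are all assumptions chosen for illustration.

```python
import time


def wait_for_transcript(fetch_status, poll_interval=3.0, timeout=600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until the transcript completes, fails, or the timeout elapses.

    fetch_status is any zero-argument callable that returns the transcript
    JSON as a dict. The name and all parameters here are hypothetical.
    """
    deadline = clock() + timeout
    while True:
        transcript = fetch_status()
        status = transcript["status"]
        if status == "completed":
            return transcript
        if status == "error":
            raise RuntimeError(f"Transcription failed: {transcript['error']}")
        if clock() >= deadline:
            raise TimeoutError(f"Transcript still '{status}' after {timeout} seconds")
        sleep(poll_interval)
```

With the variables from the example, a call might look like wait_for_transcript(lambda: requests.get(polling_endpoint, headers=headers).json()), after which the returned dict's "text" field holds the transcript.
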
Local audio files

The code example above transcribes a file that is available via URL. To work with local files, see our API Reference.
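
As a rough sketch of what the local-file flow could look like: upload the file first, then pass the returned URL as audio_url. The code below assumes that the upload endpoint is POST /v2/upload, that it accepts the raw file bytes, and that it responds with an upload_url field; confirm these details against the API Reference before relying on them. The helper names and the http parameter are hypothetical and exist only for illustration.

```python
def upload_local_file(path, api_key, http=None, base_url="https://api.assemblyai.com"):
    """Upload a local audio file and return a URL usable as audio_url.

    Assumption: POST {base_url}/v2/upload accepts the raw file bytes and
    responds with JSON containing an "upload_url" field.
    """
    if http is None:
        # Imported lazily so that a stub client can be injected instead.
        import requests as http
    headers = {"authorization": api_key}
    with open(path, "rb") as audio_file:
        response = http.post(base_url + "/v2/upload", headers=headers, data=audio_file)
    response.raise_for_status()
    return response.json()["upload_url"]


def build_transcript_request(audio_url):
    """Build the JSON body for POST /v2/transcript using Slam-1."""
    return {"audio_url": audio_url, "speech_model": "slam-1"}
```

The dict returned by build_transcript_request can be sent as the json argument of the transcript request, in place of the data dict used in the URL-based example.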

Fine-tuning Slam-1

Improve transcription accuracy by prompting Slam-1 with words or phrases that are likely to appear frequently in your audio file. This takes advantage of the model’s contextual understanding capabilities.

Slam-1’s multi-modal architecture does more than increase the likelihood of detecting specific words: it understands the semantic meaning and context of the terminology you provide. This improves transcription quality not only for the exact terms you specify, but also for related terminology, variations, and contextually similar phrases.

Provide up to 1000 domain-specific words or phrases (maximum 6 words per phrase) that may appear in your audio using the optional keyterms_prompt parameter:

import requests
import time

base_url = "https://api.assemblyai.com"
headers = {"authorization": "<YOUR_API_KEY>"}

# Request body: keyterms_prompt lists domain-specific terms expected in the audio
data = {
    "audio_url": "https://assembly.ai/sports_injuries.mp3",
    "speech_model": "slam-1",
    "keyterms_prompt": ["differential diagnosis", "hypertension", "Wellbutrin XL 150mg"]
}

# Submit the transcription request
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

if response.status_code != 200:
    print(f"Error: {response.status_code}, Response: {response.text}")
    response.raise_for_status()

transcript_response = response.json()
transcript_id = transcript_response["id"]
polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"

# Poll every 3 seconds until the transcript completes or fails
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()
    if transcript["status"] == "completed":
        print(transcript["text"])
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)

Keyword count limits

While we support up to 1000 key words and phrases, actual capacity may be lower due to internal tokenization and implementation constraints. Key points to remember:

  • Each word in a multi-word phrase counts towards the 1000 keyword limit
  • Capitalization affects capacity (uppercase tokens consume more than lowercase)
  • Longer words consume more capacity than shorter words

For optimal results, use shorter phrases when possible and be mindful of your total token count when approaching the keyword limit.
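
To catch oversized lists before sending a request, a client-side check along the following lines may help. It is a minimal sketch that applies only the documented limits (1000 words in total, 6 words per phrase) and counts words by splitting on whitespace, which merely approximates the internal tokenization described above. The function name check_keyterms and the constants are hypothetical.

```python
MAX_KEYTERM_WORDS = 1000  # documented upper bound; actual capacity may be lower
MAX_WORDS_PER_PHRASE = 6  # documented per-phrase maximum


def check_keyterms(keyterms):
    """Return the total word count, raising ValueError if a documented limit is exceeded.

    Words are counted by splitting on whitespace, which is an assumption and
    only approximates the service's internal tokenization.
    """
    total_words = 0
    for phrase in keyterms:
        word_count = len(phrase.split())
        if word_count == 0:
            raise ValueError("Key terms must not be empty")
        if word_count > MAX_WORDS_PER_PHRASE:
            raise ValueError(
                f"{phrase!r} has {word_count} words (maximum {MAX_WORDS_PER_PHRASE})"
            )
        total_words += word_count
    if total_words > MAX_KEYTERM_WORDS:
        raise ValueError(
            f"{total_words} words in total (maximum {MAX_KEYTERM_WORDS})"
        )
    return total_words


keyterms = ["differential diagnosis", "hypertension", "Wellbutrin XL 150mg"]
print(check_keyterms(keyterms))  # 2 + 1 + 3 = 6 words
```

Because capitalization and word length also affect capacity, a list that passes this check can still exceed what the service accepts; treat the result as a lower-bound sanity check rather than a guarantee.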

Here is an example of what a keyterms_prompt list might look like for a transcription of a professional therapy session for a patient named Jane Doe, who is being treated for anxiety and depression:

["Jane Doe", "cognitive behavioral therapy", "major depressive disorder", "generalized anxiety disorder", "ADHD", "trauma-informed care", "Lexapro 10mg", "psychosocial assessment", "therapeutic alliance", "emotional dysregulation", "GAD-7", "PHQ-9", "Citalopram 20mg", "Lorazepam 2mg"]

Feedback

We welcome your feedback on Slam-1 during this beta period. Share thoughts by emailing our Support team at support@assemblyai.com.