
End-to-end examples

Copy-paste pipelines that combine multiple AssemblyAI products in a single script.

Overview

Each example below is a self-contained script that wires together several AssemblyAI products into a working pipeline. Run one, see the polished output, and customize from there.

| Pipeline | Products used | Best for |
| --- | --- | --- |
| Meeting notetaker | STT + speaker diarization + Speaker Identification + language detection + LLM Gateway | Team meetings, standups, all-hands |
| Sales call intelligence | STT + speaker diarization + Speaker Identification + sentiment analysis + LLM Gateway | Revenue teams, coaching, QA |
| Medical scribe | STT + speaker diarization + Speaker Identification + Medical Mode + entity detection + LLM Gateway | Clinical documentation, SOAP notes |
| Content repurposing | STT + key phrases + LLM Gateway | Podcasts, webinars, marketing |
| Real-time meeting assistant | Streaming STT + LLM Gateway | Live captions, real-time summaries |
| Real-time live captioner | Streaming STT + keyterms prompting | Accessibility, live events |

Every example uses placeholder API keys (YOUR_API_KEY). Replace them with your actual key from the AssemblyAI dashboard.
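Rather than hard-coding the key, you can read it from an environment variable. A minimal sketch (the variable name `ASSEMBLYAI_API_KEY` is just a convention here, not something the API requires):

```python
import os

# Read the API key from the environment; fall back to the placeholder so the
# script still runs (and fails loudly at the API) if the variable is unset.
api_key = os.environ.get("ASSEMBLYAI_API_KEY", "YOUR_API_KEY")
headers = {"authorization": api_key}
```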


Pre-recorded pipelines

These pipelines transcribe an existing audio file, then enrich the transcript with Speech Understanding features and LLM Gateway analysis.
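Each pipeline below polls the transcript endpoint in a loop until the job completes. If you reuse that pattern often, it can be factored into a small helper. This is a hypothetical sketch, not part of any AssemblyAI SDK; it takes any zero-argument `fetch` callable so the retry logic stays testable without network access:

```python
import time

def poll_until_done(fetch, interval=3.0, max_attempts=200):
    """Call `fetch()` until the transcript completes or errors.

    `fetch` is a zero-argument callable returning the transcript JSON dict,
    e.g. a lambda wrapping the requests.get call used in the pipelines below.
    """
    for _ in range(max_attempts):
        result = fetch()
        if result["status"] == "completed":
            return result
        if result["status"] == "error":
            raise RuntimeError(f"Transcription failed: {result['error']}")
        time.sleep(interval)
    raise TimeoutError("transcript did not complete within the polling budget")
```

In the pipelines below, `fetch` would be `lambda: requests.get(f"{base_url}/v2/transcript/{transcript_id}", headers=headers).json()`.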

Pipeline 1 — Meeting notetaker

Transcribe a meeting recording with speaker labels and automatic language detection, identify speakers by name, then send the transcript to LLM Gateway for a formatted summary with action items.

Products used: Pre-recorded STT + speaker diarization + Speaker Identification + language detection + LLM Gateway

Model selection: This example uses both universal-3-pro and universal-2 for broad language coverage across 99 languages. If your meetings are English-only, you can use universal-3-pro alone for the highest accuracy.

```python
import requests
import time

# ── Config ────────────────────────────────────────────────────
base_url = "https://api.assemblyai.com"
headers = {"authorization": "YOUR_API_KEY"}

audio_url = "https://assembly.ai/wildfires.mp3"

# ── Step 1: Transcribe with speaker labels + language detection ──
data = {
    "audio_url": audio_url,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "speaker_labels": True,
}

response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
response.raise_for_status()
transcript_id = response.json()["id"]

while True:
    result = requests.get(f"{base_url}/v2/transcript/{transcript_id}", headers=headers).json()
    if result["status"] == "completed":
        break
    elif result["status"] == "error":
        raise RuntimeError(f"Transcription failed: {result['error']}")
    time.sleep(3)

# ── Step 2: Identify speakers by name ──
understanding_response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/understanding",
    headers=headers,
    json={
        "transcript_id": transcript_id,
        "speech_understanding": {
            "request": {
                "speaker_identification": {
                    "speaker_type": "name",
                    "known_values": ["Alice", "Bob"],  # Replace with actual participant names
                }
            }
        },
    },
)
understanding_response.raise_for_status()
identified = understanding_response.json()

# ── Step 3: Format identified transcript for the LLM ──
speaker_transcript = "\n".join(
    f"{u['speaker']}: {u['text']}" for u in identified["utterances"]
)

# ── Step 4: Generate meeting notes via LLM Gateway ──
llm_response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "claude-sonnet-4-5-20250929",
        "messages": [
            {
                "role": "user",
                "content": (
                    "You are a meeting notes assistant. Given the transcript below, produce:\n"
                    "1. A concise summary (3-5 sentences)\n"
                    "2. Key decisions made\n"
                    "3. Action items with owners (use speaker labels)\n\n"
                    f"Transcript:\n{speaker_transcript}"
                ),
            }
        ],
        "max_tokens": 2000,
    },
)
llm_response.raise_for_status()

print("=== Meeting Notes ===\n")
print(llm_response.json()["choices"][0]["message"]["content"])
```

Example output:

```
=== Meeting Notes ===

## Summary
The discussion covered the impact of Canadian wildfire smoke on US air quality.
Experts explained how particulate matter affects respiratory and cardiovascular
health. The group reviewed current air quality index readings and discussed
protective measures for affected communities.

## Key decisions
- Monitor AQI levels daily until smoke clears
- Issue public health advisories for sensitive groups

## Action items
- Alice: Compile daily AQI data for the affected regions
- Bob: Draft public advisory messaging for distribution
- Alice: Coordinate with local health departments on response protocols
```

Speaker Identification maps generic labels like “Speaker A” to real names. You can pass a list of known_values to guide identification, or omit it to let the model infer names from the conversation. Learn more in the Speaker Identification guide.
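For example, to let the model infer names instead of matching a known list, the Step 2 request body would simply drop `known_values`. A sketch of the payload only, assuming the same endpoint as above:

```python
# Speaker Identification payload without known_values: the model infers
# participant names from the conversation itself.
payload = {
    "transcript_id": "YOUR_TRANSCRIPT_ID",  # placeholder
    "speech_understanding": {
        "request": {
            "speaker_identification": {"speaker_type": "name"}
        }
    },
}
```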


Pipeline 2 — Sales call intelligence

Transcribe a sales call with speaker labels and sentiment analysis, identify speakers by role, then use LLM Gateway to generate a coaching scorecard with talk/listen ratio and sentiment insights.

Products used: Pre-recorded STT + speaker diarization + Speaker Identification + sentiment analysis + LLM Gateway

Model selection: Uses universal-3-pro for the highest English accuracy. For multilingual sales teams, add universal-2 as a fallback.

```python
import requests
import time
from collections import Counter

# ── Config ────────────────────────────────────────────────────
base_url = "https://api.assemblyai.com"
headers = {"authorization": "YOUR_API_KEY"}

audio_url = "https://assembly.ai/wildfires.mp3"

# ── Step 1: Transcribe with speaker labels + sentiment analysis ──
data = {
    "audio_url": audio_url,
    "speech_models": ["universal-3-pro"],
    "speaker_labels": True,
    "sentiment_analysis": True,
}

response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
response.raise_for_status()
transcript_id = response.json()["id"]

while True:
    result = requests.get(f"{base_url}/v2/transcript/{transcript_id}", headers=headers).json()
    if result["status"] == "completed":
        break
    elif result["status"] == "error":
        raise RuntimeError(f"Transcription failed: {result['error']}")
    time.sleep(3)

# ── Step 2: Identify speakers by role ──
understanding_response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/understanding",
    headers=headers,
    json={
        "transcript_id": transcript_id,
        "speech_understanding": {
            "request": {
                "speaker_identification": {
                    "speaker_type": "role",
                    "known_values": ["Sales Rep", "Customer"],
                }
            }
        },
    },
)
understanding_response.raise_for_status()
identified = understanding_response.json()

# ── Step 3: Calculate talk/listen ratio per speaker ──
speaker_durations = Counter()
for utterance in identified["utterances"]:
    duration_ms = utterance["end"] - utterance["start"]
    speaker_durations[utterance["speaker"]] += duration_ms

total_ms = sum(speaker_durations.values())
talk_ratios = {
    speaker: round(dur / total_ms * 100, 1)
    for speaker, dur in speaker_durations.items()
}

# ── Step 4: Summarize sentiment shifts ──
sentiment_by_speaker = {}
for s in result["sentiment_analysis_results"]:
    speaker = s.get("speaker", "Unknown")
    sentiment_by_speaker.setdefault(speaker, []).append(s["sentiment"])

sentiment_summary = ""
for speaker, sentiments in sentiment_by_speaker.items():
    counts = Counter(sentiments)
    sentiment_summary += (
        f"{speaker}: "
        f"{counts.get('POSITIVE', 0)} positive, "
        f"{counts.get('NEUTRAL', 0)} neutral, "
        f"{counts.get('NEGATIVE', 0)} negative\n"
    )

# ── Step 5: Format transcript and generate coaching scorecard ──
speaker_transcript = "\n".join(
    f"{u['speaker']}: {u['text']}" for u in identified["utterances"]
)

llm_response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "claude-sonnet-4-5-20250929",
        "messages": [
            {
                "role": "user",
                "content": (
                    "You are a sales coaching assistant. Analyze this sales call and produce a scorecard.\n\n"
                    f"Talk/listen ratios: {talk_ratios}\n\n"
                    f"Sentiment breakdown:\n{sentiment_summary}\n"
                    f"Transcript:\n{speaker_transcript}\n\n"
                    "Produce:\n"
                    "1. Call summary (2-3 sentences)\n"
                    "2. Talk/listen ratio analysis (ideal is 40/60 for the rep)\n"
                    "3. Customer sentiment shifts and what caused them\n"
                    "4. Top 3 coaching suggestions for the sales rep"
                ),
            }
        ],
        "max_tokens": 2000,
    },
)
llm_response.raise_for_status()

print("=== Sales Call Scorecard ===\n")
print(llm_response.json()["choices"][0]["message"]["content"])
```

Example output:

```
=== Sales Call Scorecard ===

## Call summary
This call discussed the environmental and health impacts of wildfire smoke on
US communities. The speakers covered air quality data, health risks, and
recommended precautions for the public.

## Talk/listen ratio
Sales Rep: 65.3% | Customer: 34.7%
Analysis: The ratio is inverted from the ideal 40/60 split. The rep dominated
the conversation — focus on asking more open-ended questions.

## Customer sentiment shifts
- Started neutral during introductions
- Shifted negative when discussing health risks and poor air quality readings
- Returned to neutral during the action-planning portion

## Coaching suggestions
1. Ask more discovery questions early to understand the customer's specific concerns
2. When the customer expresses concern, acknowledge before pivoting to solutions
3. Summarize key points at the end and confirm next steps with clear ownership
```
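Step 3's ratio math is plain arithmetic over utterance timestamps, so it is easy to pull out and unit-test in isolation. A hypothetical helper with the same logic as the pipeline above:

```python
from collections import Counter

def talk_ratios(utterances):
    # Each utterance carries `start`/`end` timestamps in milliseconds;
    # sum per-speaker durations, then convert to percentages of total talk time.
    durations = Counter()
    for u in utterances:
        durations[u["speaker"]] += u["end"] - u["start"]
    total = sum(durations.values())
    return {spk: round(ms / total * 100, 1) for spk, ms in durations.items()}
```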

Pipeline 3 — Medical scribe

Transcribe a clinical encounter using Medical Mode with speaker labels and entity detection, identify speakers by role, then use LLM Gateway to generate a structured SOAP note.

Products used: Pre-recorded STT + Medical Mode + speaker diarization + Speaker Identification + entity detection + LLM Gateway

Model selection: Uses universal-3-pro-medical for purpose-built accuracy on medical terminology, drug names, and clinical language.

Medical Mode requires a signed BAA with AssemblyAI. Contact sales@assemblyai.com for access.

```python
import requests
import time

# ── Config ────────────────────────────────────────────────────
base_url = "https://api.assemblyai.com"
headers = {"authorization": "YOUR_API_KEY"}

audio_url = "https://assembly.ai/wildfires.mp3"  # Replace with your clinical audio

# ── Step 1: Transcribe with Medical Mode + speaker labels + entity detection ──
data = {
    "audio_url": audio_url,
    "speech_models": ["universal-3-pro-medical"],
    "speaker_labels": True,
    "entity_detection": True,
}

response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
response.raise_for_status()
transcript_id = response.json()["id"]

while True:
    result = requests.get(f"{base_url}/v2/transcript/{transcript_id}", headers=headers).json()
    if result["status"] == "completed":
        break
    elif result["status"] == "error":
        raise RuntimeError(f"Transcription failed: {result['error']}")
    time.sleep(3)

# ── Step 2: Identify speakers by role ──
understanding_response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/understanding",
    headers=headers,
    json={
        "transcript_id": transcript_id,
        "speech_understanding": {
            "request": {
                "speaker_identification": {
                    "speaker_type": "role",
                    "known_values": ["Provider", "Patient"],
                }
            }
        },
    },
)
understanding_response.raise_for_status()
identified = understanding_response.json()

# ── Step 3: Extract detected entities ──
entities = result.get("entities", [])
entity_summary = "\n".join(
    f"- {e['entity_type']}: {e['text']}" for e in entities
)

# ── Step 4: Format identified transcript ──
speaker_transcript = "\n".join(
    f"{u['speaker']}: {u['text']}" for u in identified["utterances"]
)

# ── Step 5: Generate SOAP note via LLM Gateway ──
llm_response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "claude-sonnet-4-5-20250929",
        "messages": [
            {
                "role": "user",
                "content": (
                    "You are a medical scribe. Given the clinical encounter transcript and "
                    "detected entities below, generate a structured SOAP note.\n\n"
                    "Format the note with these sections:\n"
                    "- **Subjective**: Patient's reported symptoms and history\n"
                    "- **Objective**: Clinical observations and measurements\n"
                    "- **Assessment**: Diagnosis or clinical impression\n"
                    "- **Plan**: Treatment plan, prescriptions, and follow-up\n\n"
                    f"Detected entities:\n{entity_summary}\n\n"
                    f"Transcript:\n{speaker_transcript}"
                ),
            }
        ],
        "max_tokens": 2000,
    },
)
llm_response.raise_for_status()

print("=== SOAP Note ===\n")
print(llm_response.json()["choices"][0]["message"]["content"])
```

Example output:

```
=== SOAP Note ===

## Subjective
Patient reports exposure to wildfire smoke over the past several days. Describes
worsening cough, shortness of breath, and eye irritation. Symptoms began
approximately 3 days ago coinciding with elevated air quality alerts in the region.

## Objective
- AQI reading: 150 micrograms per cubic meter (10x annual average)
- Particulate matter levels classified as "unhealthy"
- Patient appears alert and oriented

## Assessment
Acute respiratory irritation secondary to wildfire smoke exposure.
Environmental exposure consistent with regional air quality emergency.

## Plan
1. Advise patient to remain indoors with windows closed
2. Recommend N95 mask for any necessary outdoor activity
3. Prescribe albuterol inhaler PRN for acute bronchospasm
4. Follow up in 1 week or sooner if symptoms worsen
5. Refer to pulmonology if symptoms persist beyond 2 weeks
```

For more on building clinical documentation apps, see the Medical Scribe Best Practices guide.


Pipeline 4 — Content repurposing

Transcribe a podcast or webinar, extract key phrases, then use LLM Gateway to generate a blog post draft with highlights.

Products used: Pre-recorded STT + key phrases + LLM Gateway

Model selection: Uses universal-3-pro with universal-2 fallback for multilingual content. If your content is English-only, universal-3-pro alone gives the best results.

```python
import requests
import time

# ── Config ────────────────────────────────────────────────────
base_url = "https://api.assemblyai.com"
headers = {"authorization": "YOUR_API_KEY"}

audio_url = "https://assembly.ai/wildfires.mp3"

# ── Step 1: Transcribe with key phrases enabled ──
data = {
    "audio_url": audio_url,
    "speech_models": ["universal-3-pro", "universal-2"],
    "language_detection": True,
    "auto_highlights": True,
}

response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)
response.raise_for_status()
transcript_id = response.json()["id"]

while True:
    result = requests.get(f"{base_url}/v2/transcript/{transcript_id}", headers=headers).json()
    if result["status"] == "completed":
        break
    elif result["status"] == "error":
        raise RuntimeError(f"Transcription failed: {result['error']}")
    time.sleep(3)

# ── Step 2: Get paragraph-level content for structure ──
paragraphs_response = requests.get(
    f"{base_url}/v2/transcript/{transcript_id}/paragraphs", headers=headers
)
paragraphs = paragraphs_response.json()["paragraphs"]

# ── Step 3: Extract top key phrases ──
highlights = result.get("auto_highlights_result", {}).get("results", [])
top_phrases = sorted(highlights, key=lambda x: x["rank"], reverse=True)[:10]
phrases_list = ", ".join(p["text"] for p in top_phrases)

# ── Step 4: Generate blog post via LLM Gateway ──
transcript_text = result["text"]

llm_response = requests.post(
    "https://llm-gateway.assemblyai.com/v1/chat/completions",
    headers=headers,
    json={
        "model": "claude-sonnet-4-5-20250929",
        "messages": [
            {
                "role": "user",
                "content": (
                    "You are a content writer. Transform this transcript into an engaging "
                    "blog post.\n\n"
                    "Requirements:\n"
                    "- Write a compelling title and subtitle\n"
                    "- Break the content into 3-5 sections with headers\n"
                    "- Weave in the key phrases naturally\n"
                    "- Add a TL;DR at the top\n"
                    "- End with a call-to-action\n\n"
                    f"Key phrases: {phrases_list}\n\n"
                    f"Transcript:\n{transcript_text}"
                ),
            }
        ],
        "max_tokens": 3000,
    },
)
llm_response.raise_for_status()

print("=== Blog Post Draft ===\n")
print(llm_response.json()["choices"][0]["message"]["content"])
```

Example output:

```
=== Blog Post Draft ===

# When the Sky Turns Orange: Understanding Wildfire Smoke and Air Quality
**How Canadian wildfires are reshaping air quality across the United States**

**TL;DR:** Wildfire smoke from Canada is triggering widespread air quality alerts
in the US, with particulate matter levels reaching 10x normal in some cities.
Here's what you need to know about the health risks and how to protect yourself.

## The smoke crosses borders
Hundreds of wildfires burning across Canada have sent massive plumes of smoke
southward into the United States, creating hazy skies and triggering air quality
alerts from the Midwest to the Eastern Seaboard...

## Understanding particulate matter
The real danger lies in fine particulate matter — microscopic particles that
can penetrate deep into your lungs and even enter your bloodstream...

## Protecting your health
Health experts recommend staying indoors, using air purifiers, and wearing
N95 masks when outdoor exposure is unavoidable...

## Looking ahead
As climate change intensifies wildfire seasons, these cross-border smoke events
are likely to become more frequent...

---
*Want to transcribe your own podcast or webinar? Get started with AssemblyAI's
API at [assemblyai.com](https://assemblyai.com).*
```

Streaming pipelines

These pipelines use the Streaming STT API to transcribe audio in real time from a microphone, with optional LLM Gateway integration for live analysis.

Pipeline 5 — Real-time meeting assistant

Stream audio from your microphone and get live transcription, with LLM Gateway generating an automatic summary after each speaker turn.

Products used: Streaming STT + Universal-3 Pro + LLM Gateway

Model selection: Uses u3-rt-pro (Universal-3 Pro Streaming) for the lowest latency (~300ms) with the highest streaming accuracy.
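The audio settings in this example send 800 frames per buffer at a 16 kHz sample rate, i.e. 50 ms of audio per WebSocket message. A quick sanity check of that arithmetic:

```python
SAMPLE_RATE = 16000       # samples per second
FRAMES_PER_BUFFER = 800   # samples sent per WebSocket message

# Duration of one chunk in milliseconds: frames / rate gives seconds
chunk_ms = FRAMES_PER_BUFFER / SAMPLE_RATE * 1000
print(chunk_ms)  # 50.0 — each message carries 50 ms of audio
```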

```python
# pip install pyaudio websocket-client
import pyaudio
import websocket
import json
import threading
import time
from urllib.parse import urlencode

# ── Config ────────────────────────────────────────────────────
YOUR_API_KEY = "YOUR_API_KEY"

PROMPT = (
    "Summarize this speaker turn in one sentence, then list any "
    "action items mentioned.\n\nTranscript: {{turn}}"
)

LLM_GATEWAY_CONFIG = {
    "model": "claude-sonnet-4-5-20250929",
    "messages": [{"role": "user", "content": PROMPT}],
    "max_tokens": 500,
}

CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
    "format_turns": True,
    "min_turn_silence": 560,  # Wait longer for natural meeting pauses
    "max_turn_silence": 2000,
    "llm_gateway": json.dumps(LLM_GATEWAY_CONFIG),
}

API_ENDPOINT = f"wss://streaming.assemblyai.com/v3/ws?{urlencode(CONNECTION_PARAMS)}"

# Audio settings
FRAMES_PER_BUFFER = 800
SAMPLE_RATE = 16000
stop_event = threading.Event()

def on_open(ws):
    print("Connected — speak into your microphone. Press Ctrl+C to stop.\n")

    def stream_audio():
        audio = pyaudio.PyAudio()
        stream = audio.open(
            input=True, frames_per_buffer=FRAMES_PER_BUFFER,
            channels=1, format=pyaudio.paInt16, rate=SAMPLE_RATE,
        )
        while not stop_event.is_set():
            try:
                data = stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
                ws.send(data, websocket.ABNF.OPCODE_BINARY)
            except Exception:
                break
        stream.stop_stream()
        stream.close()
        audio.terminate()

    threading.Thread(target=stream_audio, daemon=True).start()

def on_message(ws, message):
    data = json.loads(message)
    msg_type = data.get("type")

    if msg_type == "Turn":
        transcript = data.get("transcript", "")
        if data.get("end_of_turn") and transcript:
            print(f"[Turn] {transcript}\n")
        elif transcript:
            print(f"\r ... {transcript[-80:]}", end="", flush=True)

    elif msg_type == "LLMGatewayResponse":
        content = data.get("data", {}).get("choices", [{}])[0].get("message", {}).get("content", "")
        print(f"[Assistant] {content}\n")

    elif msg_type == "Termination":
        print(f"\nSession ended — {data.get('audio_duration_seconds', 0)}s of audio processed.")

def on_error(ws, error):
    print(f"Error: {error}")
    stop_event.set()

def on_close(ws, code, msg):
    stop_event.set()

ws_app = websocket.WebSocketApp(
    API_ENDPOINT,
    header={"Authorization": YOUR_API_KEY},
    on_open=on_open, on_message=on_message,
    on_error=on_error, on_close=on_close,
)

ws_thread = threading.Thread(target=ws_app.run_forever, daemon=True)
ws_thread.start()

try:
    while ws_thread.is_alive():
        time.sleep(0.1)
except KeyboardInterrupt:
    print("\nStopping...")
    stop_event.set()
    if ws_app.sock and ws_app.sock.connected:
        ws_app.send(json.dumps({"type": "Terminate"}))
        time.sleep(2)
    ws_app.close()
```

Example output:

```
Connected — speak into your microphone. Press Ctrl+C to stop.

[Turn] So the main thing we need to decide today is whether we're going
with vendor A or vendor B for the new analytics platform.

[Assistant] The speaker is initiating a decision discussion about choosing
between two analytics platform vendors.
Action items: None yet — decision pending.

[Turn] I think vendor A has better pricing but vendor B has the integrations
we need. Can someone pull the comparison spreadsheet by Friday?

[Assistant] The speaker compared vendor pricing vs. integrations and requested
a comparison document.
Action items:
- Pull the vendor comparison spreadsheet by Friday
```

Pipeline 6 — Real-time live captioner

Stream audio from your microphone with keyterms prompting for domain-specific accuracy, ideal for live events, accessibility, and broadcast captioning.

Products used: Streaming STT + Universal-3 Pro + keyterms prompting

Model selection: Uses u3-rt-pro for sub-300ms latency with format_turns enabled for clean, readable captions.
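Note the `doseq=True` in the `urlencode` call below: it expands the `KEYTERMS` list into one repeated query parameter per term rather than stringifying the whole list. A minimal illustration using only Python's standard library:

```python
from urllib.parse import urlencode, parse_qs

params = {"sample_rate": 16000, "keyterms_prompt": ["AssemblyAI", "LLM Gateway"]}

# doseq=True yields: sample_rate=16000&keyterms_prompt=AssemblyAI&keyterms_prompt=LLM+Gateway
query = urlencode(params, doseq=True)
decoded = parse_qs(query)
print(decoded["keyterms_prompt"])  # ['AssemblyAI', 'LLM Gateway']
```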

```python
# pip install pyaudio websocket-client
import pyaudio
import websocket
import json
import threading
import time
from urllib.parse import urlencode

# ── Config ────────────────────────────────────────────────────
YOUR_API_KEY = "YOUR_API_KEY"

# Add domain-specific terms to boost recognition accuracy
KEYTERMS = ["AssemblyAI", "Universal-3 Pro", "LLM Gateway", "speech-to-text"]

CONNECTION_PARAMS = {
    "sample_rate": 16000,
    "speech_model": "u3-rt-pro",
    "format_turns": True,
    "keyterms_prompt": KEYTERMS,
}

API_ENDPOINT = (
    f"wss://streaming.assemblyai.com/v3/ws?{urlencode(CONNECTION_PARAMS, doseq=True)}"
)

# Audio settings
FRAMES_PER_BUFFER = 800
SAMPLE_RATE = 16000
stop_event = threading.Event()
caption_count = 0

def on_open(ws):
    print(f"Live captioning started — keyterms: {', '.join(KEYTERMS)}")
    print("Speak into your microphone. Press Ctrl+C to stop.\n")
    print("-" * 60)

    def stream_audio():
        audio = pyaudio.PyAudio()
        stream = audio.open(
            input=True, frames_per_buffer=FRAMES_PER_BUFFER,
            channels=1, format=pyaudio.paInt16, rate=SAMPLE_RATE,
        )
        while not stop_event.is_set():
            try:
                data = stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
                ws.send(data, websocket.ABNF.OPCODE_BINARY)
            except Exception:
                break
        stream.stop_stream()
        stream.close()
        audio.terminate()

    threading.Thread(target=stream_audio, daemon=True).start()

def on_message(ws, message):
    global caption_count
    data = json.loads(message)

    if data.get("type") == "Turn":
        transcript = data.get("transcript", "")
        if data.get("end_of_turn") and transcript:
            caption_count += 1
            print(f"\r[{caption_count:03d}] {transcript}")
        elif transcript:
            # Show partial (live) caption
            print(f"\r >> {transcript[-70:]}", end="", flush=True)

    elif data.get("type") == "Termination":
        duration = data.get("audio_duration_seconds", 0)
        print(f"\n{'=' * 60}")
        print(f"Session ended — {caption_count} captions, {duration}s of audio")

def on_error(ws, error):
    print(f"\nError: {error}")
    stop_event.set()

def on_close(ws, code, msg):
    stop_event.set()

ws_app = websocket.WebSocketApp(
    API_ENDPOINT,
    header={"Authorization": YOUR_API_KEY},
    on_open=on_open, on_message=on_message,
    on_error=on_error, on_close=on_close,
)

ws_thread = threading.Thread(target=ws_app.run_forever, daemon=True)
ws_thread.start()

try:
    while ws_thread.is_alive():
        time.sleep(0.1)
except KeyboardInterrupt:
    print("\n\nStopping...")
    stop_event.set()
    if ws_app.sock and ws_app.sock.connected:
        ws_app.send(json.dumps({"type": "Terminate"}))
        time.sleep(2)
    ws_app.close()
```

Example output:

```
Live captioning started — keyterms: AssemblyAI, Universal-3 Pro, LLM Gateway, speech-to-text
Speak into your microphone. Press Ctrl+C to stop.

------------------------------------------------------------
[001] Welcome everyone to today's demo of AssemblyAI's speech-to-text platform.
[002] We'll be showing you how Universal-3 Pro handles real-time transcription.
[003] The LLM Gateway integration lets you add AI analysis on top of your
transcripts without switching providers.
============================================================
Session ended — 3 captions, 24s of audio
```

Customize and extend

Each pipeline above is a starting point. Here are common ways to build on them:

  • Swap LLM models — Change the model parameter in LLM Gateway requests to use any of the 20+ supported models (Claude, GPT, Gemini, and more).
  • Add structured output — Use Structured Outputs to constrain LLM responses to a JSON schema for easier downstream processing.
  • Add PII redaction — Enable PII Redaction to automatically mask sensitive information before it reaches the LLM.
  • Use Speaker Identification — Replace generic speaker labels with real names using Speaker Identification.
  • Add Translation — Translate transcripts into 20+ languages using Translation.
  • Use webhooks — Replace polling with webhooks for production workloads so your server gets notified when transcription completes.
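For the webhook route, you add a `webhook_url` to the transcript request and handle the POST AssemblyAI sends when processing finishes. The handler below is a framework-agnostic sketch; the payload field names (`transcript_id`, `status`) follow the webhooks guide, so verify them against the current docs before relying on them:

```python
def handle_webhook(payload):
    """Decide what to do with a transcript webhook payload (sketch).

    Returns ("fetch", transcript_id) when the transcript is ready to download,
    ("alert", transcript_id) on failure, otherwise ("ignore", None).
    Wire this into the POST route of your web framework of choice.
    """
    if payload.get("status") == "completed":
        return ("fetch", payload["transcript_id"])
    if payload.get("status") == "error":
        return ("alert", payload.get("transcript_id"))
    return ("ignore", None)
```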

Next steps