
Best Practices for Building Contact Center Applications

Introduction

Building a contact center application requires careful consideration of accuracy, speaker separation, compliance, and scalability. This guide addresses common questions and provides practical solutions for both post-call analytics and real-time agent assist scenarios.

Why AssemblyAI for contact centers?

AssemblyAI stands out as the premier choice for contact center applications with several key advantages:

Industry-leading accuracy on telephony audio

  • Universal-3-Pro model delivers best-in-class accuracy on 8kHz telephony audio
  • 2.9% speaker diarization error rate for precise agent vs. customer attribution
  • Multichannel support for stereo call recordings where agent and customer are on separate channels
  • Keyterms prompt allows providing call context to improve accuracy of company names, products, and compliance phrases

Streaming with Universal-3 Pro

For real-time agent assist, AssemblyAI’s Universal-3 Pro Streaming model (u3-rt-pro) offers:

  • Low latency enables live transcription during calls
  • Format turns feature provides structured, readable output
  • Dynamic prompting via UpdateConfiguration to update context mid-call
  • Dual-channel streaming for separate agent and customer audio streams

End-to-end voice AI platform

Unlike fragmented solutions, AssemblyAI provides a unified API for:

  • Transcription with speaker diarization (agent vs. customer)
  • Multichannel audio support for stereo call recordings
  • PII redaction on both text and audio for HIPAA and PCI compliance
  • Post-processing workflows with custom prompting - from call summaries to QA scoring
  • Streaming and pre-recorded transcription in a single platform
  • Compliance and security built for enterprise workloads (BAA, SOC2, ISO)

When should I use pre-recorded vs streaming for contact centers?

Understanding when to use pre-recorded versus streaming is critical for contact center workflows.

Pre-recorded Speech-to-text

Post-call analytics - the call has already happened and you have the full recording

  • Highest accuracy needed - pre-recorded models deliver the highest accuracy available
  • Speaker diarization is critical - pre-recorded diarization achieves a 2.9% speaker error rate
  • Multichannel recordings - Most contact center recordings are stereo with agent and customer on separate channels
  • Compliance workflows - Full PII redaction with audio de-identification
  • Post-call analytics - Summarization, sentiment analysis, entity detection, QA scoring
  • Batch processing - Processing large volumes of call recordings

Best for: QA scoring, compliance monitoring, coaching insights, post-call CRM updates, searchable call archives

Streaming Speech-to-text

Live calls - Transcribing as the call happens

You should use streaming when you need to display a live transcript to agents during calls. Universal-3 Pro Streaming brings accuracy close to pre-recorded quality, but pre-recorded will always be the most accurate option.

  • Agent assist - Live transcription visible to agents during calls
  • Real-time coaching - Prompt agents with suggested responses or compliance reminders
  • Live compliance monitoring - Detect compliance violations in real-time
  • No recording available - Processing live audio only

Best for: Agent assist, real-time coaching, live compliance monitoring, live call transcription

Many contact center platforms use both:

  1. Streaming during the call - Provide live transcription for agent assist and real-time coaching
  2. Pre-recorded after the call - Generate high-quality transcript with speaker labels, summary, and analytics

Example workflow:

  • Call begins → Start streaming for live agent assist
  • Call ends → Upload recording to pre-recorded API for final transcript with speaker names
  • Generate call summary, QA score, and compliance report from pre-recorded transcript
  • Push results to CRM (e.g., Salesforce)
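The hand-off above can be sketched as a small orchestration layer. All three callables (`start_streaming`, `submit_prerecorded`, `push_to_crm`) are hypothetical hooks into your own telephony, AssemblyAI, and CRM integrations, not SDK functions:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class CallLifecycle:
    """Coordinates the streaming -> pre-recorded hand-off for one call."""
    start_streaming: Callable[[str], None]        # open a live streaming session
    submit_prerecorded: Callable[[str], str]      # submit recording, return a job/transcript id
    push_to_crm: Callable[[str], None]            # push downstream results (summary, QA, etc.)
    events: List[str] = field(default_factory=list)

    def on_call_start(self, call_id: str) -> None:
        # Call begins: start streaming for live agent assist
        self.events.append(f"stream:{call_id}")
        self.start_streaming(call_id)

    def on_call_end(self, call_id: str, recording_url: str) -> None:
        # Call ends: submit the recording for the high-quality final transcript
        job_id = self.submit_prerecorded(recording_url)
        self.events.append(f"batch:{job_id}")
        self.push_to_crm(job_id)
```

Wiring the callbacks to your platform's call events keeps the two APIs decoupled: the streaming session serves the live UI, while the pre-recorded job owns the durable transcript and analytics.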

What languages and features are available for a contact center application?

Pre-recorded calls (Universal-3-Pro)

For post-call analytics, AssemblyAI supports:

Languages:

  • 99 languages supported
  • Automatic Language Detection to identify the dominant language spoken in each call
  • Code switching support to preserve mid-call changes between languages

Core Features:

  • Speaker diarization (agent-customer separation)
  • Multichannel audio support - when agent and customer are on separate audio channels, enables perfect speaker separation without diarization
  • Automatic formatting, punctuation, and capitalization
  • Keyterms prompting for boosting domain-specific terms (up to 1000 terms for Universal-3-Pro)
  • Natural language prompting (Universal-3-Pro) - up to 1,500 words to guide transcription behavior
  • Speaker options with configurable min/max expected speakers for call transfers

Speech Understanding:

  • Summarization for call recaps
  • Sentiment analysis for customer satisfaction tracking
  • Entity detection for extracting names, account numbers, and products
  • Speaker identification to map generic labels to agent and customer names
  • Translation between 100+ languages

Guardrails:

  • PII redaction on text and audio for HIPAA and PCI compliance

Streaming (Universal-3 Pro Streaming)

For live call transcription, use Universal-3 Pro Streaming (u3-rt-pro) for the highest streaming accuracy:

Core Features:

  • Speaker diarization for identifying agent vs. customer
  • Partial and final transcripts for responsive UI
  • Format turns for structured, readable output
  • Keyterms prompt for company names, products, and compliance phrases
  • Dual-channel streaming for separate agent and customer audio

For more details, see the Universal-3 Pro Streaming documentation.

How can I get started building a post-call analytics pipeline?

Here’s a complete example implementing pre-recorded transcription for contact center call analysis:

```python
import assemblyai as aai
import asyncio
from typing import Dict
from assemblyai.types import (
    SpeakerOptions,
    PIIRedactionPolicy,
    PIISubstitutionPolicy,
)

# Configure API key
aai.settings.api_key = "your_api_key_here"

async def transcribe_call(audio_source: str, agent_name: str = None) -> Dict:
    """
    Transcribe a contact center call recording with full analytics

    Args:
        audio_source: Either a local file path or publicly accessible URL
        agent_name: Optional agent name for speaker identification
    """
    # Configure comprehensive call analysis
    config = aai.TranscriptionConfig(
        # Model selection
        speech_models=["universal-3-pro", "universal-2"],

        # Speaker diarization
        speaker_labels=True,
        speaker_options=SpeakerOptions(
            min_speakers_expected=2,  # Agent and customer
            max_speakers_expected=5,  # Allow for call transfers - safe to keep high
        ),
        multichannel=False,  # Set to True if audio has separate channel per speaker

        # Language detection
        language_detection=True,

        # Boost accuracy of contact center vocabulary
        keyterms_prompt=[
            # Company-specific terms
            "Acme Corp", "Premium Support Plan",

            # Compliance phrases
            "recorded line", "calls are monitored and recorded",

            # Common contact center terms
            "account number", "case number", "ticket number",
            "escalation", "supervisor", "hold time",
        ],

        # Post-call analytics
        summarization=True,
        sentiment_analysis=True,
        entity_detection=True,

        # PII protection for compliance
        redact_pii=True,
        redact_pii_policies=[
            PIIRedactionPolicy.person_name,
            PIIRedactionPolicy.phone_number,
            PIIRedactionPolicy.email_address,
            PIIRedactionPolicy.account_number,
            PIIRedactionPolicy.us_social_security_number,
            PIIRedactionPolicy.credit_card_number,
            PIIRedactionPolicy.credit_card_cvv,
            PIIRedactionPolicy.credit_card_expiration,
            PIIRedactionPolicy.date_of_birth,
        ],
        redact_pii_sub=PIISubstitutionPolicy.hash,
        redact_pii_audio=True,
    )

    # Add speaker identification if agent name is known
    if agent_name:
        config.speech_understanding = {
            "request": {
                "speaker_identification": {
                    "speaker_type": "role",
                    "speakers": [
                        {"role": "Agent", "name": agent_name},
                        {"role": "Customer"},
                    ],
                }
            }
        }

    # Create transcriber
    transcriber = aai.Transcriber()

    try:
        # Submit transcription job without blocking the event loop
        transcript = await asyncio.to_thread(
            transcriber.transcribe,
            audio_source,
            config=config,
        )

        # Check status
        if transcript.status == aai.TranscriptStatus.error:
            raise Exception(f"Transcription failed: {transcript.error}")

        # Process speaker-labeled utterances
        for utterance in transcript.utterances:
            start_time = utterance.start / 1000  # Convert ms to seconds
            end_time = utterance.end / 1000

            print(f"[{start_time:.1f}s - {end_time:.1f}s] {utterance.speaker}:")
            print(f"  {utterance.text}\n")

        return {
            "transcript": transcript,
            "utterances": transcript.utterances,
            "summary": transcript.summary,
            "sentiment": transcript.sentiment_analysis_results,
            "entities": transcript.entities,
            "redacted_audio_url": transcript.redacted_audio_url,
        }

    except Exception as e:
        print(f"Error during transcription: {e}")
        raise

async def main():
    audio_source = "https://your-storage.com/calls/call_recording.mp3"

    result = await transcribe_call(audio_source, agent_name="Sarah Johnson")

    print(f"\nCall duration: {result['transcript'].audio_duration} seconds")
    print(f"Summary: {result['summary']}")

if __name__ == "__main__":
    asyncio.run(main())
```

How do I handle multichannel contact center audio?

Most contact center recordings are stereo with the agent on one channel and the customer on the other. Multichannel transcription gives you perfect speaker separation without diarization.

Pre-recorded Multichannel

```python
config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    multichannel=True,     # Enable when agent and customer are on separate channels
    speaker_labels=False,  # Disable - channels already separate speakers

    # Still enable analytics
    summarization=True,
    sentiment_analysis=True,
    entity_detection=True,

    # PII redaction
    redact_pii=True,
    redact_pii_policies=[
        PIIRedactionPolicy.person_name,
        PIIRedactionPolicy.credit_card_number,
        PIIRedactionPolicy.us_social_security_number,
        PIIRedactionPolicy.account_number,
    ],
    redact_pii_sub=PIISubstitutionPolicy.hash,
)

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(audio_file, config=config)

# Channel 1 = Agent, Channel 2 = Customer (typical layout)
for utterance in transcript.utterances:
    role = "Agent" if utterance.channel == "1" else "Customer"
    print(f"{role}: {utterance.text}")
```

When to use multichannel:

  • Call recordings from PBX systems with separate agent/customer channels
  • Recordings from platforms like Genesys, Twilio, Five9, NICE, or Talkdesk
  • Any stereo recording where each channel represents a different speaker

Benefits:

  • Perfect speaker separation - No diarization errors
  • No speaker confusion or overlap issues
  • Higher accuracy - Model processes clean single-speaker audio per channel

Streaming Multichannel

For real-time dual-channel transcription, create separate streaming sessions per channel:

```python
import asyncio
import json
from urllib.parse import urlencode

import websockets

API_KEY = "your_api_key"

class ChannelTranscriber:
    def __init__(self, channel_id: int, role: str):
        self.channel_id = channel_id
        self.role = role
        self.connection_params = {
            "sample_rate": 8000,      # Telephony standard
            "speech_model": "u3-rt-pro",
            "format_turns": True,
            "encoding": "pcm_mulaw",  # Common telephony encoding
        }

    async def transcribe_channel(self, audio_stream):
        url = f"wss://streaming.assemblyai.com/v3/ws?{urlencode(self.connection_params, doseq=True)}"

        # If using websockets >= 13.0, use additional_headers. For < 13.0, use extra_headers.
        async with websockets.connect(url, additional_headers={"Authorization": API_KEY}) as ws:
            # Send and receive must run concurrently for real-time streaming
            async def send_audio():
                async for audio_chunk in audio_stream:
                    await ws.send(audio_chunk)

            async def receive_transcripts():
                async for message in ws:
                    data = json.loads(message)
                    if data.get("type") == "Turn" and data.get("turn_is_formatted"):
                        print(f"{self.role}: {data['transcript']}")

            await asyncio.gather(send_audio(), receive_transcripts())

# Create a transcriber for each channel
async def transcribe_live_call(agent_audio_stream, customer_audio_stream):
    agent = ChannelTranscriber(0, "Agent")
    customer = ChannelTranscriber(1, "Customer")

    await asyncio.gather(
        agent.transcribe_channel(agent_audio_stream),
        customer.transcribe_channel(customer_audio_stream),
    )
```

See our multichannel streaming guide for complete implementation details.

How can I build a real-time agent assist?

Here’s a complete example for real-time streaming transcription optimized for contact center agent assist:

```python
# pip install pyaudio websocket-client
import json
import threading
import time
from datetime import datetime
from urllib.parse import urlencode

import pyaudio
import websocket

# --- Configuration ---
YOUR_API_KEY = "your_api_key"

# Contact center keyterms
KEYTERMS = [
    # Company and product terms
    "Acme Corp",
    "Premium Support Plan",
    "Enterprise License",

    # Compliance phrases
    "recorded line",
    "calls are monitored",

    # Common contact center vocabulary
    "account number",
    "case number",
    "escalation",
    "supervisor",
]

# CONTACT CENTER CONFIGURATION
CONNECTION_PARAMS = {
    "sample_rate": 8000,          # Telephony standard (8kHz)
    "speech_model": "u3-rt-pro",  # Universal-3 Pro Streaming for highest accuracy
    "format_turns": True,

    # Contact center turn detection
    # u3-rt-pro defaults: min_turn_silence=100ms, max_turn_silence=1000ms
    "min_turn_silence": 400,   # Longer than default for natural call pauses
    "max_turn_silence": 1500,  # Longer for customers explaining issues

    # Keyterms for accuracy
    "keyterms_prompt": KEYTERMS,
}

API_ENDPOINT_BASE_URL = "wss://streaming.assemblyai.com/v3/ws"
API_ENDPOINT = f"{API_ENDPOINT_BASE_URL}?{urlencode(CONNECTION_PARAMS, doseq=True)}"

# Audio Configuration
FRAMES_PER_BUFFER = 400  # 50ms of audio at 8kHz
SAMPLE_RATE = CONNECTION_PARAMS["sample_rate"]
CHANNELS = 1
FORMAT = pyaudio.paInt16

# Global variables
audio = None
stream = None
ws_app = None
audio_thread = None
stop_event = threading.Event()
transcript_buffer = []


def on_open(ws):
    print("=" * 80)
    print(f"[{datetime.now().strftime('%H:%M:%S')}] Agent assist transcription started")
    print(f"Connected to: {API_ENDPOINT_BASE_URL}")
    print(f"Keyterms configured: {', '.join(KEYTERMS[:5])}...")
    print("=" * 80)

    def stream_audio():
        global stream
        while not stop_event.is_set():
            try:
                audio_data = stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
                ws.send(audio_data, websocket.ABNF.OPCODE_BINARY)
            except Exception as e:
                if not stop_event.is_set():
                    print(f"Error streaming audio: {e}")
                break

    global audio_thread
    audio_thread = threading.Thread(target=stream_audio)
    audio_thread.daemon = True
    audio_thread.start()


def on_message(ws, message):
    try:
        data = json.loads(message)
        msg_type = data.get("type")

        if msg_type == "Begin":
            session_id = data.get("id", "N/A")
            print(f"[SESSION] Started - ID: {session_id}\n")

        elif msg_type == "Turn":
            end_of_turn = data.get("end_of_turn", False)
            turn_is_formatted = data.get("turn_is_formatted", False)
            transcript = data.get("transcript", "")
            turn_order = data.get("turn_order", 0)

            # Show partials for responsive agent UI
            if not end_of_turn and transcript:
                print(f"\r[LIVE] {transcript}", end="", flush=True)

            # Use formatted finals for agent display
            if end_of_turn and turn_is_formatted and transcript:
                timestamp = datetime.now().strftime('%H:%M:%S')
                print(f"\n[{timestamp}] {transcript}")

                # Detect compliance keywords
                transcript_lower = transcript.lower()
                if any(term in transcript_lower for term in ["cancel", "refund", "complaint", "supervisor"]):
                    print("  ** ESCALATION KEYWORD DETECTED **")

                transcript_buffer.append({
                    "timestamp": timestamp,
                    "text": transcript,
                    "turn_order": turn_order,
                    "type": "final",
                })
                print()

        elif msg_type == "Termination":
            audio_duration = data.get("audio_duration_seconds", 0)
            print(f"\n[SESSION] Terminated - Duration: {audio_duration}s")

        elif msg_type == "Error":
            error_msg = data.get("error", "Unknown error")
            print(f"\n[ERROR] {error_msg}")

    except json.JSONDecodeError as e:
        print(f"Error decoding message: {e}")
    except Exception as e:
        print(f"Error handling message: {e}")


def on_error(ws, error):
    print(f"\n[WEBSOCKET ERROR] {error}")
    stop_event.set()


def on_close(ws, close_status_code, close_msg):
    print(f"\n[WEBSOCKET] Disconnected - Status: {close_status_code}, Message: {close_msg}")

    global stream, audio
    stop_event.set()

    if stream:
        if stream.is_active():
            stream.stop_stream()
        stream.close()
        stream = None
    if audio:
        audio.terminate()
        audio = None
    if audio_thread and audio_thread.is_alive():
        audio_thread.join(timeout=1.0)


def run():
    global audio, stream, ws_app

    audio = pyaudio.PyAudio()

    try:
        stream = audio.open(
            input=True,
            frames_per_buffer=FRAMES_PER_BUFFER,
            channels=CHANNELS,
            format=FORMAT,
            rate=SAMPLE_RATE,
        )
    except Exception as e:
        print(f"Error opening audio stream: {e}")
        if audio:
            audio.terminate()
        return

    ws_app = websocket.WebSocketApp(
        API_ENDPOINT,
        header={"Authorization": YOUR_API_KEY},
        on_open=on_open,
        on_message=on_message,
        on_error=on_error,
        on_close=on_close,
    )

    ws_thread = threading.Thread(target=ws_app.run_forever)
    ws_thread.daemon = True
    ws_thread.start()

    try:
        while ws_thread.is_alive():
            time.sleep(0.1)
    except KeyboardInterrupt:
        print("\n\nCtrl+C received. Stopping transcription...")
        stop_event.set()

        if ws_app and ws_app.sock and ws_app.sock.connected:
            try:
                terminate_message = {"type": "Terminate"}
                ws_app.send(json.dumps(terminate_message))
                time.sleep(1)
            except Exception as e:
                print(f"Error sending termination message: {e}")

        if ws_app:
            ws_app.close()

        ws_thread.join(timeout=2.0)

    finally:
        if stream and stream.is_active():
            stream.stop_stream()
        if stream:
            stream.close()
        if audio:
            audio.terminate()
        print("Cleanup complete. Exiting.")


if __name__ == "__main__":
    run()
```

How should I handle pre-recorded transcription in production?

For high-volume contact center workloads, use webhooks instead of polling:

```python
config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    webhook_url="https://your-app.com/webhooks/assemblyai",
    webhook_auth_header_name="X-Webhook-Secret",
    webhook_auth_header_value="your_secret_here",
    speaker_labels=True,
    multichannel=True,
    summarization=True,
    sentiment_analysis=True,
    entity_detection=True,
    redact_pii=True,
    redact_pii_policies=[
        PIIRedactionPolicy.person_name,
        PIIRedactionPolicy.credit_card_number,
        PIIRedactionPolicy.us_social_security_number,
        PIIRedactionPolicy.account_number,
    ],
    redact_pii_sub=PIISubstitutionPolicy.hash,
)

# Submit job and return immediately (non-blocking)
transcript = transcriber.submit(audio_url, config=config)
print(f"Job submitted: {transcript.id}")
# Your app continues processing other calls
```

Webhook handler example:

```python
import requests as http_requests
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/webhooks/assemblyai", methods=["POST"])
def assemblyai_webhook():
    if request.headers.get("X-Webhook-Secret") != "your_secret_here":
        return jsonify({"error": "Unauthorized"}), 401

    data = request.json
    transcript_id = data["transcript_id"]
    status = data["status"]

    if status == "completed":
        # Fetch the full transcript (webhook only sends transcript_id and status)
        transcript = http_requests.get(
            f"https://api.assemblyai.com/v2/transcript/{transcript_id}",
            headers={"authorization": "your_api_key"},
        ).json()
        process_completed_call(transcript)
    elif status == "error":
        log_transcription_error(transcript_id)

    return jsonify({"received": True}), 200

def process_completed_call(transcript):
    """Process completed call transcript and push to CRM"""
    utterances = transcript["utterances"]
    summary = transcript["summary"]

    # Store in database
    save_to_database(transcript)

    # Push summary to CRM
    push_to_crm(transcript["id"], summary)

    # Run QA scoring
    qa_score = score_call_quality(utterances)
    save_qa_score(transcript["id"], qa_score)
```
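The `score_call_quality` helper in the handler above is left to your implementation. One minimal sketch, assuming utterances arrive as dicts with a `text` field and using a hypothetical phrase checklist (tune the phrases to your own QA rubric):

```python
def score_call_quality(utterances: list) -> dict:
    """Toy QA rubric: check that required phrases appear anywhere in the call."""
    checklist = {
        "greeting": ["thank you for calling", "how can i help"],
        "recording_disclosure": ["recorded line", "monitored and recorded"],
        "closing": ["anything else", "have a great day"],
    }
    # Flatten the call into one lowercase string for simple substring checks
    full_text = " ".join(u["text"].lower() for u in utterances)
    results = {
        item: any(phrase in full_text for phrase in phrases)
        for item, phrases in checklist.items()
    }
    # Percentage of checklist items satisfied
    results["score"] = round(100 * sum(results.values()) / len(checklist))
    return results
```

In practice you would likely score per-speaker (agent utterances only) and feed the transcript to an LLM for rubric-based grading rather than substring matching; this sketch just shows the shape of the data flowing into `save_qa_score`.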

Scaling Considerations

  • Rate limits: 20,000 POST requests per 5-minute window
  • Concurrent transcriptions: 200+ for paid accounts (queued beyond that)
  • Ramp up gradually - Start at 10-50 concurrent, double incrementally
  • Use exponential backoff with jitter for 429 errors
  • Contact Sales before large-scale rollouts
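The backoff guidance above can be sketched as a small retry wrapper. `RateLimitError` and `submit` are hypothetical stand-ins for however your HTTP client surfaces a 429, not AssemblyAI SDK types:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 raised by your API client."""

def with_backoff(submit, max_retries=5, base_delay=1.0, max_delay=30.0):
    """Retry `submit` on rate limits with exponential backoff plus full jitter."""
    for attempt in range(max_retries):
        try:
            return submit()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # Out of retries - surface the error to the caller
            # Exponential delay, capped, then randomized ("full jitter")
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```

The jitter matters at contact center scale: if hundreds of queued jobs all hit a 429 at once, randomized delays prevent them from retrying in lockstep and re-triggering the limit.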

How do I handle PII and compliance?

PII redaction is critical for contact center compliance (HIPAA, PCI-DSS, GDPR, CCPA).

```python
config = aai.TranscriptionConfig(
    redact_pii=True,
    redact_pii_policies=[
        # Customer identity
        PIIRedactionPolicy.person_name,
        PIIRedactionPolicy.date_of_birth,
        PIIRedactionPolicy.us_social_security_number,

        # Contact information
        PIIRedactionPolicy.phone_number,
        PIIRedactionPolicy.email_address,
        PIIRedactionPolicy.location,

        # Financial information (PCI-DSS)
        PIIRedactionPolicy.credit_card_number,
        PIIRedactionPolicy.credit_card_cvv,
        PIIRedactionPolicy.credit_card_expiration,
        PIIRedactionPolicy.account_number,
        PIIRedactionPolicy.banking_information,
    ],
    redact_pii_sub=PIISubstitutionPolicy.hash,  # Stable hash tokens
    redact_pii_audio=True,  # Create de-identified audio file
)
```

Why hash substitution?

  • Stable across the file (same value = same token)
  • Maintains sentence structure for downstream LLM processing
  • Prevents reconstruction of original data

HIPAA Compliance

  • AssemblyAI provides a Business Associate Agreement (BAA) at no cost
  • Contact us to execute a BAA before processing PHI
  • Use PII redaction with audio de-identification for full compliance

How do I improve the accuracy of my contact center transcription?

Prompting Best Practices

The most impactful lever for contact center accuracy is prompting. Use a structured prompt with a Context: field:

```python
config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],

    # Natural language prompt for transcription guidance
    prompt=(
        "Transcribe this audio with perfect punctuation and formatting. "
        "Preserve linguistic speech patterns including disfluencies, filler words, "
        "hesitations, repetitions, stutters, false starts, and colloquialisms. "
        "Transcribe in the original language mix (code-switching), preserving the "
        "words in the language they are spoken. Output plain transcript text only. "
        "Use a new line when the voice changes; each line contains only one "
        "person's words.\n\n"
        "Context: Acme Corp customer service call, recorded line, "
        "Agent: Sarah Johnson, calls are monitored and recorded"
    ),

    # Keyterms for proper nouns and domain vocabulary
    keyterms_prompt=[
        "Acme Corp",
        "Sarah Johnson",
        "Premium Support Plan",
        "Enterprise License",
        "recorded line",
        "calls are monitored and recorded",
    ],

    speaker_labels=True,
)
```

Tips for effective prompting:

  • Use positive instructions (“transcribe verbatim”) not negative (“do NOT summarize”)
  • Keep prompts to 3-6 instructions maximum - conflicting instructions degrade output
  • Layer instructions one by one and test each to measure impact
  • Populate the Context: line dynamically per call with known info: company name, agent name, compliance phrases
  • Use keyterms for proper nouns and domain vocabulary (company names, product names, agent names)
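Populating the Context: line per call can be as simple as a small helper. `build_prompt` and `BASE_INSTRUCTIONS` are illustrative names, and the instruction text is abridged from the example above:

```python
# Static instructions shared by every call (abridged for brevity)
BASE_INSTRUCTIONS = (
    "Transcribe this audio with perfect punctuation and formatting. "
    "Output plain transcript text only."
)

def build_prompt(company: str, agent_name: str, extra_context: str = "") -> str:
    """Assemble the structured prompt, with a per-call Context: line."""
    context = f"Context: {company} customer service call, recorded line, Agent: {agent_name}"
    if extra_context:
        context += f", {extra_context}"
    return f"{BASE_INSTRUCTIONS}\n\n{context}"
```

Your routing system typically knows the agent and queue before the recording is processed, so the Context: line can be filled in at submission time with no manual step.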

Using Keyterms for Pre-recorded Transcription

```python
# Build keyterms dynamically per call
call_keyterms = [
    # Company terms (static)
    "Acme Corp",
    "Premium Support Plan",

    # Agent name (from routing system)
    agent_name,

    # Customer name (from CRM lookup)
    customer_name,

    # Account-specific terms
    "account ending in 4532",
]

config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    keyterms_prompt=call_keyterms,
    speaker_labels=True,
)
```

Using Keyterms for Streaming

```python
# Streaming with contact center context
keyterms = [
    "Acme Corp",
    "Premium Support Plan",
    "Sarah Johnson",
    "recorded line",
]

CONNECTION_PARAMS = {
    "sample_rate": 8000,
    "speech_model": "u3-rt-pro",
    "format_turns": True,
    "encoding": "pcm_mulaw",
    "keyterms_prompt": keyterms,
}
```

What workflows can I build for my contact center application?

Use these features to transform raw call transcripts into actionable insights.

Summarization

summarization: true

What it does: Generates an abstractive recap of the call.
Output: summary string (bullets/paragraph format).
Great for: Post-call CRM updates, call recaps, supervisor review.

```python
config = aai.TranscriptionConfig(
    summarization=True,
    summary_type="bullets",       # or "bullets_verbose", "gist", "headline", "paragraph"
    summary_model="informative",  # or "conversational"
)
```

Sentiment Analysis

sentiment_analysis: true

What it does: Scores per-utterance sentiment (positive / neutral / negative).
Output: Array of { text, sentiment, confidence, start, end }.
Great for: Customer satisfaction tracking, escalation detection, QA scoring.

```python
# Analyze customer sentiment across a call
negative_count = 0
for result in transcript.sentiment_analysis_results:
    if result.sentiment == "NEGATIVE":
        negative_count += 1
        print(f"Negative at {result.start / 1000:.1f}s: {result.text}")

# Flag calls with high negative sentiment
if negative_count > 3:
    flag_for_supervisor_review(transcript.id)
```

Entity Detection

entity_detection: true

What it does: Extracts named entities (people, organizations, locations, products, etc.).
Output: Array of { entity_type, text, start, end }.
Great for: CRM enrichment, auto-tagging topics, competitor tracking.

```python
# Extract key entities from a call
organizations = [e.text for e in transcript.entities if e.entity_type == "organization"]
print(f"Companies mentioned: {', '.join(organizations)}")
```

Speaker Identification

Map generic speaker labels to agent and customer names:

```python
config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    speaker_labels=True,
    speech_understanding={
        "request": {
            "speaker_identification": {
                "speaker_type": "role",
                "speakers": [
                    {"role": "Agent", "name": "Sarah Johnson", "description": "Customer service representative"},
                    {"role": "Customer"},
                ],
            }
        }
    },
)
```

Translation

Translate call transcripts for international teams:

```python
config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    language_detection=True,
    speaker_labels=True,
    speech_understanding={
        "request": {
            "translation": {
                "target_languages": ["en"],        # Translate to English
                "match_original_utterance": True,  # Per-utterance translations
                "formal": True,
            }
        }
    },
)
```

Redact PII Text and Audio

```python
config = aai.TranscriptionConfig(
    redact_pii=True,
    redact_pii_policies=[
        PIIRedactionPolicy.person_name,
        PIIRedactionPolicy.credit_card_number,
        PIIRedactionPolicy.us_social_security_number,
        PIIRedactionPolicy.account_number,
    ],
    redact_pii_sub=PIISubstitutionPolicy.hash,
    redact_pii_audio=True,  # Generate de-identified audio
)

# After transcription
print(transcript.text)                # PII redacted in text
print(transcript.redacted_audio_url)  # PII bleeped in audio
```

How do I process the response from the API?

Processing Pre-recorded Responses

```python
def process_call_transcript(transcript):
    """
    Extract and process all relevant data from a pre-recorded call transcript
    """
    call_data = {
        "id": transcript.id,
        "duration": transcript.audio_duration,  # Already in seconds
        "confidence": transcript.confidence,
        "full_text": transcript.text,
    }

    # Process speaker utterances
    speakers = {}
    for utterance in transcript.utterances:
        speaker = utterance.speaker

        if speaker not in speakers:
            speakers[speaker] = {
                "utterances": [],
                "total_speaking_time": 0,
                "word_count": 0,
            }

        speakers[speaker]["utterances"].append({
            "text": utterance.text,
            "start": utterance.start,
            "end": utterance.end,
        })

        speakers[speaker]["total_speaking_time"] += (utterance.end - utterance.start) / 1000
        speakers[speaker]["word_count"] += len(utterance.text.split())

    call_data["speakers"] = speakers

    # Extract summary
    if transcript.summary:
        call_data["summary"] = transcript.summary

    # Analyze sentiment
    if transcript.sentiment_analysis_results:
        sentiments = [r.sentiment for r in transcript.sentiment_analysis_results]
        call_data["sentiment_breakdown"] = {
            "positive": sentiments.count("POSITIVE"),
            "neutral": sentiments.count("NEUTRAL"),
            "negative": sentiments.count("NEGATIVE"),
        }

    # Calculate statistics
    total_duration = transcript.audio_duration
    call_data["statistics"] = {
        "total_speakers": len(speakers),
        "total_words": sum(s["word_count"] for s in speakers.values()),
        "speaking_distribution": {
            speaker: {
                "percentage": (data["total_speaking_time"] / total_duration) * 100,
                "minutes": data["total_speaking_time"] / 60,
            }
            for speaker, data in speakers.items()
        },
    }

    return call_data

result = process_call_transcript(transcript)
print(f"Call had {result['statistics']['total_speakers']} speakers")
print(f"Sentiment: {result.get('sentiment_breakdown', {})}")
```

Additional Resources