Build a call center analytics pipeline in Python with AssemblyAI
Learn to build a Python call center analytics pipeline using AssemblyAI's Voice AI. Automatically transcribe audio, identify speakers, analyze sentiment, and create data visualizations from call recordings.



Call centers generate thousands of hours of audio daily, but most valuable information remains locked in unstructured recordings. Voice AI transforms these conversations into actionable insights—revealing customer sentiment patterns, identifying common issues, and improving agent performance. This shift is part of a larger trend, with a Gartner prediction stating that by 2025, 60% of organizations will analyze voice and text interactions to supplement traditional surveys.
In this tutorial, you'll build a complete call center analytics pipeline that automatically transcribes audio recordings, identifies speakers, analyzes sentiment, and creates data visualizations—the same core workflow used by conversation intelligence platforms built on AssemblyAI. We'll use AssemblyAI's Voice AI models to handle the audio processing, then structure and visualize the results using Python.
Understanding call center analytics
Call center analytics is the systematic analysis of customer interactions—calls, transcripts, and conversation data—to extract actionable insights on agent performance, customer sentiment, and operational efficiency.
Modern pipelines automate this process entirely, applying speech-to-text and AI models to every call rather than a sampled subset. This is a significant improvement over traditional methods, where internal analyses show managers typically review only about 1-3% of all calls.
The core components of a call center analytics system include:
- Speech-to-text transcription—Converting audio recordings into searchable, analyzable text
- Speaker diarization—Identifying who said what during the conversation
- Sentiment analysis—Detecting emotional tone throughout the call
- Data visualization—Presenting insights in formats that drive action
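Conceptually, these components chain together into a single pipeline: audio goes in, structured per-speaker insights come out. The sketch below illustrates that data flow with stubbed-out stages (all names and sample data here are illustrative, not AssemblyAI APIs; the real implementation follows later in the tutorial):

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    speaker: str    # e.g. "Agent" or "Customer"
    text: str
    sentiment: str  # "POSITIVE", "NEUTRAL", or "NEGATIVE"

def transcribe_call(audio_url: str) -> list:
    """Stub for the transcription + diarization + sentiment step."""
    return [
        Utterance("Agent", "How can I help you today?", "NEUTRAL"),
        Utterance("Customer", "The new car has been fantastic!", "POSITIVE"),
    ]

def sentiment_counts(utterances: list) -> dict:
    """Aggregate sentiment per speaker -- the input to the visualization step."""
    counts = {}
    for u in utterances:
        counts.setdefault(u.speaker, {}).setdefault(u.sentiment, 0)
        counts[u.speaker][u.sentiment] += 1
    return counts

summary = sentiment_counts(transcribe_call("call.wav"))
print(summary)  # {'Agent': {'NEUTRAL': 1}, 'Customer': {'POSITIVE': 1}}
```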
What you'll build
By the end of this guide, you'll have a working system that can:
- Transcribe call center recordings with speaker diarization
- Map generic speaker labels to actual names using AI
- Perform sentiment analysis on each conversation segment
- Generate interactive heatmap visualizations showing sentiment patterns
- Export structured data for further analysis
The complete workflow transforms raw audio into structured insights that call center managers can use to improve operations, a benefit confirmed by a 2025 survey which found that 69% of companies report improved customer service after implementing conversation intelligence.
Prerequisites and setup
System requirements
- Python 3.7 or higher
- Jupyter notebook environment (Google Colab recommended)
- Internet connection for API calls
Get your AssemblyAI API key
New users receive $50 in free credits, covering this tutorial and initial experimentation.
- Visit the AssemblyAI dashboard and create a free account
- Navigate to the API Keys section in the left sidebar
- Click Create new API key and give it a descriptive name
- Copy the generated API key—you'll need this in the next step
Store this key securely since you'll be using it throughout the tutorial.
Clone the GitHub repository
The tutorial uses sample audio files and a complete Jupyter notebook from the official repository:
```bash
git clone https://github.com/dataprofessor/assemblyai
cd assemblyai
```
The repository contains:
- 04-call-center-analytics.ipynb - The main tutorial notebook
- Sample audio files for testing
- Additional examples and utilities
If you prefer working directly in Google Colab, you can download the notebook file and upload it to your Colab environment.
Install required dependencies
The tutorial primarily uses AssemblyAI's Python SDK, with most other libraries already available in standard Python environments:
```bash
pip install assemblyai
```
Additional libraries we'll use (typically pre-installed in Jupyter environments):
- pandas - Data manipulation and analysis
- altair - Data visualization
- spacy - Natural language processing
- IPython - Audio playback widgets
Understanding analytics approaches for call centers
When building an analytics pipeline, you generally choose between two processing approaches: batch and real-time.
Choose batch processing when you need deep analysis—sentiment trends, topic detection, full call summaries. Choose real-time processing when agents need live assistance during the call. This tutorial uses batch processing. For real-time pipelines, see AssemblyAI's Streaming Speech-to-Text documentation.
Setting up the analytics pipeline
Configure API authentication
First, set up secure access to your AssemblyAI API key. In Google Colab, use the secrets manager to store your credentials safely:
```python
import assemblyai as aai
from google.colab import userdata

# Load API key from Colab secrets
aai_key = userdata.get('AI_KEY')
aai.settings.api_key = aai_key
```
For local development, set the API key directly (use environment variables in production):
```python
import assemblyai as aai

aai.settings.api_key = "your_api_key_here"
```
Load and preview the audio data
The sample audio file contains a realistic call center conversation between a customer service agent and a satisfied customer. This gives us both positive and neutral sentiment data to work with:
```python
from IPython.display import display, Audio

# Load audio file from the repository
audio_input = "https://github.com/dataprofessor/assemblyai/raw/refs/heads/master/call-center-recording.wav"

# Hear the audio
display(Audio(audio_input))
```
The conversation features Sarah (customer service agent) speaking with Michael Johnson (a satisfied electric vehicle owner), who provides positive feedback about his purchase experience.
Configure transcription parameters
AssemblyAI's transcription service offers several advanced features beyond basic speech-to-text. For call center analytics, we need to enable speaker diarization, sentiment analysis, and speaker identification so that role labels are automatically assigned to speakers. We must also specify which speech-to-text model to use.
```python
config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    speaker_labels=True,
    sentiment_analysis=True,
    speech_understanding={
        "request": {
            "speaker_identification": {
                "speaker_type": "role",
                "known_values": ["Agent", "Customer"]
            }
        }
    }
)
```
These configuration options enable the AI models to automatically detect speaker changes, analyze emotional tone, and assign role labels to the correct speakers in the conversation. The speech_models parameter is required for all transcription requests.
Transcribing and processing the audio
Perform the transcription
With the configuration set, we can now transcribe the audio file. AssemblyAI's AI models handle complex audio processing automatically, with transcription typically completing in under 45 seconds:
```python
transcriber = aai.Transcriber()
transcript = transcriber.transcribe(audio_input, config=config)

# Check transcription status
print(f"Call duration: {transcript.audio_duration} seconds")
print(f"Total words: {len(transcript.words)}")
```
The transcript object contains rich metadata including word-level timestamps, confidence scores, and identified speaker roles.
Process the transcript with speaker roles
Because we used Speaker Identification, the transcript's utterances now use role labels (Agent, Customer) instead of generic labels like "A" and "B". We can now iterate through the utterances and print the identified speaker and their dialogue.
```python
for utterance in transcript.utterances:
    print(f"{utterance.speaker}: {utterance.text}")
```
This creates a clean, readable conversation format with role-based speaker identification, ready for analysis.
Performing sentiment analysis
Extract sentiment data
AssemblyAI's sentiment analysis runs automatically when enabled in the configuration. Results provide sentence-level sentiment classification:
```python
# Access sentiment analysis results
sentiment_results = transcript.sentiment_analysis

# Preview sentiment data structure
for i, result in enumerate(sentiment_results[:3]):
    print(f"Segment {i+1}:")
    print(f"  Speaker: {result.speaker}")
    print(f"  Text: {result.text}")
    print(f"  Sentiment: {result.sentiment}")
    print(f"  Confidence: {result.confidence:.2f}")
    print()
```
Each segment includes the speaker, text content, sentiment classification (positive/neutral/negative), and a confidence score for the analysis.
Structure data for analysis
Convert the sentiment results into a structured format that's easier to analyze and visualize:
```python
import pandas as pd

# Create structured dataframe
sentiment_data = []
for result in sentiment_results:
    sentiment_data.append({
        'speaker': result.speaker,
        'text': result.text,
        'sentiment': result.sentiment,
        'confidence': result.confidence
    })

df = pd.DataFrame(sentiment_data)
print(f"Created dataframe with {len(df)} conversation segments")
print(df['sentiment'].value_counts())
```
This DataFrame structure makes it easy to perform aggregate analysis and create visualizations.
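With the results in a DataFrame, standard pandas operations handle both aggregate analysis and the export step. Here is a quick, self-contained illustration using stand-in rows with the same columns (named `demo_df` so it doesn't shadow the `df` built above; the CSV file name is arbitrary):

```python
import pandas as pd

# Stand-in rows mirroring the tutorial's sentiment DataFrame
demo_df = pd.DataFrame([
    {'speaker': 'Agent',    'sentiment': 'NEUTRAL',  'confidence': 0.91},
    {'speaker': 'Customer', 'sentiment': 'POSITIVE', 'confidence': 0.97},
    {'speaker': 'Customer', 'sentiment': 'POSITIVE', 'confidence': 0.88},
])

# Count sentiment per speaker -- a typical aggregate view
breakdown = (demo_df.groupby(['speaker', 'sentiment'])
                    .size()
                    .reset_index(name='count'))
print(breakdown)

# Export structured data for downstream tools (BI dashboards, spreadsheets)
demo_df.to_csv('call_sentiment.csv', index=False)
```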
Creating data visualizations
Generate sentiment overview heatmap
Access sentiment results directly from the transcript object using the sentiment_analysis attribute:
```python
transcript.sentiment_analysis
```

```python
# Create a DataFrame of Speaker and Sentiment
data = []
index_value = 0  # Initialize an index counter

for sentiment in transcript.sentiment_analysis:
    # speaker returns generic labels (A, B); map to roles via
    # speech_understanding.response.speaker_identification.mapping
    speaker = sentiment.speaker
    sentiment_value = sentiment.sentiment.value
    text = sentiment.text
    data.append({'speaker': speaker, 'sentiment': sentiment_value,
                 'text': text, 'index': index_value})
    index_value += 1  # Increment the index

df = pd.DataFrame(data)
```
Here, we'll count the occurrences of each speaker-sentiment combination:
```python
import altair as alt

# Count the occurrences of each speaker-sentiment combination
heatmap_data = df.groupby(['speaker', 'sentiment']).size().reset_index(name='count')

font_size = 14

# Create the base chart
base = alt.Chart(heatmap_data).encode(
    x=alt.X('speaker', axis=alt.Axis(title='Speaker',
                                     titleFontSize=font_size,
                                     labelFontSize=font_size)),
    y=alt.Y('sentiment', axis=alt.Axis(title='Sentiment',
                                       titleFontSize=font_size,
                                       labelFontSize=font_size))
)

# Create the heatmap rectangles
heatmap = base.mark_rect().encode(
    color=alt.Color('count', title='Count',
                    scale=alt.Scale(range='heatmap')),
    tooltip=['speaker', 'sentiment', 'count']
)

# Add the text labels
text = base.mark_text(fontSize=font_size, fontWeight='bold').encode(
    text=alt.Text('count'),
    color=alt.condition(
        alt.datum.count > heatmap_data['count'].max() / 2,
        alt.value('white'),
        alt.value('black')
    )
)

# Combine the heatmap and text
chart = (heatmap + text).properties(
    width=300,
    height=300
).interactive()
```
The chart renders a heatmap of sentiment counts per speaker:
```python
chart
```
This heatmap provides a quick overview of conversation dynamics, showing how much positive, neutral, or negative sentiment each speaker expressed.
Heatmap of sentiment analysis
For deeper analysis, we can zoom into the individual sentences and see the sentiment for sequences of words as spoken in the transcript.
```python
font_size = 12

# Define the color scale for sentiment
sentiment_colors = {
    'POSITIVE': '#4CAF50',  # Green
    'NEUTRAL': '#9E9E9E',   # Gray
    'NEGATIVE': '#F44336'   # Red
}

# Create the base chart
base = alt.Chart(df).encode(
    x=alt.X('speaker:N', axis=alt.Axis(title='Speaker',
                                       titleFontSize=font_size,
                                       labelFontSize=font_size)),
    y=alt.Y('index:O', axis=alt.Axis(title=None, labels=False))
)

# Create the heatmap rectangles with a black stroke (border)
heatmap = base.mark_rect(stroke='black').encode(
    color=alt.Color(
        'sentiment:N',
        scale=alt.Scale(domain=list(sentiment_colors.keys()),
                        range=list(sentiment_colors.values())),
        legend=alt.Legend(orient='bottom')
    ),
    tooltip=['speaker:N', 'sentiment:N', 'text:N']
).properties(
    width=200,
    height=df.shape[0] * 20
)

# Add the utterance text to the right of the chart and hide its y-axis
text_right = alt.Chart(df).mark_text(align='left', baseline='middle',
                                     dx=5).encode(
    y=alt.Y('index:O', axis=None),
    text=alt.Text('text:N'),
    color=alt.value('black')
).properties(
    width=10,
    height=df.shape[0] * 20
)

# Combine the heatmap and the text
chart = alt.concat(
    heatmap,
    text_right
).configure_axis(
    labelFontSize=font_size,
    titleFontSize=font_size
).configure_view(
    strokeOpacity=0
).interactive()

chart
```
Integration and deployment considerations
Moving from a local script to a production environment requires planning for scale and reliability. Here are the key areas to address:
Asynchronous processing with webhooks
When processing thousands of calls daily, implement webhooks rather than polling the API to receive transcription results efficiently. This asynchronous approach prevents timeouts and reduces server load:
```python
config = aai.TranscriptionConfig(
    speaker_labels=True,
    sentiment_analysis=True,
    webhook_url="https://your-server.com/webhook"
)
```
Error handling and retry logic
Ensure your pipeline accounts for corrupted audio files or network interruptions by implementing retry logic with exponential backoff:
```python
import time

def transcribe_with_retry(audio_url, config, max_retries=3):
    for attempt in range(max_retries):
        try:
            transcriber = aai.Transcriber()
            transcript = transcriber.transcribe(audio_url, config=config)
            if transcript.status == aai.TranscriptStatus.error:
                raise Exception(f"Transcription failed: {transcript.error}")
            return transcript
        except Exception as e:
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, ...
                time.sleep(wait_time)
            else:
                raise e
```
Data privacy and compliance
Handling sensitive customer data is a top priority, as research shows that over 30% of product leaders cite data privacy as a significant challenge. To that end, AssemblyAI enables covered entities and their business associates subject to HIPAA to use the AssemblyAI services to process protected health information (PHI).
AssemblyAI is considered a business associate under HIPAA, and we offer a Business Associate Addendum (BAA), which is required under HIPAA to ensure that AssemblyAI appropriately safeguards PHI.
Key takeaways
You now have a working pipeline that turns raw call recordings into structured, queryable data. From here, you can extend this foundation to track sentiment trends over time, benchmark agent performance across cohorts, or trigger automated alerts when negative sentiment spikes. These capabilities have a direct impact on customer experience, with a recent market survey finding that over 70% of companies reported measurable increases in end-user satisfaction after implementing conversation intelligence.
The complete code and sample files are in the GitHub repository. New accounts get $50 in free credits to get started.
Frequently asked questions about call center analytics implementation
How do I handle audio files larger than standard limits?
AssemblyAI accepts files up to 5GB and audio up to 10 hours. For larger files, compress to MP3 or reduce the sample rate before submitting—this has minimal impact on transcription accuracy.
What happens if speaker identification fails or is inaccurate?
The most effective fix is using stereo audio with each speaker on a separate channel. Enable this by setting multichannel=True in your TranscriptionConfig. When using multichannel, you should also set speaker_labels=False, as the channels provide perfect speaker separation. Crosstalk and heavy audio compression are the most common causes of diarization errors in single-channel audio.
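A minimal configuration for the stereo case might look like this (a sketch based on the SDK's TranscriptionConfig; check the current AssemblyAI documentation for the exact parameter names):

```python
import assemblyai as aai

# One speaker per channel, so channel separation replaces diarization
config = aai.TranscriptionConfig(
    multichannel=True,
    speaker_labels=False
)
```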
How can I integrate this pipeline with existing call center systems?
Most call center platforms (Five9, Genesys, Twilio) expose APIs to export recordings—connect these to a webhook listener that triggers transcription automatically when a new call is saved. Use AssemblyAI webhooks rather than polling to handle transcription results asynchronously at scale.
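As an illustration of the listener's core logic, the sketch below parses one webhook delivery and decides whether there is a finished transcript to fetch. It assumes the webhook body is JSON carrying transcript_id and status fields; verify the exact payload shape against AssemblyAI's webhook documentation:

```python
import json
from typing import Optional

def handle_webhook(body: bytes) -> Optional[str]:
    """Return the transcript ID to fetch, or None if there is nothing to do."""
    payload = json.loads(body)
    if payload.get("status") == "completed":
        # Fetch the full transcript here, e.g. with the Python SDK:
        # transcript = aai.Transcript.get_by_id(payload["transcript_id"])
        return payload["transcript_id"]
    return None  # errored or still processing

# Simulated deliveries
print(handle_webhook(b'{"transcript_id": "abc123", "status": "completed"}'))  # abc123
print(handle_webhook(b'{"transcript_id": "abc123", "status": "error"}'))      # None
```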
What are the four types of analytics used in call centers?
The four types are: descriptive (what happened), diagnostic (why it happened), predictive (what will happen), and prescriptive (what to do about it). This tutorial covers descriptive and diagnostic analytics through sentiment analysis and speaker identification.


