Insights & Use Cases
April 13, 2026

Build a call center analytics pipeline in Python with AssemblyAI

Learn to build a Python call center analytics pipeline using AssemblyAI's Voice AI. Automatically transcribe audio, identify speakers, analyze sentiment, and create data visualizations from call recordings.

Kelsey Foster
Growth

Call centers generate thousands of hours of audio daily, but most valuable information remains locked in unstructured recordings. Voice AI transforms these conversations into actionable insights—revealing customer sentiment patterns, identifying common issues, and improving agent performance. This shift is part of a larger trend, with a Gartner prediction stating that by 2025, 60% of organizations will analyze voice and text interactions to supplement traditional surveys.

In this tutorial, you'll build a complete call center analytics pipeline that automatically transcribes audio recordings, identifies speakers, analyzes sentiment, and creates data visualizations—the same core workflow used by conversation intelligence platforms built on AssemblyAI. We'll use AssemblyAI's Voice AI models to handle the audio processing, then structure and visualize the results using Python.

Understanding call center analytics

Call center analytics is the systematic analysis of customer interactions—calls, transcripts, and conversation data—to extract actionable insights on agent performance, customer sentiment, and operational efficiency.

Modern pipelines automate this process entirely, applying speech-to-text and AI models to every call rather than a sampled subset. This is a significant improvement over traditional methods, where internal analysis shows managers could typically only review about 1-3% of all calls.

The core components of a call center analytics system include:

  • Speech-to-text transcription—Converting audio recordings into searchable, analyzable text
  • Speaker diarization—Identifying who said what during the conversation
  • Sentiment analysis—Detecting emotional tone throughout the call
  • Data visualization—Presenting insights in formats that drive action

What you'll build

By the end of this guide, you'll have a working system that can:

  • Transcribe call center recordings with speaker diarization
  • Map generic speaker labels to actual names using AI
  • Perform sentiment analysis on each conversation segment
  • Generate interactive heatmap visualizations showing sentiment patterns
  • Export structured data for further analysis

The complete workflow transforms raw audio into structured insights that call center managers can use to improve operations, a benefit borne out by a 2025 survey in which 69% of companies reported improved customer service after implementing conversation intelligence.

Prerequisites and setup

System requirements

  • Python 3.7 or higher
  • Jupyter notebook environment (Google Colab recommended)
  • Internet connection for API calls

Get your AssemblyAI API key

New users receive $50 in free credits, covering this tutorial and initial experimentation.

  1. Visit the AssemblyAI dashboard and create a free account
  2. Navigate to the API Keys section in the left sidebar
  3. Click Create new API key and give it a descriptive name
  4. Copy the generated API key—you'll need this in the next step

Store this key securely since you'll be using it throughout the tutorial.
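If you're working outside Colab, one way to keep the key out of your code is an environment variable. A minimal sketch (the variable name ASSEMBLYAI_API_KEY is our choice, not a requirement):

```python
import os

# Minimal sketch: read the API key from an environment variable so it never
# lands in source control (ASSEMBLYAI_API_KEY is an assumed name, not required).
def load_api_key(var_name="ASSEMBLYAI_API_KEY"):
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key

# aai.settings.api_key = load_api_key()
```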

Clone the GitHub repository

The tutorial uses sample audio files and a complete Jupyter notebook from the official repository:

git clone https://github.com/dataprofessor/assemblyai
cd assemblyai

The repository contains:

  • 04-call-center-analytics.ipynb - The main tutorial notebook
  • Sample audio files for testing
  • Additional examples and utilities

If you prefer working directly in Google Colab, you can download the notebook file and upload it to your Colab environment.

Explore call analytics without code

Upload a call recording to preview high-quality transcription and speaker diarization in your browser. Validate results before running the full notebook.

Open playground

Install required dependencies

The tutorial primarily uses AssemblyAI's Python SDK, with most other libraries already available in standard Python environments:

pip install assemblyai

Additional libraries we'll use (typically pre-installed in Jupyter environments):

  • pandas - Data manipulation and analysis
  • altair - Data visualization
  • spacy - Natural language processing
  • IPython - Audio playback widgets

Understanding analytics approaches for call centers

When building an analytics pipeline, you generally choose between two processing approaches: batch and real-time.

  • Batch processing: best for deep analysis of recorded calls. Key features: sentiment analysis, topic detection, comprehensive summaries
  • Real-time processing: best for live agent assistance. Key features: streaming transcription, live alerts, knowledge base surfacing
Choose batch processing when you need deep analysis—sentiment trends, topic detection, full call summaries. Choose real-time processing when agents need live assistance during the call. This tutorial uses batch processing. For real-time pipelines, see AssemblyAI's Streaming Speech-to-Text documentation.

Setting up the analytics pipeline

Configure API authentication

First, set up secure access to your AssemblyAI API key. In Google Colab, use the secrets manager to store your credentials safely:

import assemblyai as aai
from google.colab import userdata

# Load API key from Colab secrets
aai_key = userdata.get('AI_KEY')
aai.settings.api_key = aai_key

For local development, set the API key directly (use environment variables in production):

import assemblyai as aai

aai.settings.api_key = "your_api_key_here"

Load and preview the audio data

The sample audio file contains a realistic call center conversation between a customer service agent and a satisfied customer. This gives us both positive and neutral sentiment data to work with:

from IPython.display import display, Audio

# Load audio file from the repository
audio_input = "https://github.com/dataprofessor/assemblyai/raw/refs/heads/master/call-center-recording.wav"

# Hear the audio
display(Audio(audio_input))

The conversation features Sarah (customer service agent) speaking with Michael Johnson (satisfied electric vehicle owner) providing positive feedback about his purchase experience.

Configure transcription parameters

AssemblyAI's transcription service offers several advanced features beyond basic speech-to-text. For call center analytics, we need to enable speaker diarization, sentiment analysis, and speaker identification to automatically assign role labels to speakers. We also must specify which speech-to-text model to use.

config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],
    speaker_labels=True,
    sentiment_analysis=True,
    speech_understanding={
        "request": {
            "speaker_identification": {
                "speaker_type": "role",
                "known_values": ["Agent", "Customer"]
            }
        }
    }
)

These configuration options enable the AI models to automatically detect speaker changes, analyze emotional tone, and assign role labels to the correct speakers in the conversation. The speech_models parameter is required for all transcription requests.

Transcribing and processing the audio

Perform the transcription

With the configuration set, we can now transcribe the audio file. AssemblyAI's AI models handle complex audio processing automatically, with transcription typically completing in under 45 seconds:

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(audio_input, config=config)

# Check transcription status
print(f"Call duration: {transcript.audio_duration} seconds")
print(f"Total words: {len(transcript.words)}")

The transcript object contains rich metadata including word-level timestamps, confidence scores, and identified speaker roles.
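For example, the word-level confidence scores make a quick quality check possible. A small sketch that flags low-confidence words for review, operating on (text, confidence) pairs like those exposed by transcript.words:

```python
# Sketch: flag words whose transcription confidence falls below a threshold,
# e.g. for manual review. Operates on (text, confidence) pairs like those
# exposed by transcript.words; the sample values below are illustrative.
def low_confidence_words(words, threshold=0.6):
    return [text for text, confidence in words if confidence < threshold]

sample = [("refund", 0.98), ("Johnson", 0.44), ("vehicle", 0.91)]
print(low_confidence_words(sample))  # → ['Johnson']
```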

Process the transcript with speaker roles

Because we used Speaker Identification, the transcript's utterances now use role labels (Agent, Customer) instead of generic labels like "A" and "B". We can now iterate through the utterances and print the identified speaker and their dialogue.

for utterance in transcript.utterances:
    print(f"{utterance.speaker}: {utterance.text}")

This creates a clean, readable conversation format with role-based speaker identification, ready for analysis.
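Utterances also carry start and end timestamps in milliseconds, which enables a common call-center metric: talk-time share per speaker. A sketch using plain (speaker, start_ms, end_ms) tuples that mirror those fields:

```python
from collections import defaultdict

# Sketch: talk-time share per speaker, computed from utterance timestamps.
# Utterances are modeled as (speaker, start_ms, end_ms) tuples mirroring the
# speaker/start/end fields on AssemblyAI utterances; the data is illustrative.
def talk_time_share(utterances):
    totals = defaultdict(int)
    for speaker, start_ms, end_ms in utterances:
        totals[speaker] += end_ms - start_ms
    grand_total = sum(totals.values())
    return {speaker: t / grand_total for speaker, t in totals.items()}

calls = [("Agent", 0, 4000), ("Customer", 4000, 10000), ("Agent", 10000, 12000)]
print(talk_time_share(calls))  # → {'Agent': 0.5, 'Customer': 0.5}
```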

Build call analytics with AssemblyAI

Create a free account to get your API key and $50 in credits. Transcribe calls with diarization, speaker names, and sentiment at scale.

Get API key

Performing sentiment analysis

Extract sentiment data

AssemblyAI's sentiment analysis runs automatically when enabled in the configuration. Results provide sentence-level sentiment classification:

# Access sentiment analysis results
sentiment_results = transcript.sentiment_analysis

# Preview sentiment data structure
for i, result in enumerate(sentiment_results[:3]):
    print(f"Segment {i+1}:")
    print(f"  Speaker: {result.speaker}")
    print(f"  Text: {result.text}")
    print(f"  Sentiment: {result.sentiment}")
    print(f"  Confidence: {result.confidence:.2f}")
    print()

Each segment includes the speaker, text content, sentiment classification (positive/neutral/negative), and a confidence score for the analysis.

Structure data for analysis

Convert the sentiment results into a structured format that's easier to analyze and visualize:

import pandas as pd

# Create structured dataframe
sentiment_data = []
for result in sentiment_results:
    sentiment_data.append({
        'speaker': result.speaker,
        'text': result.text,
        'sentiment': result.sentiment,
        'confidence': result.confidence
    })

df = pd.DataFrame(sentiment_data)
print(f"Created dataframe with {len(df)} conversation segments")
print(df['sentiment'].value_counts())

This DataFrame structure makes it easy to perform aggregate analysis and create visualizations.
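For instance, per-speaker aggregates become one-liners. A sketch using sample rows shaped like the DataFrame above (the values are illustrative, not real transcript output):

```python
import pandas as pd

# Sketch: aggregate sentiment per speaker from a DataFrame shaped like the
# one built above. The rows below stand in for real transcript output.
df = pd.DataFrame([
    {'speaker': 'Agent', 'sentiment': 'POSITIVE', 'confidence': 0.95},
    {'speaker': 'Agent', 'sentiment': 'NEUTRAL', 'confidence': 0.80},
    {'speaker': 'Customer', 'sentiment': 'POSITIVE', 'confidence': 0.90},
])

# Count each speaker's sentiment labels
counts = df.groupby(['speaker', 'sentiment']).size()
print(counts)

# Mean model confidence per speaker
print(df.groupby('speaker')['confidence'].mean())
```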

Creating data visualizations

Generate sentiment overview heatmap

Access sentiment results directly from the transcript object using the sentiment_analysis attribute:

# Create a DataFrame of speaker, sentiment, and text per segment
data = []
for index_value, sentiment in enumerate(transcript.sentiment_analysis):
    # sentiment.speaker returns generic labels (A, B); map them to roles
    # via speech_understanding.response.speaker_identification.mapping
    data.append({
        'speaker': sentiment.speaker,
        'sentiment': sentiment.sentiment.value,
        'text': sentiment.text,
        'index': index_value,
    })

df = pd.DataFrame(data)

Here, we'll count the occurrences of each speaker-sentiment combination:

import altair as alt

# Count the occurrences of each speaker-sentiment combination
heatmap_data = df.groupby(['speaker', 'sentiment']).size().reset_index(name='count')

font_size = 14

# Create the base chart
base = alt.Chart(heatmap_data).encode(
    x=alt.X('speaker', axis=alt.Axis(title='Speaker', titleFontSize=font_size, labelFontSize=font_size)),
    y=alt.Y('sentiment', axis=alt.Axis(title='Sentiment', titleFontSize=font_size, labelFontSize=font_size))
)

# Create the heatmap rectangles
heatmap = base.mark_rect().encode(
    color=alt.Color('count', title='Count', scale=alt.Scale(range='heatmap')),
    tooltip=['speaker', 'sentiment', 'count']
)

# Add the text labels
text = base.mark_text(fontSize=font_size, fontWeight='bold').encode(
    text=alt.Text('count'),
    color=alt.condition(
        alt.datum.count > heatmap_data['count'].max() / 2,
        alt.value('white'),
        alt.value('black')
    )
)

# Combine the heatmap and text
chart = (heatmap + text).properties(
    width=300,
    height=300
).interactive()

The chart renders a heatmap of sentiment counts per speaker:

chart

This heatmap provides a quick overview of conversation dynamics, showing how much positive, neutral, or negative sentiment each speaker expressed.

Heatmap of sentiment analysis

For deeper analysis, we can zoom into the individual sentences and see the sentiment for sequences of words as spoken in the transcript.

font_size = 12

# Define the color scale for sentiment
sentiment_colors = {
    'POSITIVE': '#4CAF50',  # Green
    'NEUTRAL': '#9E9E9E',   # Gray
    'NEGATIVE': '#F44336'   # Red
}

# Create the base chart
base = alt.Chart(df).encode(
    x=alt.X('speaker:N', axis=alt.Axis(title='Speaker', titleFontSize=font_size, labelFontSize=font_size)),
    y=alt.Y('index:O', axis=alt.Axis(title=None, labels=False))
)

# Create the heatmap rectangles with black stroke (border)
heatmap = base.mark_rect(stroke='black').encode(
    color=alt.Color(
        'sentiment:N',
        scale=alt.Scale(domain=list(sentiment_colors.keys()), range=list(sentiment_colors.values())),
        legend=alt.Legend(orient='bottom')
    ),
    tooltip=['speaker:N', 'sentiment:N', 'text:N']
).properties(
    width=200,
    height=df.shape[0] * 20
)

# Add the transcript text as a column to the right of the chart, with its y-axis hidden
text_right = alt.Chart(df).mark_text(align='left', baseline='middle', dx=5).encode(
    y=alt.Y('index:O', axis=None),
    text=alt.Text('text:N'),
    color=alt.value('black')
).properties(
    width=10,
    height=df.shape[0] * 20
)

# Combine the heatmap and the text
chart = alt.concat(
    heatmap,
    text_right
).configure_axis(
    labelFontSize=font_size,
    titleFontSize=font_size
).configure_view(
    strokeOpacity=0
).interactive()

chart

Integration and deployment considerations

Moving from a local script to a production environment requires planning for scale and reliability. Here are the key areas to address:

Asynchronous processing with webhooks

When processing thousands of calls daily, implement webhooks rather than polling the API to receive transcription results efficiently. This asynchronous approach prevents timeouts and reduces server load:

config = aai.TranscriptionConfig(
    speaker_labels=True,
    sentiment_analysis=True,
    webhook_url="https://your-server.com/webhook"
)
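When the webhook fires, your endpoint receives a small JSON payload and then fetches the full transcript by ID. A sketch of the handler's core logic, assuming a payload that carries transcript_id and status fields:

```python
# Sketch of a webhook handler's core logic. We assume the delivered JSON
# body includes "transcript_id" and "status" fields; the handler returns
# the ID of a completed transcript so the caller can fetch it, or None.
def handle_webhook_payload(payload: dict):
    if payload.get("status") == "completed":
        return payload["transcript_id"]  # fetch the full transcript by this ID
    return None                          # errored or unknown status: skip

print(handle_webhook_payload({"transcript_id": "abc123", "status": "completed"}))  # → abc123
```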

Error handling and retry logic

Ensure your pipeline accounts for corrupted audio files or network interruptions by implementing retry logic with exponential backoff:

import time

def transcribe_with_retry(audio_url, config, max_retries=3):
    for attempt in range(max_retries):
        try:
            transcriber = aai.Transcriber()
            transcript = transcriber.transcribe(audio_url, config=config)
            if transcript.status == aai.TranscriptStatus.error:
                raise Exception(f"Transcription failed: {transcript.error}")
            return transcript
        except Exception as e:
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt  # exponential backoff: 1s, 2s, 4s
                time.sleep(wait_time)
            else:
                raise e

Data privacy and compliance

Handling sensitive customer data is a top priority, as research shows that over 30% of product leaders cite data privacy as a significant challenge. To that end, AssemblyAI enables covered entities and their business associates subject to HIPAA to use the AssemblyAI services to process protected health information (PHI).

AssemblyAI is considered a business associate under HIPAA, and we offer a Business Associate Addendum (BAA) that is required under HIPAA to ensure that AAI appropriately safeguards PHI.

Scale HIPAA-ready call analytics

Talk with our team about webhooks, retries, and secure deployment for processing thousands of daily calls. We support HIPAA with a BAA for covered entities.

Talk to AI expert

Key takeaways

You now have a working pipeline that turns raw call recordings into structured, queryable data. From here, you can extend this foundation to track sentiment trends over time, benchmark agent performance across cohorts, or trigger automated alerts when negative sentiment spikes. These capabilities have a direct impact on customer experience, with a recent market survey finding that over 70% of companies reported measurable increases in end-user satisfaction after implementing conversation intelligence.

The complete code and sample files are in the GitHub repository. New accounts get $50 in free credits to get started.

Frequently asked questions about call center analytics implementation

How do I handle audio files larger than standard limits?

AssemblyAI accepts files up to 5GB and audio up to 10 hours. For larger files, compress to MP3 or reduce the sample rate before submitting—this has minimal impact on transcription accuracy.

What happens if speaker identification fails or is inaccurate?

The most effective fix is using stereo audio with each speaker on a separate channel. Enable this by setting multichannel=True in your TranscriptionConfig. When using multichannel, you should also set speaker_labels=False, as the channels provide perfect speaker separation. Crosstalk and heavy audio compression are the most common causes of diarization errors in single-channel audio.
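As a sketch, a multichannel configuration might look like this (parameter names follow the settings discussed above; verify against the current SDK before relying on them):

```python
import assemblyai as aai

# Sketch, assuming a stereo recording with one speaker per channel.
config = aai.TranscriptionConfig(
    multichannel=True,      # transcribe each audio channel separately
    speaker_labels=False,   # channels already give perfect speaker separation
    sentiment_analysis=True,
)
```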

How can I integrate this pipeline with existing call center systems?

Most call center platforms (Five9, Genesys, Twilio) expose APIs to export recordings—connect these to a webhook listener that triggers transcription automatically when a new call is saved. Use AssemblyAI webhooks rather than polling to handle transcription results asynchronously at scale.

What are the four types of analytics used in call centers?

The four types are: descriptive (what happened), diagnostic (why it happened), predictive (what will happen), and prescriptive (what to do about it). This tutorial covers descriptive and diagnostic analytics through sentiment analysis and speaker identification.

Get a free API key

Access AssemblyAI's API with $50 in free credits

Get Free API Key
Call Centers
Python
Tutorial