How to perform speaker diarization in JavaScript
Learn speaker diarization in JavaScript with AssemblyAI's SDK. Complete guide with code examples for identifying who spoke when in audio recordings.



Speaker diarization answers "who spoke when?" in multi-speaker recordings by partitioning audio into segments based on speaker identity. This technology has become essential for applications like automated meeting transcriptions, podcast analysis, and customer service call processing.
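To make that concrete, here's the general shape of the utterances array you get back from a diarized transcript. The field names (speaker, start, end, text) match AssemblyAI's output, but the labels, timestamps, and text below are illustrative values, not real API output:

// Each utterance carries a speaker label, millisecond timestamps, and text
// (values here are made up for illustration)
[
  { speaker: 'A', start: 250, end: 3120, text: 'Welcome back to the show.' },
  { speaker: 'B', start: 3350, end: 6480, text: 'Thanks, great to be here.' }
]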
This tutorial walks you through implementing speaker diarization in JavaScript with AssemblyAI's SDK—from setup to processing audio files and handling results.
Step-by-step guide to perform speaker diarization in JavaScript
Prerequisites and environment setup
Before implementing speaker diarization, you'll need a recent Node.js installation (Node.js 18 or later is a safe choice for the AssemblyAI SDK) and an AssemblyAI API key.
Install the AssemblyAI JavaScript SDK
Start by creating a new Node.js project and installing the required dependencies:
mkdir speaker-diarization-demo
cd speaker-diarization-demo
npm init -y
npm install assemblyai
Set up your API credentials
Create a .env file in your project root to store your AssemblyAI API key:
# .env
ASSEMBLYAI_API_KEY=your_api_key_here
Install the dotenv package to load environment variables:
npm install dotenv
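As a quick sanity check, you can confirm the key actually loads before wiring up the SDK. This is a minimal sketch; the file name check-env.js is just a suggestion:

// check-env.js — fail fast if the API key isn't configured
require('dotenv').config();

if (!process.env.ASSEMBLYAI_API_KEY) {
  throw new Error('ASSEMBLYAI_API_KEY is not set — check your .env file');
}
console.log('API key loaded.');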
Basic speaker diarization implementation
Create a new file called diarization.js with the following implementation:
require('dotenv').config();
const { AssemblyAI } = require('assemblyai');

// Initialize the AssemblyAI client
const client = new AssemblyAI({
  apiKey: process.env.ASSEMBLYAI_API_KEY
});

async function performSpeakerDiarization(audioUrl) {
  try {
    // Configure transcription with speaker diarization
    const params = {
      audio: audioUrl,
      speaker_labels: true, // Enable speaker diarization
      speakers_expected: 2, // Optional: hint about the number of speakers
    };

    // Submit transcription request (the SDK polls until the job finishes)
    console.log('Submitting audio for transcription...');
    const transcript = await client.transcripts.transcribe(params);

    // Check if transcription was successful
    if (transcript.status === 'error') {
      console.error('Transcription failed:', transcript.error);
      return;
    }

    // Display results
    console.log('\n--- Speaker Diarization Results ---');

    // Count distinct speaker labels (AssemblyAI labels speakers "A", "B", "C", ...)
    const detectedSpeakers = transcript.utterances
      ? new Set(transcript.utterances.map(u => u.speaker)).size
      : 'Unknown';
    console.log(`Total speakers detected: ${detectedSpeakers}`);

    // Process utterances with speaker labels
    if (transcript.utterances) {
      transcript.utterances.forEach(utterance => {
        const startTime = formatTime(utterance.start);
        const endTime = formatTime(utterance.end);
        console.log(`\nSpeaker ${utterance.speaker}: [${startTime} - ${endTime}]`);
        console.log(`"${utterance.text}"`);
      });
    }

    return transcript;
  } catch (error) {
    console.error('Error during speaker diarization:', error);
  }
}

// Helper function to format millisecond timestamps as m:ss
function formatTime(milliseconds) {
  const seconds = Math.floor(milliseconds / 1000);
  const minutes = Math.floor(seconds / 60);
  const remainingSeconds = seconds % 60;
  return `${minutes}:${remainingSeconds.toString().padStart(2, '0')}`;
}

// Example usage
const audioUrl = 'https://assembly.ai/sports_injuries.mp3';
performSpeakerDiarization(audioUrl);
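Running node diarization.js should print something along these lines. The exact utterances and timings depend on the model's output for your audio, so treat this as an approximation rather than verbatim output:

Submitting audio for transcription...

--- Speaker Diarization Results ---
Total speakers detected: 2

Speaker A: [0:00 - 0:12]
"...first utterance text..."

Speaker B: [0:12 - 0:27]
"...second utterance text..."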
Advanced configuration options
For better results, you can customize the configuration for your specific use case. Here's an enhanced implementation that enables additional audio intelligence features alongside speaker labels:
async function advancedSpeakerDiarization(audioUrl, options = {}) {
  const params = {
    audio: audioUrl,
    speaker_labels: true,
    speakers_expected: options.expectedSpeakers,
    language_code: options.languageCode, // e.g. 'en_us'
    punctuate: true,
    format_text: true,
    // Additional useful features
    auto_highlights: true,    // Extract key phrases
    sentiment_analysis: true, // Analyze sentiment per utterance
    entity_detection: true    // Detect named entities
  };

  try {
    const transcript = await client.transcripts.transcribe(params);

    if (transcript.status === 'error') {
      throw new Error(transcript.error);
    }

    // Process and analyze results
    const analysis = analyzeSpeakerData(transcript);
    return { transcript, analysis };
  } catch (error) {
    console.error('Advanced diarization failed:', error);
    throw error;
  }
}
function analyzeSpeakerData(transcript) {
  const speakerStats = {};
  if (!transcript.utterances) return speakerStats;

  transcript.utterances.forEach(utterance => {
    const speaker = utterance.speaker;
    const duration = utterance.end - utterance.start; // milliseconds
    const wordCount = utterance.text.split(/\s+/).length;

    if (!speakerStats[speaker]) {
      speakerStats[speaker] = {
        totalTime: 0,
        utteranceCount: 0,
        totalWords: 0,
        averageWordsPerUtterance: 0
      };
    }

    speakerStats[speaker].totalTime += duration;
    speakerStats[speaker].utteranceCount += 1;
    speakerStats[speaker].totalWords += wordCount;
    speakerStats[speaker].averageWordsPerUtterance =
      speakerStats[speaker].totalWords / speakerStats[speaker].utteranceCount;
  });

  return speakerStats;
}
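Putting the two functions together, a minimal usage sketch might look like this. The audio URL reuses the sample file from earlier, and the two-speaker hint is an assumption about that file:

async function main() {
  const { transcript, analysis } = await advancedSpeakerDiarization(
    'https://assembly.ai/sports_injuries.mp3',
    { expectedSpeakers: 2, languageCode: 'en_us' }
  );

  // Print per-speaker talk time and word counts
  for (const [speaker, stats] of Object.entries(analysis)) {
    console.log(
      `Speaker ${speaker}: ${(stats.totalTime / 1000).toFixed(1)}s, ` +
      `${stats.totalWords} words across ${stats.utteranceCount} utterances`
    );
  }
}

main().catch(console.error);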
Handle local audio files
To process local audio files, you'll need to upload them first:
const fs = require('fs');

async function diarizeLocalFile(filePath) {
  try {
    // Upload the local file; the SDK returns a temporary URL for the uploaded audio
    console.log('Uploading audio file...');
    const uploadUrl = await client.files.upload(fs.createReadStream(filePath));

    // Process with speaker diarization
    const result = await performSpeakerDiarization(uploadUrl);
    return result;
  } catch (error) {
    console.error('Error processing local file:', error);
  }
}
// Example usage with local file
// diarizeLocalFile('./path/to/your/audio.mp3');
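Recent versions of the SDK can also accept a local file path or stream directly in the audio parameter, in which case the upload happens for you. If you're unsure whether your version supports this, the explicit upload shown above always works:

// Equivalent shortcut (assuming your SDK version accepts local paths in `audio`)
// const transcript = await client.transcripts.transcribe({
//   audio: './path/to/your/audio.mp3',
//   speaker_labels: true
// });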
Error handling and best practices
Implement robust error handling for production applications:
async function robustSpeakerDiarization(audioSource, options = {}) {
  const maxRetries = 3;
  let attempt = 0;

  while (attempt < maxRetries) {
    try {
      // Validate audio source
      if (!audioSource) {
        throw new Error('Audio source is required');
      }

      const params = {
        audio: audioSource,
        speaker_labels: true,
        speakers_expected: options.expectedSpeakers,
        // Note: the SDK handles polling automatically
      };

      const transcript = await client.transcripts.transcribe(params);

      if (transcript.status === 'error') {
        throw new Error(`Transcription error: ${transcript.error}`);
      }

      return transcript;
    } catch (error) {
      attempt++;
      console.warn(`Attempt ${attempt} failed:`, error.message);

      if (attempt >= maxRetries) {
        throw new Error(`Failed after ${maxRetries} attempts: ${error.message}`);
      }

      // Wait a bit longer before each retry (linear backoff)
      await new Promise(resolve => setTimeout(resolve, 1000 * attempt));
    }
  }
}
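Here's how you might call the robust wrapper, surfacing the final failure to the caller. The URL is the same sample file used earlier:

robustSpeakerDiarization('https://assembly.ai/sports_injuries.mp3', { expectedSpeakers: 2 })
  .then(transcript => console.log(`Finished with status: ${transcript.status}`))
  .catch(error => console.error('Diarization ultimately failed:', error.message));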
Conclusion
Speaker diarization in JavaScript becomes straightforward with AssemblyAI's SDK: configure your environment, set speaker_labels: true in your transcription parameters, and process the utterances data.
The SDK handles the complex audio processing and AI model inference, letting you focus on integrating speaker identification into your application's workflow. Whether you're building meeting transcription tools, podcast analysis platforms, or customer service automation, speaker diarization provides the foundation for understanding multi-speaker conversations.
For more advanced implementations, explore AssemblyAI's speaker diarization documentation and consider combining speaker identification with other audio intelligence features like sentiment analysis and entity detection.
Next steps:
- Experiment with different speakers_expected values for your use cases
- Integrate speaker diarization with your existing transcription workflows
- Check out our guide on speaker diarization vs speaker recognition to understand the differences