Summarize audio with LLMs in Node.js
Learn how to automatically summarize audio with Node.js and the AssemblyAI API.



To summarize audio files programmatically, you need two steps: convert speech to text, then pass that text to an LLM for summarization. This tutorial walks you through both using AssemblyAI's Node.js SDK for transcription and the LLM Gateway for LLM-based summarization.
You'll transcribe an audio file with AssemblyAI's Universal-3 Pro speech-to-text model, then send the transcript to an LLM through the LLM Gateway—which gives you access to 25+ models from Claude, GPT, Gemini, and more through a single endpoint. By the end, you'll have a working Node.js script that can transcribe and summarize any audio file in just a few lines of code.
No separate LLM provider accounts needed. One API key handles both transcription and summarization.
Summarization vs. transcription: Transcription converts spoken audio into a written, word-for-word text. Summarization takes that text and condenses it into the key points. You need transcription before you can summarize — and the accuracy of the transcript directly determines the quality of the summary.
Migrating from the legacy summarization parameter? The older summarization, summary_model, and summary_type parameters on the /v2/transcript endpoint are deprecated. LLM Gateway replaces them with a more flexible, prompt-driven workflow that supports 25+ models.
What is audio summarization?
Audio summarization is the process of extracting a concise, readable summary from spoken content in an audio or video file. Because the LLMs used here take text rather than raw audio as input, this works as a two-step pipeline: first, a speech-to-text model transcribes the audio into text, then an LLM generates a summary from that transcript.
The speech-to-text step is critical—if the transcript is inaccurate, the summary will be too. A high-accuracy transcription model like AssemblyAI's Universal-3 Pro captures speaker nuances, technical terminology, and proper nouns that generic models miss. That accuracy carries through to the summarization step.
LLMs are particularly effective for the summarization step because they understand context, not just keywords. They can produce different output formats—bullet points, paragraph summaries, action items, chapter breakdowns—based on a simple prompt. This two-step approach works across meetings, podcasts, interviews, customer calls, and any other audio content.
How accurate is automated audio summarization?
Summarization quality is the product of two things: the transcript's word error rate (WER) and the LLM's ability to compress meaning. Researchers typically evaluate summarization quality using ROUGE scores against human-written reference summaries, but in practice the dominant driver of real-world quality is transcription accuracy — a transcript that misnames a speaker, drops a technical term, or garbles a number produces a measurably worse summary regardless of which LLM runs over it. Universal-3 Pro is built to minimize that upstream error so the downstream summary holds up.
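Word error rate is conventionally defined as the number of substituted, deleted, and inserted words divided by the number of words in the reference transcript. The following is a minimal sketch of that calculation using word-level edit distance. The function name wordErrorRate and the simple whitespace tokenization are illustrative assumptions, not part of AssemblyAI's tooling.

```javascript
// Word error rate: (substitutions + deletions + insertions) / reference word count.
// Hypothetical helper for illustration; real evaluations usually normalize
// punctuation and numerals before comparing.
function wordErrorRate(reference, hypothesis) {
  const tokenize = (s) => s.toLowerCase().split(/\s+/).filter(Boolean);
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    throw new Error('Reference transcript must contain at least one word.');
  }

  // prev[j] holds the edit distance between the reference words processed
  // so far (the previous row) and the first j hypothesis words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1;
      const insertion = curr[j - 1] + 1;
      curr[j] = Math.min(substitution, deletion, insertion);
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

For a five-word reference, a hypothesis with one substituted word and one dropped word has two errors, so the function returns 2 / 5.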
Set up your environment
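Before you start, ensure you have Node.js 18 or later installed (the examples rely on the built-in fetch API and top-level await) and an AssemblyAI API key.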
Create a new project folder and initialize it:
mkdir audio-summarization
cd audio-summarization
npm init -y
Open the package.json file and add "type": "module" to the list of properties:
{
  ...
  "type": "module",
  ...
}
This tells Node.js to use ES Module syntax for imports and exports instead of the older CommonJS syntax.
Then, install the AssemblyAI JavaScript SDK, which lets you interact with the AssemblyAI API more easily:
npm install --save assemblyai
If you don't have an AssemblyAI account yet, you must first sign up. Once you've copied your API key, configure it as the ASSEMBLYAI_API_KEY environment variable on your machine:
# Mac/Linux:
export ASSEMBLYAI_API_KEY=<YOUR_KEY>
# Windows:
set ASSEMBLYAI_API_KEY=<YOUR_KEY>
Transcribe and summarize audio with LLMs
To summarize audio with LLMs in Node.js, you need the AssemblyAI SDK to transcribe audio files and the LLM Gateway to summarize them. This takes just a few lines of code.
Transcribe the audio with Universal-3 Pro
First, you'll transcribe the audio source. This example uses a sample recording at a public URL, but any audio or video file will work. Create a file named summarize.js and add the following code:
import { AssemblyAI } from 'assemblyai';
// Read the API key from the environment variable configured earlier.
const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });
const audioUrl = 'https://assembly.ai/sports_injuries.mp3';
// transcribe() submits the job and resolves once the transcript is ready.
const transcript = await client.transcripts.transcribe({
  audio: audioUrl,
  speech_models: ['universal-3-pro', 'universal-2']
});
if (transcript.status === 'error') {
  throw new Error(transcript.error);
}
This code imports the SDK, initializes the client, and submits the audio file for transcription with Universal-3 Pro (with Universal-2 as a fallback for any languages or features not yet on Universal-3 Pro). The SDK handles polling internally and resolves once the transcript is complete.
Summarize the transcript with LLM Gateway
Once you have the transcript, send its id to the LLM Gateway and reference the transcript text in your prompt with the {{ transcript }} tag. The Gateway substitutes that tag with the transcript text server-side, so you don't have to ship the full transcript body in your request payload.
Add the following code to your summarize.js file:
// Send the prompt to the LLM Gateway; the API key goes in the authorization header.
const llmResponse = await fetch('https://llm-gateway.assemblyai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    authorization: process.env.ASSEMBLYAI_API_KEY,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'claude-sonnet-4-6',
    messages: [
      { role: 'user', content: 'Provide a concise, one-paragraph summary of this recording.\n\n{{ transcript }}' }
    ],
    // The Gateway replaces {{ transcript }} with the text of this transcript.
    transcript_id: transcript.id,
    max_tokens: 1000,
  })
});
const result = await llmResponse.json();
console.log(result.choices[0].message.content);
A few things to note about the request:
- transcript_id is a top-level field. The Gateway looks up the transcript and replaces the first occurrence of {{ transcript }} (exact tag, including the spaces — {{transcript}} won't be substituted) with the transcript's text field.
- Only the transcript's text field is substituted — utterances and speaker labels are not included. If you need speaker context, format the utterances yourself and include them directly in your prompt.
- claude-sonnet-4-6 is AssemblyAI's documented default for summarization, but any of the 25+ LLM Gateway models will work — swap the model parameter to use a different one.
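Because only the text field is substituted, a prompt that needs speaker context has to carry the utterances itself. The sketch below shows one way to do that. It assumes the transcript was requested with speaker labeling enabled and that each utterance object exposes speaker and text fields; the helper names formatUtterances and buildSpeakerAwareMessages are hypothetical.

```javascript
// Hypothetical helper: turn diarized utterances into "Speaker A: ..." lines.
// Assumes each utterance object exposes `speaker` and `text`.
function formatUtterances(utterances) {
  return (utterances ?? [])
    .map((u) => `Speaker ${u.speaker}: ${u.text}`)
    .join('\n');
}

// Hypothetical helper: build a chat message that embeds the speaker-labeled
// text directly, instead of relying on the {{ transcript }} tag.
function buildSpeakerAwareMessages(instruction, utterances) {
  const dialogue = formatUtterances(utterances);
  if (!dialogue) {
    throw new Error('No utterances available; was speaker labeling enabled?');
  }
  return [{ role: 'user', content: `${instruction}\n\n${dialogue}` }];
}
```

Pass the returned array as messages in the Gateway request, for example buildSpeakerAwareMessages('Summarize who said what.', transcript.utterances). Since the transcript body is already in the prompt, the {{ transcript }} tag is not needed in this variant.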
Run the script
Execute the script from your terminal:
node summarize.js
After a few moments, the generated summary will be printed to your console.
Handle different audio formats and sources
The SDK handles multiple input types automatically: local file paths, public URLs, and streams and buffers.
For local files, replace the URL with a file path:
// ... client setup
const path = './path/to/your/audio.mp3';
// Passing a local path makes the SDK upload the file before transcribing it.
const transcript = await client.transcripts.transcribe({
  audio: path,
  speech_models: ['universal-3-pro', 'universal-2']
});
// ... same summarization logic with transcript_id and {{ transcript }}
The SDK manages the file upload and processing automatically. This works for all supported audio and video formats.
To process audio from a buffer or a stream instead:
import fs from 'fs';
// ... client setup
// From a buffer
const audioBuffer = await fs.promises.readFile('./audio.mp3');
const transcriptFromBuffer = await client.transcripts.transcribe({
  audio: audioBuffer,
  speech_models: ['universal-3-pro', 'universal-2']
});
// From a stream
const audioStream = fs.createReadStream('./audio.mp3');
const transcriptFromStream = await client.transcripts.transcribe({
  audio: audioStream,
  speech_models: ['universal-3-pro', 'universal-2']
});
Summarize a meeting recording
Meetings are one of the most common audio-summarization use cases — you typically want action items, decisions, and a recap rather than a generic blurb. The pattern is identical to a podcast summary: transcribe the meeting audio with Universal-3 Pro, then change the prompt to ask for meeting-shaped output. Adding a one-line context block at the top of the prompt helps the model focus on the right things.
// One line of context helps the model focus on what matters for this meeting.
const meetingContext = 'This was a weekly sync for the Project Phoenix development team.';
const llmResponse = await fetch('https://llm-gateway.assemblyai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    authorization: process.env.ASSEMBLYAI_API_KEY,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'claude-sonnet-4-6',
    messages: [
      { role: 'user', content: `Context: ${meetingContext}\n\nFrom the following meeting transcript, extract:\n1. Key decisions made\n2. Action items with owners (when mentioned)\n3. Open questions or blockers\n\n{{ transcript }}` }
    ],
    transcript_id: transcript.id,
    max_tokens: 1000,
  })
});
const result = await llmResponse.json();
console.log(result.choices[0].message.content);
The same approach extends to interviews, customer calls, sales discovery, podcasts, and lectures: change the context and the structure you ask for.
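Since the request shape stays the same across use cases and only the prompt changes, one option is to wrap the call in a small helper. This is a sketch, not an SDK feature: buildGatewayBody and summarizeTranscript are hypothetical names, while the endpoint, header, and body fields are the ones used in the requests above. The helper appends the {{ transcript }} tag when the prompt omits it, because the Gateway only substitutes the exact tag.

```javascript
const GATEWAY_URL = 'https://llm-gateway.assemblyai.com/v1/chat/completions';
const TRANSCRIPT_TAG = '{{ transcript }}';

// Hypothetical helper: build the request body for one summarization prompt.
function buildGatewayBody(transcriptId, instruction, options = {}) {
  const { model = 'claude-sonnet-4-6', maxTokens = 1000 } = options;
  // Append the tag if the caller left it out, so there is something to substitute.
  const content = instruction.includes(TRANSCRIPT_TAG)
    ? instruction
    : `${instruction}\n\n${TRANSCRIPT_TAG}`;
  return {
    model,
    messages: [{ role: 'user', content }],
    transcript_id: transcriptId,
    max_tokens: maxTokens,
  };
}

// Hypothetical helper: send the prompt and return the summary text.
async function summarizeTranscript(transcriptId, instruction, options = {}) {
  const response = await fetch(GATEWAY_URL, {
    method: 'POST',
    headers: {
      authorization: process.env.ASSEMBLYAI_API_KEY,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(buildGatewayBody(transcriptId, instruction, options)),
  });
  if (!response.ok) {
    throw new Error(`LLM Gateway request failed with status ${response.status}`);
  }
  const result = await response.json();
  return result.choices[0].message.content;
}
```

With this in place, a new use case is one call, for example await summarizeTranscript(transcript.id, 'List the key takeaways from this interview.').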
Customize summary type
The same transcript can produce very different summaries depending on what you ask for, so pick a summary type that fits your use case.
For example, to get a bulleted list of key topics instead of a paragraph:
const llmResponse = await fetch('https://llm-gateway.assemblyai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    authorization: process.env.ASSEMBLYAI_API_KEY,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'claude-sonnet-4-6',
    messages: [
      { role: 'user', content: 'Generate a bulleted list of the main topics discussed. Each bullet should be a short, descriptive sentence.\n\n{{ transcript }}' }
    ],
    transcript_id: transcript.id,
    max_tokens: 1000,
  })
});
To request structured JSON output suitable for a downstream data pipeline:
const llmResponse = await fetch('https://llm-gateway.assemblyai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    authorization: process.env.ASSEMBLYAI_API_KEY,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'claude-sonnet-4-6',
    messages: [
      { role: 'user', content: 'Summarize the recording using this JSON shape:\n{ "topic": string, "key_points": string[], "action_items": string[], "conclusion": string }\n\nReturn only valid JSON.\n\n{{ transcript }}' }
    ],
    transcript_id: transcript.id,
    max_tokens: 1000,
  })
});
For stricter guarantees, the LLM Gateway also supports Structured Outputs that constrain the response to a specific JSON schema. Experimenting with different prompts and structures is the best way to get the exact output your application requires.
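When JSON is requested through the prompt alone, the reply may still arrive wrapped in a Markdown code fence or with a field missing, so it is worth validating before handing it to a pipeline. The sketch below is one way to do that. The name parseJsonSummary is hypothetical, and the required keys mirror the JSON shape requested in the prompt above.

```javascript
const REQUIRED_KEYS = ['topic', 'key_points', 'action_items', 'conclusion'];
// Markdown code-fence marker (three backticks), built with repeat() for readability.
const FENCE = '`'.repeat(3);

// Hypothetical helper: pull the JSON object out of the model's reply and check its keys.
function parseJsonSummary(content) {
  let body = content.trim();
  if (body.startsWith(FENCE)) {
    // Drop the opening fence line (with or without a "json" label)...
    body = body.slice(body.indexOf('\n') + 1).trimEnd();
    // ...and the closing fence, if present.
    if (body.endsWith(FENCE)) {
      body = body.slice(0, -FENCE.length);
    }
  }

  let parsed;
  try {
    parsed = JSON.parse(body);
  } catch (err) {
    throw new Error(`Model did not return valid JSON: ${err.message}`);
  }
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    throw new Error('Expected a JSON object at the top level.');
  }

  const missing = REQUIRED_KEYS.filter((key) => !(key in parsed));
  if (missing.length > 0) {
    throw new Error(`Summary is missing keys: ${missing.join(', ')}`);
  }
  return parsed;
}
```

Call it on the message content from the response, for example parseJsonSummary(result.choices[0].message.content), and treat a thrown error as a signal to retry or fall back to a plain-text summary.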
Error handling and performance optimization
Production applications require comprehensive error handling. Wrap API calls in try...catch blocks, log the LLM Gateway request_id for support, and add fallback models so a single provider outage doesn't break your pipeline.
Log the request_id for every LLM call
Every LLM Gateway response includes a request_id field — a unique identifier for that request. Log it (along with the model, region, and a timestamp) for every call you make, not just failures. If you ever contact support@assemblyai.com about a specific request, this ID lets the team locate it in the logs immediately.
try {
  const transcript = await client.transcripts.transcribe({
    audio: audioUrl,
    speech_models: ['universal-3-pro', 'universal-2']
  });
  if (transcript.status === 'error') {
    throw new Error(`Transcription failed: ${transcript.error}`);
  }
  const llmResponse = await fetch('https://llm-gateway.assemblyai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      authorization: process.env.ASSEMBLYAI_API_KEY,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'claude-sonnet-4-6',
      messages: [
        { role: 'user', content: 'Provide a concise summary.\n\n{{ transcript }}' }
      ],
      transcript_id: transcript.id,
      max_tokens: 1000,
    })
  });
  if (!llmResponse.ok) {
    // Read the body as text so a non-JSON error page doesn't mask the real failure.
    const errorBody = await llmResponse.text();
    throw new Error(`LLM Gateway request failed (${llmResponse.status}): ${errorBody}`);
  }
  const result = await llmResponse.json();
  // Log the request_id on every call, not only on failures.
  console.log('request_id:', result.request_id, 'model:', result.model, 'at:', new Date().toISOString());
  console.log(result.choices[0].message.content);
} catch (error) {
  console.error('An error occurred:', error.message);
  // Handle the error appropriately: retry, log, notify, etc.
}
Add fallback models for resilience
The LLM Gateway supports automatic fallback models — if your primary model fails, the Gateway transparently retries with the backup. Set a fast, cheap fallback so a single provider outage doesn't break your summarization pipeline:
body: JSON.stringify({
  model: 'claude-sonnet-4-6',
  messages: [
    { role: 'user', content: 'Provide a concise summary.\n\n{{ transcript }}' }
  ],
  transcript_id: transcript.id,
  max_tokens: 1000,
  // Tried in order if the primary model fails.
  fallbacks: [
    { model: 'claude-haiku-4-5-20251001' }
  ],
})
When a fallback is used, the response's model field reflects which model actually served the request, and you're billed only for that model. You can chain up to two fallbacks via fallback_config.depth.
Use the EU endpoint for data residency
If your audio or summaries need to stay within the EU, use AssemblyAI's EU region for transcription and swap the LLM Gateway endpoint for its EU equivalent: https://llm-gateway.eu.assemblyai.com/v1/chat/completions. Anthropic Claude and most Google Gemini models are supported in the EU.
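To keep the region choice in one place, you could look the Gateway URL up from a small table. This is a sketch under assumptions: gatewayUrlFor and the region keys 'default' and 'eu' are hypothetical labels, and only the two Gateway URLs quoted in this tutorial are included. It does not cover pointing the transcription client at the EU region.

```javascript
// Gateway endpoints quoted in this tutorial, keyed by hypothetical region labels.
const GATEWAY_URLS = new Map([
  ['default', 'https://llm-gateway.assemblyai.com/v1/chat/completions'],
  ['eu', 'https://llm-gateway.eu.assemblyai.com/v1/chat/completions'],
]);

// Hypothetical helper: resolve the Gateway URL for a region label.
function gatewayUrlFor(region = 'default') {
  const url = GATEWAY_URLS.get(region);
  if (!url) {
    const known = [...GATEWAY_URLS.keys()].join(', ');
    throw new Error(`Unknown region "${region}"; expected one of: ${known}`);
  }
  return url;
}
```

Pass the result as the first argument to fetch in place of the hard-coded URL, for example fetch(gatewayUrlFor('eu'), { ... }).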
Process multiple files in parallel
For batches, submit transcription jobs in parallel with Promise.all() and summarize each transcript independently:
const audioFiles = [
  'https://storage.googleapis.com/aai-web-samples/espn.m4a',
  'https://assembly.ai/sports_injuries.mp3'
];
// Submit all files for transcription in parallel
const transcriptPromises = audioFiles.map(audio =>
  client.transcripts.transcribe({
    audio,
    speech_models: ['universal-3-pro', 'universal-2']
  })
);
const transcripts = await Promise.all(transcriptPromises);
// Summarize each successful transcript independently using its transcript_id
const completed = transcripts.filter(t => t.status === 'completed');
const summaries = await Promise.all(completed.map(async (t) => {
  const res = await fetch('https://llm-gateway.assemblyai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      authorization: process.env.ASSEMBLYAI_API_KEY,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'claude-sonnet-4-6',
      messages: [
        { role: 'user', content: 'Provide a concise summary.\n\n{{ transcript }}' }
      ],
      transcript_id: t.id,
      max_tokens: 1000,
    })
  });
  return res.json();
}));
summaries.forEach(s => console.log(s.choices[0].message.content));
The transcribe method handles polling internally and only resolves once the transcription is complete. For long-running jobs where you want non-blocking submission, use the asynchronous submit method and poll for the result manually with exponential backoff.
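A manual polling loop with exponential backoff could look like the sketch below. The helper pollWithBackoff is hypothetical and takes any async function that returns an object with a status field, so the backoff logic stays independent of the SDK. The usage comment at the end assumes the SDK exposes client.transcripts.submit() and client.transcripts.get(); check the SDK reference before relying on those calls.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: poll until the transcript is done, doubling the wait each time.
// `getTranscript` is any async function that returns an object with a `status` field.
async function pollWithBackoff(getTranscript, options = {}) {
  const { initialDelayMs = 1000, maxDelayMs = 30000, maxAttempts = 20 } = options;
  let delay = initialDelayMs;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const transcript = await getTranscript();
    if (transcript.status === 'completed') return transcript;
    if (transcript.status === 'error') {
      throw new Error(`Transcription failed: ${transcript.error}`);
    }
    // Still queued or processing: wait, then double the delay up to the cap.
    await sleep(delay);
    delay = Math.min(delay * 2, maxDelayMs);
  }
  throw new Error(`Transcript not ready after ${maxAttempts} attempts.`);
}

// Usage sketch (assumed SDK calls, not verified here):
//   const { id } = await client.transcripts.submit({ audio: audioUrl });
//   const transcript = await pollWithBackoff(() => client.transcripts.get(id));
```

Capping the delay keeps long jobs from drifting into multi-minute gaps between checks, and the attempt limit ensures the loop cannot run forever if a job stalls.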
Frequently asked questions about audio summarization with LLMs
How do I summarize a local audio file instead of a URL?
The AssemblyAI Node.js SDK handles local files automatically. Instead of providing a URL to the audio parameter, provide the local file path as a string, like audio: './my-audio-file.mp3'. The SDK will manage the upload and transcription process.
Can I change the format of the summary?
Yes. Control the output format by changing the prompt you send to the LLM Gateway — ask for a bulleted list, a JSON object, a headline, or a custom structure. For stricter guarantees, you can constrain output to a JSON schema with the LLM Gateway's Structured Outputs feature.
Which LLM should I use for audio summarization?
It depends on your latency and quality requirements. claude-sonnet-4-6 is AssemblyAI's documented default for summarization and offers strong reasoning across long transcripts. For lower latency and cost — typical for high-volume summarization — try claude-haiku-4-5-20251001, gemini-2.5-flash-lite, gpt-5-mini, or kimi-k2.5. The full model list with quality and latency benchmarks is in the LLM Gateway Overview.
What's the best way to handle errors during transcription or summarization?
Always check the status of the transcript object. If it's 'error', the error property will contain a descriptive message. Wrap your API calls in a try...catch block, log the LLM Gateway request_id from every response, and add fallback models so a single provider outage doesn't break your pipeline.
How can I optimize costs when processing large audio files?
Use a smaller, faster LLM Gateway model (Kimi K2.5, Gemini 2.5 Flash-Lite, GPT-5 mini, or Claude Haiku 4.5) when you don't need the highest-quality summary. Pass transcript_id rather than reshipping the full transcript text in your prompt — the Gateway substitutes the text server-side, which reduces request payload size and lets you cache transcripts across multiple summarization prompts.
Can I process multiple audio files simultaneously?
Yes. Use Promise.all() to submit multiple transcription jobs in parallel, then summarize each transcript independently using its transcript_id. This significantly reduces wall-clock processing time on batches.
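One caveat: Promise.all() rejects as soon as any single promise rejects, so one failed request discards the results for the whole batch. If you would rather keep per-file outcomes, Promise.allSettled() is an alternative. The sketch below uses a hypothetical helper named settleAll that accepts any async task, so it works for transcription or summarization calls alike.

```javascript
// Hypothetical helper: run one async task per item and keep every outcome,
// so a single rejected request does not discard the rest of the batch.
async function settleAll(items, task) {
  // The async wrapper turns synchronous throws into rejections as well.
  const outcomes = await Promise.allSettled(items.map(async (item) => task(item)));
  const succeeded = [];
  const failed = [];
  outcomes.forEach((outcome, index) => {
    if (outcome.status === 'fulfilled') {
      succeeded.push({ item: items[index], value: outcome.value });
    } else {
      failed.push({ item: items[index], reason: outcome.reason });
    }
  });
  return { succeeded, failed };
}
```

For example, const { succeeded, failed } = await settleAll(audioFiles, (audio) => client.transcripts.transcribe({ audio })) returns the transcripts that completed alongside the files that need a retry.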

