July 29, 2025

Transcribe audio and video files with Python and Universal

Learn how to transcribe audio and video files in your Python applications with AssemblyAI's Universal speech recognition model.

Automatic Speech Recognition

Python

Universal-1

Patrick Loeber

Senior Developer Advocate


Introduction

Universal, our latest speech model, delivers industry-leading automated speech recognition (ASR) accuracy. Universal handles accented speech, background noise, and challenging phrases like flight numbers and email addresses with exceptional accuracy. The model is accessible through the same web API as our previous ASR models.

Universal's accuracy improvements make speech-to-text reliable enough to build new applications and products on top of voice data.

You'll learn how to transcribe audio and video files in Python using Universal through our Speech-to-Text API.

Getting Started

Prerequisites

The easiest way to start transcribing audio is by using one of our official SDKs. Install the AssemblyAI Python SDK with the following command:

pip install assemblyai

API Key Setup

Sign up for a new account or log in to your existing AssemblyAI account, then copy your API key from the account dashboard. You'll need this key to authorize API calls from your Python script.

Basic Transcription with Universal

To start transcribing an audio file from a URL using Universal, create a new file named transcribe.py and import the SDK in your Python code:

import assemblyai as aai

Configure a new authenticated SDK client with the API key found in your account dashboard:

aai.settings.api_key = "YOUR_API_KEY"
transcriber = aai.Transcriber()
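Hard-coding the key is fine for a quick test, but you may prefer to keep it out of source control. Here is a minimal sketch, assuming the key is stored in an environment variable named ASSEMBLYAI_API_KEY. The variable name and the helpers load_api_key and configure_sdk are illustrative choices for this example, not part of the SDK:

```python
import os


def load_api_key(env_var: str = "ASSEMBLYAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key


def configure_sdk() -> None:
    """Apply the key to the SDK settings (hypothetical helper)."""
    # Imported lazily so load_api_key can be used without the SDK installed.
    import assemblyai as aai

    aai.settings.api_key = load_api_key()
```

Call configure_sdk() once at startup, before creating the Transcriber.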


By default, all transcriptions use Universal, so you'll always get the highest accuracy without any extra configuration.

# You can use an audio file located at a publicly-accessible URL
audio_file = "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3"

# This code will run Universal
transcript = transcriber.transcribe(audio_file)

if transcript.error:
    print(transcript.error)
else:
    print("Universal output:")
    print(transcript.text)

Complete Example

Here is what the completed script looks like:

import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"
transcriber = aai.Transcriber()

# You can use an audio file located at a publicly-accessible URL
audio_file = "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3"

# This code will run Universal
transcript = transcriber.transcribe(audio_file)

if transcript.error:
    print(transcript.error)
else:
    print("Universal output:")
    print(transcript.text)

When you run the above script, you should see the transcription results from Universal printed to your terminal.
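The script above uses a public URL, but the same transcribe call can also be given a path to a local audio or video file. The sketch below checks the source before sending it, so a mistyped path fails early with a clear message. The helper names resolve_audio_source and transcribe_source are hypothetical, and the code assumes the API key has already been set as shown earlier:

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_audio_source(source: str) -> str:
    """Return the source unchanged if it is an http(s) URL, else a checked local path."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return source
    path = Path(source).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No audio or video file found at: {path}")
    return str(path)


def transcribe_source(source: str) -> str:
    """Transcribe a URL or local file and return the text (hypothetical helper)."""
    # Imported lazily; assumes `pip install assemblyai` and a configured API key.
    import assemblyai as aai

    transcript = aai.Transcriber().transcribe(resolve_audio_source(source))
    if transcript.error:
        raise RuntimeError(transcript.error)
    return transcript.text
```

For example, transcribe_source("./meeting.mp4") would transcribe a local video file, where ./meeting.mp4 is a placeholder path.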

Performance and Capabilities

Universal provides enhanced speech-to-text accuracy with several key improvements:

24% improvement in proper noun recognition - Superior handling of names, brands, locations, and domain-specific terminology compared to Universal-1
15% improvement in transcript structure - Enhanced punctuation, capitalization, and formatting for production-ready outputs
3% lower Word Error Rate - Best-in-class speech recognition performance across diverse audio conditions
73% human preference rate - Preferred by users over previous generation models in blind testing
Precise timestamp accuracy - More accurate word-level timing information for downstream applications
Reduced hallucinations - Significantly fewer false transcriptions during silence or background noise
Multi-language support - Robust performance across English, Spanish, French, German, and other supported languages
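For context on the Word Error Rate figure in the list above: WER is commonly defined as the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. The sketch below implements that textbook definition with only lowercasing as normalization. It is meant for illustration and is not necessarily how the benchmark numbers above were computed:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

With the reference "the cat sat on the mat" and the hypothesis "the cat sit on mat", there is one substitution and one deletion over six reference words, so the function returns 2/6.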

Pricing

Current pricing for Universal:

Universal: $0.15 per hour of audio
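To turn the hourly rate into a rough budget, you can scale it by the audio duration. The sketch below assumes cost is strictly proportional to duration at the rate listed above. Actual billing may round or meter differently, and estimate_cost is a hypothetical helper, not part of the SDK:

```python
from decimal import Decimal

# USD per hour of audio, taken from the pricing listed above.
UNIVERSAL_RATE_PER_HOUR = Decimal("0.15")


def estimate_cost(duration_seconds, rate_per_hour=UNIVERSAL_RATE_PER_HOUR) -> Decimal:
    """Estimate the transcription cost in USD, assuming purely proportional pricing."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    hours = Decimal(str(duration_seconds)) / Decimal(3600)
    # Keep four decimal places so short files do not round down to zero.
    return (hours * rate_per_hour).quantize(Decimal("0.0001"))
```

Under this assumption, a 90-minute recording (5,400 seconds) comes to an estimated $0.225.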

Beyond Basic Transcription

You should now see Universal's transcription printed to your terminal, and you can reuse this code to transcribe audio and video files in your own Python applications.

AssemblyAI also offers many features beyond transcription, such as:

Entity detection to automatically identify and categorize key information
Content moderation for detecting inappropriate content in audio files to ensure that your content is safe for all audiences
PII redaction to minimize sensitive information about individuals by automatically identifying and removing it from your transcript
LeMUR for applying Large Language Models (LLMs) to audio data in a single line of code
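As a rough sketch of how two of these features can be switched on, the code below passes options to the SDK's TranscriptionConfig. It covers only entity detection and content moderation. PII redaction and LeMUR need additional setup and are left out. The helpers build_feature_options and transcribe_with_features are hypothetical names for this example, and the code assumes the API key is already configured:

```python
def build_feature_options(entities: bool = True, moderation: bool = False) -> dict:
    """Map feature flags to TranscriptionConfig keyword arguments."""
    options = {}
    if entities:
        options["entity_detection"] = True
    if moderation:
        options["content_safety"] = True
    return options


def transcribe_with_features(audio_url: str, entities: bool = True, moderation: bool = False):
    """Run a transcription with the selected features enabled (hypothetical helper)."""
    # Imported lazily; assumes `pip install assemblyai` and a configured API key.
    import assemblyai as aai

    config = aai.TranscriptionConfig(**build_feature_options(entities, moderation))
    transcript = aai.Transcriber().transcribe(audio_url, config=config)
    if transcript.error:
        raise RuntimeError(transcript.error)
    return transcript
```

With entity detection enabled, the detected entities should be available on the returned transcript (for example via transcript.entities); check the SDK documentation for the exact result fields of each feature.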

You can also learn more about our approach to creating advanced Speech AI models on our Research page.

Additional Resources

For more information on Python and AssemblyAI, take a look at some of our other resources:

Use Cases and Applications

Universal's high accuracy and performance make it ideal for developers building:

AI Meeting Intelligence - Capture comprehensive meeting documentation with accurate speaker identification and action item extraction
Conversation Intelligence Platforms - Analyze customer interactions and sales calls with reliable transcription for downstream insights
Call Tracking and Analytics - Monitor marketing campaign effectiveness and lead quality with precise call transcription
Voice-Enabled Applications - Build robust voice interfaces with confidence in transcription accuracy
Content Creation and Media - Transform audio and video content into searchable, structured text for content management systems
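For the meeting and conversation use cases above, speaker identification is usually the first step. The sketch below enables speaker labels and formats each utterance with its speaker. The helpers format_utterances and transcribe_meeting are hypothetical names for this example, and the code assumes the API key is already configured:

```python
def format_utterances(utterances) -> list:
    """Turn utterance objects (with .speaker and .text) into readable lines."""
    return [f"Speaker {u.speaker}: {u.text}" for u in utterances]


def transcribe_meeting(audio_url: str) -> list:
    """Transcribe with speaker labels and return one line per utterance (hypothetical helper)."""
    # Imported lazily; assumes `pip install assemblyai` and a configured API key.
    import assemblyai as aai

    config = aai.TranscriptionConfig(speaker_labels=True)
    transcript = aai.Transcriber().transcribe(audio_url, config=config)
    if transcript.error:
        raise RuntimeError(transcript.error)
    return format_utterances(transcript.utterances or [])
```

The formatted lines can then be passed to whatever downstream step you use for summaries or action items.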

Get Your Free Speech-to-Text API Key

Start transcribing audio and video with $50 in free credits to explore our enterprise-grade Speech AI API.


