July 29, 2025

Transcribe audio and video files with Python and Universal

Learn how to transcribe audio and video files in your Python applications with AssemblyAI's Universal speech recognition model.

Patrick Loeber
Senior Developer Advocate

Introduction

Universal, our latest speech model, delivers industry-leading automated speech recognition (ASR) accuracy. Universal handles accented speech, background noise, and challenging phrases like flight numbers and email addresses with exceptional accuracy. The model is accessible through the same web API as our previous ASR models.

These accuracy improvements make it practical to build new applications and products on top of voice data.

In this tutorial, you'll learn how to transcribe audio and video files in Python using Universal through our Speech-to-Text API.

Getting Started

Prerequisites

The easiest way to start transcribing audio is by using one of our official SDKs. Install the AssemblyAI Python SDK with the following command:

pip install assemblyai

API Key Setup

Sign up for a new AssemblyAI account or log in to your existing one, then copy your API key from the account dashboard. You will need this key to authorize API calls from your Python script.
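Hard-coding the key in a script makes it easy to leak through version control. One alternative is to read it from an environment variable, as in the minimal sketch below. The variable name `ASSEMBLYAI_API_KEY` and the helpers `load_api_key` and `configure_sdk` are choices made for this example, not something the tutorial requires.

```python
import os


def load_api_key(env_var: str = "ASSEMBLYAI_API_KEY") -> str:
    """Read the API key from an environment variable (the name is our assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your AssemblyAI API key."
        )
    return key


def configure_sdk() -> None:
    """Apply the key to the SDK; imported lazily so load_api_key works on its own."""
    import assemblyai as aai  # requires `pip install assemblyai`

    aai.settings.api_key = load_api_key()
```

Call `configure_sdk()` once at startup, before creating a `Transcriber`.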

Basic Transcription with Universal

To start transcribing an audio file from a URL using Universal, create a new file named transcribe.py and import the SDK in your Python code:

import assemblyai as aai

Configure a new authenticated SDK client with the API key found in your account dashboard:

aai.settings.api_key = "YOUR_API_KEY"
transcriber = aai.Transcriber()

By default, all transcriptions use Universal, so you'll always get the highest accuracy without any extra configuration.

# You can use an audio file located at a publicly accessible URL
audio_file = "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3"

# This code will run Universal
transcript = transcriber.transcribe(audio_file)

if transcript.error:
    print(transcript.error)
else:
    print("Universal output:")
    print(transcript.text)
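The same `transcribe` call also accepts a path to a local audio or video file, which the SDK uploads before transcribing. The sketch below validates a local path before sending it. `resolve_audio_source` and `transcribe_source` are hypothetical helper names for illustration, and the code assumes the API key has already been configured.

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_audio_source(source: str) -> str:
    """Return an http(s) URL unchanged, or a verified local file path as a string."""
    if urlparse(source).scheme in ("http", "https"):
        return source
    path = Path(source).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No audio or video file found at: {path}")
    return str(path)


def transcribe_source(source: str) -> str:
    """Transcribe a URL or local file and return its text."""
    import assemblyai as aai  # requires `pip install assemblyai`

    transcript = aai.Transcriber().transcribe(resolve_audio_source(source))
    if transcript.error:
        raise RuntimeError(transcript.error)
    return transcript.text
```

Checking the path locally gives a clearer error message than waiting for the upload to fail.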

Complete Example

Here is what the completed script looks like:

import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"
transcriber = aai.Transcriber()

# You can use an audio file located at a publicly-accessible URL
audio_file = "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3"

# This code will run Universal
transcript = transcriber.transcribe(audio_file)

if transcript.error:
    print(transcript.error)
else:
    print("Universal output:")
    print(transcript.text)

When you run the above script, you should see the transcription results from Universal printed to your terminal.

Performance and Capabilities

Universal provides enhanced speech-to-text accuracy with several key improvements:

  • 24% improvement in proper noun recognition - Superior handling of names, brands, locations, and domain-specific terminology compared to Universal-1
  • 15% improvement in transcript structure - Enhanced punctuation, capitalization, and formatting for production-ready outputs
  • 3% lower Word Error Rate - Best-in-class speech recognition performance across diverse audio conditions
  • 73% human preference rate - Preferred by users over previous generation models in blind testing
  • Precise timestamp accuracy - More accurate word-level timing information for downstream applications
  • Reduced hallucinations - Significantly fewer false transcriptions during silence or background noise
  • Multi-language support - Robust performance across English, Spanish, French, German, and other supported languages
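The word-level timing mentioned in the list above can be inspected on a completed transcript. In the Python SDK, as far as we know, `transcript.words` yields objects with `text`, `start`, and `end` attributes, with times in milliseconds. The sketch below formats such words into a readable timeline. `format_timestamp` and `word_timeline` are hypothetical helpers, and the `TimedWord` protocol only describes the shape this example assumes.

```python
from typing import Iterable, List, Protocol


class TimedWord(Protocol):
    """Shape assumed for each word: text plus start/end times in milliseconds."""

    text: str
    start: int
    end: int


def format_timestamp(ms: int) -> str:
    """Convert milliseconds to MM:SS.mmm."""
    minutes, remainder = divmod(ms, 60_000)
    seconds, millis = divmod(remainder, 1_000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def word_timeline(words: Iterable[TimedWord]) -> List[str]:
    """Return one '[start - end] word' line per word."""
    return [
        f"[{format_timestamp(w.start)} - {format_timestamp(w.end)}] {w.text}"
        for w in words
    ]
```

With a finished transcript, `print("\n".join(word_timeline(transcript.words)))` would list each word next to its time span.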

Pricing

Current pricing for Universal:

  • Universal: $0.27 per hour of audio
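As a rough worked example of the rate above, the sketch below estimates cost from audio duration. It assumes cost scales linearly with duration, which may not match the actual billing granularity. `estimate_cost` is a hypothetical helper, not part of the SDK.

```python
UNIVERSAL_RATE_PER_HOUR = 0.27  # USD per hour of audio, from the pricing listed above


def estimate_cost(
    duration_seconds: float, rate_per_hour: float = UNIVERSAL_RATE_PER_HOUR
) -> float:
    """Estimate transcription cost in USD, assuming linear (pro-rata) billing."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds / 3600 * rate_per_hour


# 90 minutes of audio is 1.5 hours, so 1.5 * 0.27
print(f"${estimate_cost(90 * 60):.3f}")  # prints $0.405
```

At this rate, 100 hours of audio would come to about $27.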

Beyond Basic Transcription

You've now used Universal to transcribe an audio file, and the same code works for audio and video files in your own Python applications. AssemblyAI also offers many features beyond transcription, such as:

  • Entity detection to automatically identify and categorize key information
  • Content moderation for detecting inappropriate content in audio files to ensure that your content is safe for all audiences
  • PII redaction to minimize sensitive information about individuals by automatically identifying and removing it from your transcript
  • LeMUR for applying Large Language Models (LLMs) to audio data in a single line of code
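Several of these features are switched on through a transcription config passed alongside the audio. The sketch below shows one way this might look. The keyword names (`entity_detection`, `content_safety`, `redact_pii`, `redact_pii_policies`) reflect our understanding of the Python SDK's `TranscriptionConfig` and should be checked against the current SDK documentation. `transcribe_with_features` is a hypothetical helper that assumes the API key is already set.

```python
# Options assumed to match TranscriptionConfig keyword arguments in the Python SDK.
AUDIO_INTELLIGENCE_OPTIONS = {
    "entity_detection": True,  # entity detection
    "content_safety": True,    # content moderation
    "redact_pii": True,        # PII redaction
}


def transcribe_with_features(audio_url: str):
    """Transcribe with the extra features enabled and return the transcript object."""
    import assemblyai as aai  # requires `pip install assemblyai`

    config = aai.TranscriptionConfig(
        # PII redaction needs at least one policy; person names are just an example.
        redact_pii_policies=[aai.PIIRedactionPolicy.person_name],
        **AUDIO_INTELLIGENCE_OPTIONS,
    )
    transcript = aai.Transcriber().transcribe(audio_url, config=config)
    if transcript.error:
        raise RuntimeError(transcript.error)
    return transcript
```

Keeping the flags in one dictionary makes it easy to enable only the features a given application needs.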

You can also learn more about our approach to creating advanced Speech AI models on our Research page.

Additional Resources

For more information on Python and AssemblyAI, take a look at our other resources.

Use Cases and Applications

Universal's high accuracy and performance make it ideal for developers building:

  • AI Meeting Intelligence - Capture comprehensive meeting documentation with accurate speaker identification and action item extraction
  • Conversation Intelligence Platforms - Analyze customer interactions and sales calls with reliable transcription for downstream insights
  • Call Tracking and Analytics - Monitor marketing campaign effectiveness and lead quality with precise call transcription
  • Voice-Enabled Applications - Build robust voice interfaces with confidence in transcription accuracy
  • Content Creation and Media - Transform audio and video content into searchable, structured text for content management systems

Get Your Free Speech-to-Text API Key

Start transcribing audio and video with $50 in free credits to explore our enterprise-grade Speech AI API.
