Real-time transcription in Python with Universal-Streaming
Learn how to build real-time voice applications with AssemblyAI's Universal-Streaming model.
Real-time transcription allows you to transcribe audio as it is generated, rather than submitting a complete audio file for transcription as with asynchronous transcription. Using Universal-Streaming, you can build voice agents, automated subtitles for live speeches, real-time meeting transcription, and interactive voice applications with industry-leading accuracy and ~300ms latency.
In this tutorial, we will learn how to perform real-time transcription in Python using AssemblyAI's Universal-Streaming model.
Getting started
For this tutorial, we'll be using AssemblyAI's Universal-Streaming model, which delivers immutable transcripts with ~300ms latency and intelligent endpointing designed specifically for voice agents.
You'll need an API key, so get one for free here if you don't already have one.
Universal-Streaming is priced at $0.15/hour based on session duration with unlimited concurrency, making it cost-effective for applications ranging from single-user voice assistants to enterprise-scale voice agents.
Setting up the virtual environment
We'll use the AssemblyAI Python SDK for this tutorial, which provides high-level functions for interacting with Universal-Streaming. To install it, first create a directory and virtual environment for this project:
mkdir universal-streaming-demo && cd universal-streaming-demo
python -m venv venv
Next, activate the virtual environment.
On macOS/Linux:
source ./venv/bin/activate
On Windows:
.\venv\Scripts\activate.bat
We'll need the system dependency portaudio before installing the necessary pip packages, so install it with apt install portaudio19-dev (Debian/Ubuntu) or brew install portaudio (macOS). For other operating systems, see the portaudio website.
Now, install the SDK with its extras, plus python-dotenv, which we'll use to load the API key from an environment file:
pip install "assemblyai[extras]" python-dotenv
The extras contain additional packages for real-time transcription functionality, like getting the audio stream from the microphone.
Setting up the environment file
We'll store the API key in an environment variable called ASSEMBLYAI_API_KEY so it stays out of the source code. Create a file called .env in your project directory and add your API key:
ASSEMBLYAI_API_KEY=your-key-here
Important: Never share this file or check it into source control. Create a .gitignore file to prevent accidental commits:
.env
venv
How to perform real-time transcription with Universal-Streaming
Universal-Streaming uses WebSocket connections to provide ultra-fast, immutable transcripts. Unlike traditional streaming models that provide partial and final transcripts, Universal-Streaming delivers immutable transcripts that won't change once emitted, making them immediately ready for downstream processing in voice agents.
Understanding Universal-Streaming responses
Universal-Streaming is built around Turn objects and immutable transcriptions. A Turn corresponds to a speaking turn in voice conversations and includes:
- turn_order: Integer that increments with each new turn
- transcript: String containing only finalized words
- end_of_turn: Boolean indicating if this is the end of the current turn
- turn_is_formatted: Boolean indicating if the text includes punctuation and formatting
- end_of_turn_confidence: Float (0-1) representing confidence that the turn has finished
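To get a concrete feel for these fields, here is a minimal handler sketch that logs them as turn events arrive. The transcript values in the comment are hypothetical, but they illustrate the key property: within a single turn, the transcript only grows, because words are immutable once emitted.
from typing import Type
from assemblyai.streaming.v3 import StreamingClient, TurnEvent


def inspect_turn(self: Type[StreamingClient], event: TurnEvent):
    # Within one turn the transcript only ever grows, e.g.:
    #   turn_order=0  transcript="hello"              end_of_turn=False
    #   turn_order=0  transcript="hello how are you"  end_of_turn=True
    print(
        f"turn={event.turn_order} "
        f"end_of_turn={event.end_of_turn} "
        f"confidence={event.end_of_turn_confidence} "
        f"formatted={event.turn_is_formatted}: "
        f"{event.transcript}"
    )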
Event handlers
We need to define event handlers for different types of events during the streaming session.
Create a file called main.py and add the following imports and event handlers:
import assemblyai as aai
from typing import Type
from dotenv import load_dotenv
import os

from assemblyai.streaming.v3 import (
    BeginEvent,
    StreamingClient,
    StreamingClientOptions,
    StreamingError,
    StreamingEvents,
    StreamingParameters,
    StreamingSessionParameters,
    TerminationEvent,
    TurnEvent,
)
load_dotenv()
api_key = os.getenv('ASSEMBLYAI_API_KEY')


def on_begin(self: Type[StreamingClient], event: BeginEvent):
    print(f"Session started: {event.id}")


def on_turn(self: Type[StreamingClient], event: TurnEvent):
    print(f"{event.transcript} ({event.end_of_turn})")
    if event.end_of_turn and not event.turn_is_formatted:
        # Request a formatted version of the finished turn
        params = StreamingSessionParameters(
            format_turns=True,
        )
        self.set_params(params)


def on_terminated(self: Type[StreamingClient], event: TerminationEvent):
    print(
        f"Session terminated: {event.audio_duration_seconds} "
        "seconds of audio processed"
    )


def on_error(self: Type[StreamingClient], error: StreamingError):
    print(f"Error occurred: {error}")
Create and run the streaming client
Now add the main script code to create and run the Universal-Streaming client:
def main():
    client = StreamingClient(
        StreamingClientOptions(
            api_key=api_key,
            api_host="streaming.assemblyai.com",
        )
    )

    client.on(StreamingEvents.Begin, on_begin)
    client.on(StreamingEvents.Turn, on_turn)
    client.on(StreamingEvents.Termination, on_terminated)
    client.on(StreamingEvents.Error, on_error)

    client.connect(
        StreamingParameters(
            sample_rate=16000,
            format_turns=True,
        )
    )

    try:
        client.stream(
            aai.extras.MicrophoneStream(sample_rate=16000)
        )
    finally:
        client.disconnect(terminate=True)


if __name__ == "__main__":
    main()
Running the script
With your virtual environment activated, run the script:
python main.py
You'll see your session ID printed when the connection starts. As you speak, you'll see immutable transcripts appear in real-time. When you finish speaking, the transcript will include proper punctuation and formatting. Press Ctrl+C to stop the session.
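Given the handlers above, a short session's output might look something like this (the session ID and spoken text are, of course, illustrative):
Session started: 8e2d4b3a-...
Hi (False)
Hi this is a test (False)
Hi this is a test of universal streaming (True)
Hi, this is a test of universal streaming. (True)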
Advanced configuration options
Universal-Streaming offers several configuration options to optimize for your specific use case:
Intelligent endpointing
Configure end-of-turn detection to handle natural conversation flows:
client.connect(
    StreamingParameters(
        sample_rate=16000,
        # End the turn early once end-of-turn confidence reaches 0.8
        end_of_turn_confidence_threshold=0.8,
        # Silence (in ms) required to end the turn when confident
        min_end_of_turn_silence_when_confident=500,
        # Silence (in ms) after which the turn ends regardless of confidence
        max_turn_silence=2000,
    )
)
Text formatting control
Control whether you receive formatted transcripts:
client.connect(
    StreamingParameters(
        sample_rate=16000,
        format_turns=True,
    )
)
Authentication tokens
For client-side applications, use temporary authentication tokens to avoid exposing your API key. First, on the server-side, use your API key to generate the temporary token:
# Generate a temporary token (do this on your server)
client = StreamingClient(
    StreamingClientOptions(
        api_key=api_key,
        api_host="streaming.assemblyai.com",
    )
)

token = client.create_temporary_token(
    expires_in_seconds=60,
    max_session_duration_seconds=3600,
)
Then on the client-side, initialize the StreamingClient with the token parameter instead of the API key:
client = StreamingClient(
    StreamingClientOptions(
        token=token,
        api_host="streaming.assemblyai.com",
    )
)
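To make the server side concrete, here is a minimal sketch of a token endpoint. The Flask framework and the /token route are illustrative choices for this sketch, not part of the AssemblyAI SDK; any web framework works the same way.
import os

from flask import Flask, jsonify
from assemblyai.streaming.v3 import StreamingClient, StreamingClientOptions

app = Flask(__name__)


@app.route("/token")
def token():
    # Create the client per request so each caller gets a fresh short-lived token
    client = StreamingClient(
        StreamingClientOptions(
            api_key=os.getenv("ASSEMBLYAI_API_KEY"),
            api_host="streaming.assemblyai.com",
        )
    )
    return jsonify(
        {
            "token": client.create_temporary_token(
                expires_in_seconds=60,
                max_session_duration_seconds=3600,
            )
        }
    )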
Complete example
Here's the complete working example:
import assemblyai as aai
from typing import Type
from dotenv import load_dotenv
import os

from assemblyai.streaming.v3 import (
    BeginEvent,
    StreamingClient,
    StreamingClientOptions,
    StreamingError,
    StreamingEvents,
    StreamingParameters,
    StreamingSessionParameters,
    TerminationEvent,
    TurnEvent,
)

load_dotenv()
api_key = os.getenv('ASSEMBLYAI_API_KEY')


def on_begin(self: Type[StreamingClient], event: BeginEvent):
    print(f"Session started: {event.id}")


def on_turn(self: Type[StreamingClient], event: TurnEvent):
    print(f"{event.transcript} ({event.end_of_turn})")
    if event.end_of_turn and not event.turn_is_formatted:
        # Request a formatted version of the finished turn
        params = StreamingSessionParameters(
            format_turns=True,
        )
        self.set_params(params)


def on_terminated(self: Type[StreamingClient], event: TerminationEvent):
    print(
        f"Session terminated: {event.audio_duration_seconds} "
        "seconds of audio processed"
    )


def on_error(self: Type[StreamingClient], error: StreamingError):
    print(f"Error occurred: {error}")


def main():
    client = StreamingClient(
        StreamingClientOptions(
            api_key=api_key,
            api_host="streaming.assemblyai.com",
        )
    )

    client.on(StreamingEvents.Begin, on_begin)
    client.on(StreamingEvents.Turn, on_turn)
    client.on(StreamingEvents.Termination, on_terminated)
    client.on(StreamingEvents.Error, on_error)

    client.connect(
        StreamingParameters(
            sample_rate=16000,
            format_turns=True,
        )
    )

    try:
        client.stream(
            aai.extras.MicrophoneStream(sample_rate=16000)
        )
    finally:
        client.disconnect(terminate=True)


if __name__ == "__main__":
    main()
Best practices for Universal-Streaming
To get the best results from Universal-Streaming:
- Use appropriate sample rates: 16kHz or higher for better accuracy
- Keep connections open: Avoid frequent reconnections to minimize latency
- Optimize for your use case: Use unformatted transcripts for faster processing in voice agents
- Handle network issues: Implement proper error handling and reconnection logic (see the sketch after this list)
- Use authentication tokens: For client-side applications, generate temporary tokens server-side
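As a starting point for the reconnection advice above, here is a rough sketch of a retry loop with exponential backoff. Whether network failures surface as exceptions from client.stream() or only through the Error event handler can depend on the SDK version, so treat this as a pattern to adapt rather than a drop-in implementation; the retry count and delays are illustrative.
import time

import assemblyai as aai
from assemblyai.streaming.v3 import (
    StreamingClient,
    StreamingClientOptions,
    StreamingParameters,
)


def stream_with_retries(api_key: str, max_retries: int = 3):
    for attempt in range(max_retries):
        client = StreamingClient(
            StreamingClientOptions(
                api_key=api_key,
                api_host="streaming.assemblyai.com",
            )
        )
        try:
            client.connect(StreamingParameters(sample_rate=16000))
            client.stream(aai.extras.MicrophoneStream(sample_rate=16000))
            return  # clean exit; no retry needed
        except Exception as e:
            wait = 2 ** attempt  # back off: 1s, 2s, 4s, ...
            print(f"Streaming failed ({e}); retrying in {wait}s")
            time.sleep(wait)
        finally:
            client.disconnect(terminate=True)
    print("Giving up after repeated failures")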
Use cases for Universal-Streaming
Universal-Streaming is designed for applications that need ultra-fast, accurate speech recognition:
- Voice agents: Build conversational AI with natural turn-taking
- Live captioning: Provide real-time subtitles for meetings and events
- Voice assistants: Create responsive voice interfaces
- Call center analytics: Analyze customer conversations in real-time
- Meeting transcription: Document discussions as they happen
Conclusion
In this tutorial, we learned how to perform real-time transcription in Python using AssemblyAI's Universal-Streaming model. With ~300ms latency, immutable transcripts, and intelligent endpointing, Universal-Streaming provides the performance and accuracy needed for production voice applications.
The combination of superior accuracy, transparent pricing at $0.15/hour, and unlimited concurrency makes Universal-Streaming ideal for everything from prototype voice agents to enterprise-scale conversational AI systems.
For more information about using Universal-Streaming, please refer to our documentation and API Reference Guide.