Latest Release
Voice Agent API
Voice agents that get it right, respond instantly, and ship the same day with our new Voice Agent API
Ultra-fast and ultra-accurate streaming STT built for voice agents. Get 300ms immutable transcripts and intelligent endpointing so your agents feel more natural and finish tasks successfully.
Two Solutions
Different architectures, different tradeoffs. Both powered by industry-leading speech models.
Our proprietary voice stack via one WebSocket. Connect, stream audio in, get audio back — we handle the rest.
Best for
Free tier available · No credit card required
The STT layer for your cascading voice agent architecture. Works natively with your preferred orchestrator.
Best for
No concurrency caps · Autoscaling included
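To make the cascading shape concrete, here is a minimal sketch of one conversational turn flowing through STT, then an LLM, then TTS. Everything in it is an illustrative assumption: `CascadingAgent`, `handle_turn`, and the three stand-in stage functions are hypothetical names, not part of any AssemblyAI SDK or orchestrator. A production stack would stream audio incrementally rather than process a whole turn at once.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadingAgent:
    """Hypothetical wrapper: one turn runs STT, then the LLM, then TTS."""
    stt: Callable[[bytes], str]   # audio in -> transcript
    llm: Callable[[str], str]     # transcript -> reply text
    tts: Callable[[str], bytes]   # reply text -> audio out

    def handle_turn(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)
        reply_text = self.llm(transcript)
        return self.tts(reply_text)


# Stand-in stages for illustration only; a real stack would call
# your chosen STT, LLM, and TTS providers here.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


agent = CascadingAgent(stt=fake_stt, llm=fake_llm, tts=fake_tts)
reply_audio = agent.handle_turn(b"hello")
```

Because each stage is just a callable, swapping the STT layer means replacing one argument while the LLM and TTS stages stay untouched, which is the tradeoff this option is built around.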
Compare
Not sure which to pick? Use this to decide.
Features
Voice Agent API
AssemblyAI's proprietary voice stack
Universal-3 Pro Streaming STT API
Best-in-class STT for your stack
Industry-leading speech models
Unlimited concurrency
Enterprise grade reliability
Session-based pricing
| Feature | Voice Agent API | Universal-3 Pro Streaming STT API |
| --- | --- | --- |
| Setup time | Working agent in an afternoon | Minutes to swap STT in an existing stack |
| Architecture | 1 WebSocket · JSON messages · No frameworks required | Cascading (STT → LLM → TTS); you own the full pipeline |
| LLM | Managed; update system prompt mid-conversation | Bring your own |
| Voice (TTS) | Included; select from natural-sounding voices | Bring your own |
| Pricing | $4.50/hr all-in; no token math across three invoices | $0.45/hr; STT only, unlimited concurrent streams |
| Integrations | LiveKit, Pipecat, any WebSocket client, Claude Code | LiveKit, Pipecat, custom WebSocket, Twilio SIP |
| Session resume | 30-second reconnect window, context preserved | Via your orchestrator |
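As a rough way to compare the two prices, the sketch below works through the arithmetic using the hourly rates quoted above. It assumes billing is linear in hours, with no free-tier allowance, minimums, or volume discounts, which this page does not specify. The function and constant names are hypothetical.

```python
# Hourly rates quoted in the comparison (USD per hour).
VOICE_AGENT_API_RATE = 4.50  # all-in: STT, LLM, and TTS
STREAMING_STT_RATE = 0.45    # STT only; LLM and TTS are bring-your-own


def usage_cost(hours: float, rate_per_hour: float) -> float:
    """Cost in USD for `hours` of usage at a flat hourly rate (assumed linear)."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return round(hours * rate_per_hour, 2)


def byo_budget_per_hour() -> float:
    """Hourly spend left for your own LLM and TTS before the
    cascading stack costs as much as the all-in price."""
    return round(VOICE_AGENT_API_RATE - STREAMING_STT_RATE, 2)


all_in = usage_cost(1000, VOICE_AGENT_API_RATE)   # 1,000 hours, all-in
stt_only = usage_cost(1000, STREAMING_STT_RATE)   # 1,000 hours, STT only
```

Under these assumptions, 1,000 hours comes to $4,500 all-in versus $450 for STT alone, so the cascading option is cheaper overall only if your own LLM and TTS together cost less than about $4.05 per hour.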
Pre-built integrations with step-by-step docs enable quick implementation without disrupting existing workflows.
“The speed difference is immediately noticeable — our users see their conversations transcribed almost instantaneously. It feels so much more responsive than what we were using before.”
Jonathan Kim, Software Engineer
Stream audio in, get audio back. We handle the rest with our proprietary voice stack, so you can focus on your product.

Universal-3 Pro Streaming gives your voice agents the accuracy, speed, and real-time control to handle real conversations at scale.

Explore our comprehensive docs with integration guides and best practices to optimize accuracy and latency for your application.