Power best-in-class voice agents

Ultra-fast and ultra-accurate streaming STT built for voice agents. Get 300ms immutable transcripts and intelligent endpointing so your agents feel more natural and finish tasks successfully.

Delphi
Happy Scribe
Glean
Granola
Supernormal
Runway
Ashby
Jiminny
JotPsych
Earmark
EdgeTier
Genio
Grain
Loop
Calabrio
Veed.io
Dovetail
WhatConverts
CallRail

Two Solutions

Pick the API that fits your build

Different architectures, different tradeoffs. Both powered by industry-leading speech models.

Recommended

Voice Agent API

Our proprietary voice stack via one WebSocket. Connect, stream audio in, get audio back — we handle the rest.

Best for

  • Best-in-class voice agents — the preferred way to build with AssemblyAI
  • Customer support agents, AI companions, clinical intake, language learning
  • Teams shipping fast — working agent in an afternoon, no infra to manage
  • Claude Code compatible — paste the docs and build anything
$4.50/hr — speech, LLM, and voice all included
Get started for free

Free tier available · No credit card required
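The Voice Agent API is described as one WebSocket carrying JSON messages. Here is a minimal client-side sketch in Python. It routes incoming text frames to handlers by a `type` field and builds a message that swaps the system prompt mid-conversation. The message types (`transcript`, `audio`, `session.update`), the field names, and the base64-encoded audio are hypothetical stand-ins, not the documented schema. Check the API reference before relying on any of them.

```python
import base64
import json
from typing import Callable, Dict, List

# Hypothetical handler signature: receives the parsed JSON object.
Handler = Callable[[dict], None]


class AgentSession:
    """Routes JSON text frames from one WebSocket to handlers keyed by "type"."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Handler] = {}
        self.transcript: List[str] = []
        self.audio_chunks: List[bytes] = []

    def on(self, msg_type: str, handler: Handler) -> None:
        self.handlers[msg_type] = handler

    def dispatch(self, raw: str) -> bool:
        """Parse one text frame; return True if a handler consumed it."""
        msg = json.loads(raw)
        handler = self.handlers.get(msg.get("type", ""))
        if handler is None:
            return False  # unknown types are ignored rather than treated as errors
        handler(msg)
        return True


def make_session() -> AgentSession:
    s = AgentSession()
    # Hypothetical message types: user transcript text and base64 agent audio.
    s.on("transcript", lambda m: s.transcript.append(m["text"]))
    s.on("audio", lambda m: s.audio_chunks.append(base64.b64decode(m["data"])))
    return s


def prompt_update(system_prompt: str) -> str:
    """Build a hypothetical message that replaces the system prompt mid-conversation."""
    return json.dumps({"type": "session.update", "system_prompt": system_prompt})
```

In a real client, `dispatch` would be called for each text frame your WebSocket library delivers. The string from `prompt_update` would be sent back over the same socket.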

Bring Your Own Stack

Universal-3 Pro Streaming STT API

The STT layer for your cascading voice agent architecture. Works natively with your preferred orchestrator.
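In a cascading architecture, the STT layer decides when the user has finished speaking. Only then does the LLM run and the TTS speak the reply. The sketch below shows that turn loop in plain Python, with the three stages passed in as functions. `SttEvent` and `run_cascade` are hypothetical names. The sketch assumes each event carries only newly finalized (immutable) text plus an end-of-turn flag from the endpointer. That framing may not match how any particular orchestrator or STT API delivers its events.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class SttEvent:
    # Hypothetical event: newly finalized text plus the endpointer's decision.
    text: str
    end_of_turn: bool


def run_cascade(
    events: Iterable[SttEvent],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> List[bytes]:
    """STT -> LLM -> TTS: buffer finalized text until end of turn, then respond."""
    buffer: List[str] = []
    audio_out: List[bytes] = []
    for event in events:
        if event.text:
            # Finalized text will not be revised, so appending is safe.
            buffer.append(event.text)
        if event.end_of_turn and buffer:
            user_turn = " ".join(buffer)
            buffer.clear()
            reply = llm(user_turn)  # your own model call
            audio_out.append(tts(reply))  # your own voice synthesis
    return audio_out
```

An end-of-turn event with nothing buffered is skipped, so a spurious endpoint does not trigger an empty LLM call.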

Best for

  • Teams already using LiveKit, Pipecat, or Vapi as their orchestration layer
  • Teams running cascading architectures (STT → LLM → TTS)
  • High-scale deployments where margin and full control matter
  • Complex workflows with RAG, custom tooling, or proprietary LLMs
  • HIPAA or SOC 2 workloads where you bring your own compliance infrastructure
$0.45/hr — transcription only, unlimited concurrent streams
View integration docs

No concurrency caps · Autoscaling included
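With the two hourly rates above, comparing costs is simple arithmetic. This sketch uses only the listed $4.50/hr and $0.45/hr figures. For a bring-your-own stack, the per-hour LLM and TTS spend are inputs you supply. The sketch ignores the free tier, volume discounts, and how sessions are actually metered.

```python
# Hourly rates as listed on this page (USD per hour of audio).
VOICE_AGENT_RATE = 4.50  # all-in: speech, LLM, and voice
STT_RATE = 0.45  # Universal-3 Pro Streaming: transcription only


def voice_agent_cost(hours: float) -> float:
    """Usage cost for the all-in Voice Agent API."""
    return round(hours * VOICE_AGENT_RATE, 2)


def byo_stack_cost(hours: float, llm_per_hour: float, tts_per_hour: float) -> float:
    """STT at the listed rate plus your own LLM and TTS spend (inputs you supply)."""
    return round(hours * (STT_RATE + llm_per_hour + tts_per_hour), 2)


def break_even_llm_tts_per_hour() -> float:
    """Combined LLM + TTS spend per hour at which both options cost the same."""
    return round(VOICE_AGENT_RATE - STT_RATE, 2)
```

At $4.05/hr of combined LLM and TTS spend, both options cost the same per hour of audio. Below that, the STT-only route is cheaper on usage alone, before counting the infrastructure you run yourself.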

Voice Agent API Demo

Try it for yourself

Speak into your browser and watch your words appear in real time.

Try the Voice Agent API live. This support agent runs on the same Voice Agent API you can ship with. Click to start talking and experience real-time voice AI in action. Ask about our products, APIs, or docs.

Please note: This agent provides customer support for AssemblyAI products only. Do not share sensitive or non-public information.

Compare

Choose based on your architecture

Not sure which to pick? Use this to decide.

Features

Voice Agent API

AssemblyAI's proprietary voice stack

Universal-3 Pro Streaming STT API

Best-in-class STT for your stack

Industry-leading speech models

Unlimited concurrency

Enterprise-grade reliability

Session-based pricing

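Setup time
  • Voice Agent API: Working agent in an afternoon
  • Universal-3 Pro Streaming STT API: Minutes to swap STT in an existing stack

Architecture
  • Voice Agent API: 1 WebSocket · JSON messages · No frameworks required
  • Universal-3 Pro Streaming STT API: Cascading (STT → LLM → TTS); you own the full pipeline

LLM
  • Voice Agent API: Managed; update system prompt mid-conversation
  • Universal-3 Pro Streaming STT API: Bring your own

Voice (TTS)
  • Voice Agent API: Included; select from natural-sounding voices
  • Universal-3 Pro Streaming STT API: Bring your own

Pricing
  • Voice Agent API: $4.50/hr all-in, no token math across three invoices
  • Universal-3 Pro Streaming STT API: $0.45/hr, STT only, unlimited concurrent streams

Integrations
  • Voice Agent API: LiveKit, Pipecat, any WebSocket client, Claude Code
  • Universal-3 Pro Streaming STT API: LiveKit, Pipecat, custom WebSocket, Twilio SIP

Session resume
  • Voice Agent API: 30-second reconnect window, context preserved
  • Universal-3 Pro Streaming STT API: Via your orchestrator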
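For the Voice Agent API, session resume is listed as a 30-second reconnect window with context preserved. A client might choose between resuming and starting over along the lines of this sketch. `ResumeState`, `session_id`, and `reconnect_plan` are hypothetical names. The timestamps come from whatever monotonic clock you use. Treating exactly 30 seconds as still resumable is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# From the comparison: a 30-second reconnect window.
RESUME_WINDOW_S = 30.0


@dataclass
class ResumeState:
    # Hypothetical: an identifier the server hands back for the session.
    session_id: Optional[str] = None
    # Time of the last disconnect, in seconds on a monotonic clock.
    disconnected_at: Optional[float] = None

    def mark_disconnected(self, now: float) -> None:
        self.disconnected_at = now

    def can_resume(self, now: float) -> bool:
        """True if there is a session to resume and we are still inside the window."""
        if self.session_id is None or self.disconnected_at is None:
            return False
        # Assumption: the boundary (exactly 30 s) still counts as resumable.
        return now - self.disconnected_at <= RESUME_WINDOW_S


def reconnect_plan(state: ResumeState, now: float) -> str:
    """Return 'resume' to reattach to the old session, else 'new_session'."""
    return "resume" if state.can_resume(now) else "new_session"
```

In practice you would pass `time.monotonic()` as `now` so wall-clock adjustments cannot shrink or stretch the window.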

Ready to plug into your voice-agent stack

Pre-built integrations and step-by-step docs let you implement quickly without disrupting existing workflows.

“The speed difference is immediately noticeable — our users see their conversations transcribed almost instantaneously. It feels so much more responsive than what we were using before.”

Jonathan Kim, Software Engineer

Building a Voice Agent?

Voice Agent API

Stream audio in, get audio back. We handle the rest with our proprietary voice stack, so you can focus on your product.

Learn More →

Universal-3 Pro Streaming

Universal-3 Pro Streaming gives your voice agents the accuracy, speed, and real-time control to handle real conversations at scale.

Learn More →

Start Building

Explore our comprehensive docs with integration guides and best practices to optimize accuracy and latency for your application.

Learn More →

Common questions