Self-hosted deployment

Voice AI models hosted in your environment

Run AssemblyAI's Voice AI models on your own infrastructure to tighten latency, meet compliance requirements, and keep full control of your stack.

Runway
Dovetail
Granola
Supernormal
Ashby
Jiminny
Calabrio
JotPsych
EdgeTier
Genio
WhatConverts
Earmark
Grain
Loop
CallRail
Happy Scribe
Veed.io
Delphi

Our Voice AI models, running on your infrastructure

Self-host our speech-to-text models with the same accuracy and price-performance you get from AssemblyAI's cloud API.

Optimized latency

Co-locate your Voice AI stack with the rest of your infrastructure so audio is processed close to where your traffic originates.

Complete data sovereignty

Keep every second of audio inside your environment, even while you're serving customers globally.

Infrastructure control

Tune scaling to match your exact traffic patterns. We provide the metrics and observability your autoscaling needs.

Universal deployment

Run on any container orchestration platform — Kubernetes, AWS ECS, or whatever your team already uses.

Cloud integration

Apply AssemblyAI usage to your cloud provider's committed spend so you get the discounts you've already negotiated.

Regulatory compliance

Meet strict regulatory and data residency requirements by processing audio inside your controlled perimeter.

The same Universal models

Run our Universal-3 Pro Streaming model with the same accuracy and speed you get from our cloud API.

Session-based pricing

The same usage-based pricing as the cloud, with no self-hosting premium. Daily billing options and volume discounts are included.

GPU flexibility

Full GPU support for maximum performance, plus options for regions subject to hardware import restrictions.

Common questions