Self-hosted deployment

Voice AI models hosted in your environment

Run AssemblyAI's Voice AI models on your own infrastructure to tighten latency, meet compliance requirements, and keep full control of your stack.

Delphi
Happy Scribe
Granola
Supernormal
Runway
Ashby
Jiminny
JotPsych
Earmark
EdgeTier
Genio
Grain
Loop
Calabrio
Veed.io
Dovetail
WhatConverts
CallRail

Our Voice AI models, running on your infrastructure

Self-host our speech-to-text models with the same accuracy and price-performance you get from AssemblyAI's cloud API.

Optimized latency

Co-locate your Voice AI stack with the rest of your infrastructure so audio is processed close to where your traffic originates.

Complete data sovereignty

Keep every second of audio inside your environment, even while you're serving customers globally.

Infrastructure control

Tune scaling to match your exact traffic patterns. We provide the metrics and observability your autoscaling needs.

Universal deployment

Run on any container orchestration platform — Kubernetes, AWS ECS, or whatever your team already uses.

Cloud integration

Apply AssemblyAI usage to your cloud provider's committed spend so you get the discounts you've already negotiated.

Regulatory compliance

Meet strict regulatory and data residency requirements by processing audio inside your controlled perimeter.

The same Universal models

Run our Universal-3 Pro Streaming model with the same accuracy and speed you get from our cloud API.

Session-based pricing

Same usage-based pricing as the cloud — no self-hosting premium. Daily billing options and volume discounts included.

GPU flexibility

Full GPU support for maximum performance, plus deployment options for regions where hardware imports are restricted.

Common questions