Do you offer voice-to-voice or text-to-speech (TTS)?

Yes! The Voice Agent API provides a complete voice-to-voice pipeline over a single WebSocket connection. It combines AssemblyAI’s speech-to-text (STT), LLM reasoning, and text-to-speech into one integrated service: you stream audio in and receive spoken audio back in real time. The Voice Agent API is billed at a single all-in rate of $4.50/hr, which covers STT, LLM reasoning, and TTS. See the Voice Agent API documentation to get started.

AssemblyAI does not offer standalone text-to-speech as a separate service. TTS is available as part of the Voice Agent API pipeline.
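
To get a rough feel for the all-in $4.50/hr rate, here is a minimal cost-estimation sketch. It assumes usage is prorated linearly by session duration, which this FAQ does not confirm, so check the pricing documentation for the actual billing granularity. The function name `estimate_voice_agent_cost` is hypothetical and not part of any AssemblyAI SDK.

```python
from decimal import Decimal, ROUND_HALF_UP

# All-in Voice Agent API rate quoted in this FAQ (USD per hour).
VOICE_AGENT_RATE_PER_HOUR = Decimal("4.50")
SECONDS_PER_HOUR = Decimal(3600)


def estimate_voice_agent_cost(session_seconds: int) -> Decimal:
    """Estimate the cost of a session in USD, rounded to the nearest cent.

    ASSUMPTION: billing is prorated linearly per second of session time.
    The real billing increments may differ.
    """
    if session_seconds < 0:
        raise ValueError("session_seconds must be non-negative")
    # Multiply before dividing so common durations stay exact in Decimal.
    cost = Decimal(session_seconds) * VOICE_AGENT_RATE_PER_HOUR / SECONDS_PER_HOUR
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Estimate a 20-minute voice agent session.
    print(estimate_voice_agent_cost(20 * 60))
```

Because the rate already includes STT, LLM reasoning, and TTS, the estimate needs only the session length as input; there are no per-component charges to add up.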

Do you offer translation?

Does it cost extra to export SRT or VTT captions?
