Multichannel Transcription
Supported Languages, Regions, and Models
Multichannel transcription is supported for all languages, regions, and models.
If you have a multichannel audio file in which each speaker is recorded on a separate channel, you can transcribe each channel separately.
The response includes an audio_channels property with the number of different channels, and an additional utterances property, containing a list of turn-by-turn utterances.
Each utterance includes the channel it came from; channel numbering starts at 1.
Additionally, each word in the words array contains the channel identifier.
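As a rough illustration of the response shape described above, the following sketch groups utterances by channel. The `sample_response` dictionary is made up for this example, and the exact field names inside each utterance and word (`text`, `channel`) and the type of the channel value are assumptions rather than a confirmed schema.

```python
from collections import defaultdict

# Hypothetical response fragment, hand-written for illustration only.
# Real responses contain many more fields.
sample_response = {
    "audio_channels": 2,
    "utterances": [
        {"channel": "1", "text": "Hi, thanks for calling."},
        {"channel": "2", "text": "Hello, I have a billing question."},
        {"channel": "1", "text": "Sure, I can help with that."},
    ],
    "words": [
        {"text": "Hi,", "channel": "1"},
        {"text": "Hello,", "channel": "2"},
    ],
}


def utterances_by_channel(response):
    """Group utterance texts by channel number, preserving turn order."""
    grouped = defaultdict(list)
    for utterance in response.get("utterances") or []:
        # int() accepts both "1" and 1, so the sketch works whether the
        # channel is serialized as a string or as a number.
        grouped[int(utterance["channel"])].append(utterance["text"])
    return dict(grouped)


for channel, texts in sorted(utterances_by_channel(sample_response).items()):
    print(f"Channel {channel}: {texts}")
```

The same grouping could be applied to the `words` array if per-word channel attribution is needed.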
Quickstart
Multichannel audio increases the transcription time by approximately 25%.
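A minimal sketch of a request that enables multichannel transcription might look like the following. The `multichannel` parameter name comes from this page; the endpoint URL, the `Authorization` header, the `audio_url` field, and the helper name `build_transcript_request` are assumptions made for illustration, so check them against the API reference before relying on them. The function only builds the request and does not send it.

```python
import json
import urllib.request

# Assumed endpoint for submitting a transcription job.
API_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(api_key, audio_url):
    """Build (but do not send) a POST request with multichannel enabled."""
    payload = {
        "audio_url": audio_url,  # assumed field name for the audio location
        "multichannel": True,    # parameter named on this page
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": api_key,  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_transcript_request("YOUR_API_KEY", "https://example.com/call.wav")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would submit the job; that call is left out here so the sketch runs without network access or a valid key.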
Per-channel diarization
If you have a multichannel audio file where individual channels may contain multiple speakers, you can combine multichannel and speaker_labels to perform diarization within each channel.
When both parameters are enabled:
- Channels are labeled numerically (1, 2, 3, etc.)
- Speakers within each channel are labeled alphabetically (A, B, C, etc.)
- The combined speaker label format is {channel}{speaker} (e.g., “1A”, “1B”, “2A”)
For example, if channel 1 has two speakers and channel 2 has one speaker, the labels would be:
- First speaker on channel 1: 1A
- Second speaker on channel 1: 1B
- First speaker on channel 2: 2A
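If downstream code needs the channel and the speaker separately, a combined label in the {channel}{speaker} format can be split with a small helper. This is a sketch only: `split_speaker_label` is a hypothetical function, not part of any SDK, and it assumes labels consist of decimal digits followed by uppercase letters.

```python
import re

# One or more digits (channel) followed by one or more uppercase letters (speaker).
_LABEL_PATTERN = re.compile(r"(\d+)([A-Z]+)")


def split_speaker_label(label):
    """Split a combined label such as "1A" into (channel, speaker)."""
    match = _LABEL_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"Unexpected speaker label format: {label!r}")
    return int(match.group(1)), match.group(2)


for label in ("1A", "1B", "2A"):
    channel, speaker = split_speaker_label(label)
    print(f"{label}: channel {channel}, speaker {speaker}")
```

Using `fullmatch` rather than `match` rejects labels with trailing or unexpected characters instead of silently accepting a prefix.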