Transcribe streaming audio from a microphone in Python
Learn how to transcribe streaming audio in Python.
Overview
By the end of this tutorial, you’ll be able to transcribe audio from your microphone in Python.
Supported languages
Streaming Speech-to-Text is only available for English.
Before you begin
To complete this tutorial, you need:
- Python installed.
- An AssemblyAI account with a credit card set up.
Here’s the full sample code of what you’ll build in this tutorial:
Step 1: Install dependencies
Step 2: Configure the API key
In this step, you’ll configure your API key to authenticate with AssemblyAI.
Browse to Account, and then click Copy API key under Copy your API key.
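One way to keep the copied key out of your source files is to read it from an environment variable. The sketch below assumes a variable named `ASSEMBLYAI_API_KEY` and a helper called `load_api_key`; both names are illustrative choices for this example, not something the service or SDK requires.

```python
import os


def load_api_key(env=os.environ, var_name="ASSEMBLYAI_API_KEY"):
    """Return the API key stored in `var_name`, or raise if it is unset or blank.

    Both `var_name` and this helper are hypothetical; adapt them to your setup.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key.")
    return key
```

Failing early with a clear message tends to be easier to debug than an authentication error that surfaces later, when the connection is opened.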
Step 3: Set up audio configuration
The Python SDK handles audio configuration automatically. You’ll specify the sample rate when creating the transcriber.
Audio data format
If you want to stream data from elsewhere, make sure that your audio data is in the following format:
- Single channel
- 16-bit signed integer PCM or mu-law encoding
- A sample rate that matches the value of the supplied sample_rate parameter
- 100 to 2000 milliseconds of audio per message
By default, transcriptions expect PCM16-encoded audio. If you want to use mu-law encoding, see Specifying the encoding.
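If you are preparing your own audio, the format rules above translate into simple byte arithmetic. The following is a minimal sketch, assuming mono PCM16 input already held in memory as `bytes`; the helper names `chunk_size_bytes` and `split_pcm16` are made up for this example.

```python
BYTES_PER_SAMPLE = 2   # 16-bit signed integer PCM, single channel
MIN_CHUNK_MS = 100     # lower bound on audio per message
MAX_CHUNK_MS = 2000    # upper bound on audio per message


def chunk_size_bytes(sample_rate, duration_ms):
    """Return the number of bytes in `duration_ms` of mono PCM16 audio."""
    samples = sample_rate * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE  # always a whole number of samples


def split_pcm16(audio, sample_rate, duration_ms=200):
    """Yield consecutive chunks holding at most `duration_ms` of audio each.

    The last chunk may be shorter than `duration_ms` if the input does not
    divide evenly.
    """
    if not MIN_CHUNK_MS <= duration_ms <= MAX_CHUNK_MS:
        raise ValueError("duration_ms must be between 100 and 2000")
    size = chunk_size_bytes(sample_rate, duration_ms)
    for start in range(0, len(audio), size):
        yield audio[start:start + size]
```

For example, at a sample rate of 16,000 Hz a 100 ms message holds 1,600 samples, which is 3,200 bytes.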
Step 4: Create event handlers
In this step, you’ll set up callback functions that handle the different events.
Create another function to handle transcripts. The real-time transcriber returns two types of transcripts: Final transcripts and Partial transcripts.
- Partial transcripts are returned as the audio is being streamed to AssemblyAI.
- Final transcripts are returned after a moment of silence.
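To illustrate how a handler might treat the two kinds of transcript differently, here is a small sketch that does not depend on the SDK. `PartialTranscript`, `FinalTranscript`, and `TranscriptBuffer` are hypothetical stand-ins defined only for this example; the SDK's own transcript classes may be named and shaped differently.

```python
from dataclasses import dataclass


@dataclass
class PartialTranscript:  # hypothetical stand-in, not an SDK type
    text: str


@dataclass
class FinalTranscript:  # hypothetical stand-in, not an SDK type
    text: str


class TranscriptBuffer:
    """Keep finished utterances and track the in-progress one separately."""

    def __init__(self):
        self.finals = []
        self.current_partial = ""

    def on_data(self, transcript):
        if not transcript.text:
            return  # ignore empty messages
        if isinstance(transcript, FinalTranscript):
            # A final transcript replaces the partials that led up to it.
            self.finals.append(transcript.text)
            self.current_partial = ""
        else:
            # Each partial supersedes the previous partial.
            self.current_partial = transcript.text

    def render(self):
        parts = list(self.finals)
        if self.current_partial:
            parts.append(self.current_partial)
        return " ".join(parts)
```

The design choice here is that partials are treated as disposable: only final transcripts are stored, so the rendered text never contains a stale partial next to the final version of the same utterance.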
End of utterance controls
You can configure the silence threshold for automatic utterance detection and programmatically force the end of an utterance to immediately get a Final transcript.
Step 5: Connect and start transcription
Streaming Speech-to-Text uses WebSockets to stream audio to AssemblyAI. This requires first establishing a connection to the API.
First, create a transcriber and connect to the Realtime service:
Then, create a microphone stream and start transcribing audio:
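A minimal sketch of both steps follows. It assumes an SDK version that exposes `aai.RealtimeTranscriber`, `aai.RealtimeFinalTranscript`, and `aai.extras.MicrophoneStream` with the arguments shown; check these names against the version you have installed. The wrapper function `transcribe_from_microphone` is a name invented for this example.

```python
def transcribe_from_microphone(api_key, sample_rate=16_000):
    """Stream microphone audio until interrupted (sketch; SDK names assumed)."""
    # Imported lazily so this module still loads where the SDK is not installed.
    import assemblyai as aai

    aai.settings.api_key = api_key

    def on_data(transcript):
        if not transcript.text:
            return
        if isinstance(transcript, aai.RealtimeFinalTranscript):
            # Final transcript: finish the line.
            print(transcript.text, end="\r\n")
        else:
            # Partial transcript: overwrite the current line.
            print(transcript.text, end="\r")

    def on_error(error):
        print("An error occurred:", error)

    # Assumption: the transcriber takes the sample rate and callbacks as keywords.
    transcriber = aai.RealtimeTranscriber(
        sample_rate=sample_rate,
        on_data=on_data,
        on_error=on_error,
    )
    transcriber.connect()
    try:
        # Use the same sample rate for the microphone and the transcriber.
        microphone_stream = aai.extras.MicrophoneStream(sample_rate=sample_rate)
        transcriber.stream(microphone_stream)
    finally:
        # Close the connection even if streaming is interrupted (e.g. Ctrl+C).
        transcriber.close()
```

Calling `transcribe_from_microphone("YOUR_API_KEY")` would then stream until you interrupt the program; the `try`/`finally` is there so the connection is closed on the way out.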
Sample rate
The sample_rate is the number of audio samples per second, measured in hertz (Hz). Higher sample rates result in higher quality audio, which may lead to better transcripts, but also more data being sent over the network.
We recommend the following sample rates:
- Minimum quality: 8_000 (8 kHz)
- Medium quality: 16_000 (16 kHz)
- Maximum quality: 48_000 (48 kHz)
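To make the network trade-off concrete, the sketch below estimates the raw audio data rate at each recommended sample rate. It assumes single-channel 16-bit PCM (2 bytes per sample) and ignores any protocol overhead; `bytes_per_second` is a helper named for this example only.

```python
BYTES_PER_SAMPLE = 2  # assumes 16-bit PCM, single channel


def bytes_per_second(sample_rate):
    """Raw audio bytes produced per second at the given sample rate."""
    return sample_rate * BYTES_PER_SAMPLE


for rate in (8_000, 16_000, 48_000):
    print(f"{rate} Hz -> {bytes_per_second(rate) / 1000:.0f} kB/s")
# 8000 Hz -> 16 kB/s
# 16000 Hz -> 32 kB/s
# 48000 Hz -> 96 kB/s
```

Under these assumptions, moving from 16 kHz to 48 kHz triples the amount of audio data sent.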
Step 6: Close the connection
Next steps
To learn more about Streaming Speech-to-Text, see the following resources:
Need some help?
If you get stuck, or have any other questions, we’d love to help you out. Contact our support team at support@assemblyai.com or create a support ticket.