Transcribe streaming audio from a microphone in Python

Learn how to transcribe streaming audio in Python.

Overview

By the end of this tutorial, you’ll be able to transcribe audio from your microphone in Python.

Supported languages

Streaming Speech-to-Text is only available for English.

Before you begin

To complete this tutorial, you need:

  • Python installed on your machine.
  • An AssemblyAI account with an API key.

Here’s the full sample code of what you’ll build in this tutorial:

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

def on_open(session_opened: aai.RealtimeSessionOpened):
    print("Session ID:", session_opened.session_id)

def on_data(transcript: aai.RealtimeTranscript):
    if not transcript.text:
        return

    if isinstance(transcript, aai.RealtimeFinalTranscript):
        print(transcript.text, end="\r\n")
    else:
        print(transcript.text, end="\r")

def on_error(error: aai.RealtimeError):
    print("An error occurred:", error)

def on_close():
    print("Closing Session")

transcriber = aai.RealtimeTranscriber(
    sample_rate=16_000,
    on_data=on_data,
    on_error=on_error,
    on_open=on_open,
    on_close=on_close,
)

transcriber.connect()

microphone_stream = aai.extras.MicrophoneStream(sample_rate=16_000)
transcriber.stream(microphone_stream)

transcriber.close()

Step 1: Install dependencies


First, install PortAudio, a cross-platform library for streaming audio. The Python SDK uses PortAudio to stream audio from your microphone.

# (Mac)
brew install portaudio

# (Windows)
# PortAudio is already installed on most versions of Windows.

# (Linux)
apt install portaudio19-dev

Then install the AssemblyAI Python SDK with extras enabled for microphone support:

pip install "assemblyai[extras]"

Step 2: Configure the API key

In this step, you’ll configure your API key to authenticate with AssemblyAI.


Browse to Account, and then click Copy API key under Copy your API key.


Configure the SDK to use your API key. Replace <YOUR_API_KEY> with your copied API key.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

Step 3: Set up audio configuration


The Python SDK handles audio configuration automatically. You’ll specify the sample rate when creating the transcriber.

Audio data format

If you want to stream data from elsewhere, make sure that your audio data is in the following format:

  • Single channel
  • 16-bit signed integer PCM or mu-law encoding
  • A sample rate that matches the value of the supplied sample_rate parameter
  • 100 to 2000 milliseconds of audio per message

By default, transcriptions expect PCM16-encoded audio. If you want to use mu-law encoding, see Specifying the encoding.
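
To sanity-check the chunk-size math for the format above, you can build a conforming chunk with only the standard library. This sketch produces 100 ms of single-channel, 16-bit PCM silence at 16 kHz; the make_silence_chunk helper is illustrative, not part of the SDK:

```python
import struct

SAMPLE_RATE = 16_000  # must match the transcriber's sample_rate parameter
CHUNK_MS = 100        # each message must carry 100-2000 ms of audio

def make_silence_chunk(ms: int = CHUNK_MS) -> bytes:
    # Single-channel, 16-bit signed little-endian PCM: 2 bytes per sample.
    n_samples = SAMPLE_RATE * ms // 1000
    return struct.pack(f"<{n_samples}h", *([0] * n_samples))

print(len(make_silence_chunk()))  # 1600 samples * 2 bytes = 3200
```

At 16 kHz, a 100 ms chunk is 1,600 samples, or 3,200 bytes on the wire.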

Step 4: Create event handlers

In this step, you’ll set up callback functions that handle the different events.


Create functions to handle the events from the real-time service.

def on_open(session_opened: aai.RealtimeSessionOpened):
    print("Session ID:", session_opened.session_id)

def on_error(error: aai.RealtimeError):
    print("An error occurred:", error)

def on_close():
    print("Closing Session")

Create another function to handle transcripts. The real-time transcriber returns two types of transcripts: Final transcripts and Partial transcripts.

  • Partial transcripts are returned as the audio is being streamed to AssemblyAI.
  • Final transcripts are returned after a moment of silence.

def on_data(transcript: aai.RealtimeTranscript):
    if not transcript.text:
        return

    if isinstance(transcript, aai.RealtimeFinalTranscript):
        # Add new line after final transcript.
        print(transcript.text, end="\r\n")
    else:
        print(transcript.text, end="\r")

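
The end="\r" and end="\r\n" pattern is what keeps each partial transcript overwriting a single console line while finals move to a new line. Here's a self-contained simulation of that display logic (the render helper is hypothetical, for illustration only; no API connection needed):

```python
import io

def render(transcripts, out):
    # Mimics on_data: skip empty text, end partials with "\r" (overwrite
    # the console line) and finals with "\r\n" (move to a new line).
    for text, is_final in transcripts:
        if not text:
            continue
        out.write(text + ("\r\n" if is_final else "\r"))

buf = io.StringIO()
render([("hel", False), ("hello", False), ("Hello world.", True)], buf)
print(repr(buf.getvalue()))  # 'hel\rhello\rHello world.\r\n'
```
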
End of utterance controls

You can configure the silence threshold for automatic utterance detection and programmatically force the end of an utterance to immediately get a Final transcript.

Step 5: Connect and start transcription


Streaming Speech-to-Text uses WebSockets to stream audio to AssemblyAI. This requires first establishing a connection to the API.

First, create a transcriber and connect to the Realtime service:

transcriber = aai.RealtimeTranscriber(
    sample_rate=16_000,
    on_data=on_data,
    on_error=on_error,
    on_open=on_open,
    on_close=on_close,
)

transcriber.connect()

Then, create a microphone stream and start transcribing audio:

microphone_stream = aai.extras.MicrophoneStream(sample_rate=16_000)
transcriber.stream(microphone_stream)  # Press Ctrl+C to stop

Sample rate

The sample_rate is the number of audio samples per second, measured in hertz (Hz). Higher sample rates result in higher quality audio, which may lead to better transcripts, but also more data being sent over the network.

We recommend the following sample rates:

  • Minimum quality: 8_000 (8 kHz)
  • Medium quality: 16_000 (16 kHz)
  • Maximum quality: 48_000 (48 kHz)
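
The bandwidth trade-off is easy to quantify: single-channel PCM16 audio carries two bytes per sample, so the byte rate scales linearly with the sample rate. A quick illustration (the bytes_per_second helper is hypothetical, not an SDK function):

```python
def bytes_per_second(sample_rate: int, bytes_per_sample: int = 2) -> int:
    # Single-channel PCM16: 2 bytes for every sample.
    return sample_rate * bytes_per_sample

for rate in (8_000, 16_000, 48_000):
    print(f"{rate} Hz -> {bytes_per_second(rate):,} bytes/s")
```

Tripling the sample rate from 16 kHz to 48 kHz triples the data sent over the network, from 32,000 to 96,000 bytes per second.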

Step 6: Close the connection


Close the transcriber when you’re done:

transcriber.close()

The connection will also close automatically when you press Ctrl+C. In both cases, the on_close handler will clean up the audio resources.

Next steps

To learn more about Streaming Speech-to-Text, see the following resources:

Need some help?

If you get stuck, or have any other questions, we’d love to help you out. Contact our support team at support@assemblyai.com or create a support ticket.