Transcribe System Audio in Real-Time (macOS)

This guide shows how to transcribe system audio in real time, which is useful for transcribing media content or online calls. By routing system audio through a virtual audio device, you can pipe it to AssemblyAI’s streaming transcription API. The walkthrough covers macOS; a Windows alternative is mentioned but not covered step by step.

The approach relies on a virtual input device that captures your speaker output and exposes it as an input stream. This works around the limitations of direct system audio access.

For Mac Users: We recommend using BlackHole, a free open-source tool available through Homebrew. BlackHole creates a virtual audio device that can route your system audio to AssemblyAI’s API seamlessly.

For Windows Users: Virtual Audio Cable (VAC) is a popular option. While we don’t provide specific Windows instructions in this guide, VAC offers similar functionality to BlackHole for the Windows environment.

Quickstart

import assemblyai as aai
import pyaudio

# You'll need to install these dependencies:
# pip install assemblyai pyaudio

# Set your AssemblyAI API key
aai.settings.api_key = "YOUR-API-KEY"

def on_open(session_opened: aai.RealtimeSessionOpened):
    print("Session ID:", session_opened.session_id)

def on_data(transcript: aai.RealtimeTranscript):
    if not transcript.text:
        return
    if isinstance(transcript, aai.RealtimeFinalTranscript):
        # Final transcript: end the line so the next utterance starts on a new one
        print(transcript.text, end="\r\n")
    else:
        # Partial transcript: return to the start of the line so the next update overwrites it
        print(transcript.text, end="\r")

def on_error(error: aai.RealtimeError):
    print("An error occurred:", error)

def on_close():
    print("Closing Session")

def get_blackhole_device_index():
    # Return the index of the first audio device whose name starts with "BlackHole"
    p = pyaudio.PyAudio()
    try:
        for i in range(p.get_device_count()):
            dev_info = p.get_device_info_by_index(i)
            if str(dev_info['name']).startswith('BlackHole'):
                return i
        return None
    finally:
        p.terminate()

blackhole_index = get_blackhole_device_index()
if blackhole_index is None:
    raise SystemExit("No BlackHole device found. Is BlackHole installed?")

transcriber = aai.RealtimeTranscriber(
    sample_rate=44_100,
    on_data=on_data,
    on_error=on_error,
    on_open=on_open,
    on_close=on_close,
    end_utterance_silence_threshold=500
)

transcriber.connect()

# Pass the device index to the constructor: the audio stream is opened there,
# so assigning device_index afterwards would leave the default input in use.
microphone_stream = aai.extras.MicrophoneStream(
    sample_rate=44_100,
    device_index=blackhole_index
)
transcriber.stream(microphone_stream)

transcriber.close()

Step-by-step guide

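First, install the assemblyai and pyaudio packages. asyncio and sys are part of the Python standard library and don’t need to be installed. PyAudio depends on the PortAudio library, so on macOS you may need to run brew install portaudio first.

$ pip install assemblyai pyaudio

import assemblyai as aai
import pyaudio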

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard. Please note that Streaming Speech-to-text is available for upgraded accounts only. If you’re on the free plan, you’ll need to upgrade your account by adding a credit card.

import assemblyai as aai

aai.settings.api_key = "YOUR-API-KEY"

Don’t share this key with anyone. It is a private key uniquely associated with your account.
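One way to keep the key out of the script is to read it from an environment variable. The sketch below is a minimal illustration: the helper name load_api_key is hypothetical, and the variable name ASSEMBLYAI_API_KEY is an assumption you can change to whatever your environment uses.

```python
import os

def load_api_key(env=os.environ, var_name="ASSEMBLYAI_API_KEY"):
    """Return the API key stored in `var_name`, or raise if it is missing.

    `var_name` is an assumed name for this sketch, not something the guide mandates.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

In the script, the hard-coded string would then become aai.settings.api_key = load_api_key(), after running export ASSEMBLYAI_API_KEY=... in the shell that launches it.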

Create functions to handle different events during transcription.

def on_open(session_opened: aai.RealtimeSessionOpened):
    print("Session ID:", session_opened.session_id)

def on_data(transcript: aai.RealtimeTranscript):
    if not transcript.text:
        return
    if isinstance(transcript, aai.RealtimeFinalTranscript):
        print(transcript.text, end="\r\n")
    else:
        print(transcript.text, end="\r")

def on_error(error: aai.RealtimeError):
    print("An error occurred:", error)

def on_close():
    print("Closing Session")
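The on_data handler relies on "\r" so that each partial transcript overwrites the previous one. If an update is shorter than the text already on the line, leftover characters can remain visible. The sketch below shows one possible way to avoid that by erasing the line before each write. It assumes a terminal that understands ANSI escape sequences, and the names render and show are hypothetical helpers, not part of the AssemblyAI SDK.

```python
import sys

# Carriage return followed by the ANSI "erase entire line" sequence.
CLEAR_LINE = "\r\x1b[2K"

def render(text: str, is_final: bool) -> str:
    """Build the string to write for one transcript update."""
    if not text:
        return ""
    # Final transcripts end with a newline; partials stay on the current line.
    return CLEAR_LINE + text + ("\n" if is_final else "")

def show(text: str, is_final: bool, out=sys.stdout) -> None:
    """Write one update and flush so it appears immediately."""
    out.write(render(text, is_final))
    out.flush()
```

Inside on_data, the two print calls could then be replaced by show(transcript.text, isinstance(transcript, aai.RealtimeFinalTranscript)).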

Create a function to get the device index for your BlackHole virtual input device.

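def get_blackhole_device_index():
    # Return the index of the first audio device whose name starts with "BlackHole"
    p = pyaudio.PyAudio()
    try:
        for i in range(p.get_device_count()):
            dev_info = p.get_device_info_by_index(i)
            if str(dev_info['name']).startswith('BlackHole'):
                return i
        return None
    finally:
        p.terminate()

blackhole_index = get_blackhole_device_index()
if blackhole_index is None:
    raise SystemExit("No BlackHole device found. Is BlackHole installed?")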
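Matching on the name alone can pick the wrong device if another device's name also starts with “BlackHole”, for example a multi-output device renamed to “BlackHole + Speakers” as suggested under Troubleshooting. The sketch below additionally checks that the device reports at least one input channel. It works on the dictionaries returned by PyAudio's get_device_info_by_index; the function name find_input_device is a hypothetical helper.

```python
def find_input_device(devices, prefix="BlackHole"):
    """Return the index of the first device matching `prefix` that can record.

    `devices` is an iterable of PyAudio device-info dicts, which include the
    keys "index", "name", and "maxInputChannels".
    """
    for info in devices:
        name_matches = str(info["name"]).startswith(prefix)
        can_record = info["maxInputChannels"] > 0
        if name_matches and can_record:
            return info["index"]
    return None
```

To use it, collect p.get_device_info_by_index(i) for every i in range(p.get_device_count()) into a list and pass that list in.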

Now we create our transcriber and MicrophoneStream, setting the BlackHole virtual device index.

transcriber = aai.RealtimeTranscriber(
    sample_rate=44_100,
    on_data=on_data,
    on_error=on_error,
    on_open=on_open,
    on_close=on_close,
    end_utterance_silence_threshold=500
)

transcriber.connect()

# Pass the device index to the constructor: the audio stream is opened there,
# so assigning device_index afterwards would leave the default input in use.
microphone_stream = aai.extras.MicrophoneStream(
    sample_rate=44_100,
    device_index=blackhole_index
)
transcriber.stream(microphone_stream)

transcriber.close()

You can press Ctrl+C to stop the transcription.
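The example hard-codes 44_100 as the sample rate for both the transcriber and the microphone stream, and the two values need to agree. The virtual device's rate can be changed in Audio MIDI Setup, so if yours is configured differently, it may be worth reading the rate from the device instead. This is a minimal sketch; device_sample_rate is a hypothetical helper that reads the defaultSampleRate key of a PyAudio device-info dictionary.

```python
def device_sample_rate(dev_info, fallback=44_100):
    """Return the device's default sample rate as an int, or `fallback`.

    PyAudio reports "defaultSampleRate" as a float (for example 48000.0).
    """
    rate = dev_info.get("defaultSampleRate")
    return int(rate) if rate else fallback
```

The returned value would then be passed as sample_rate to both aai.RealtimeTranscriber and aai.extras.MicrophoneStream.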

Troubleshooting:

  • You need to select BlackHole as your system output device for the audio to be piped correctly.

  • If you still need to hear the audio, you can create a multi-output device on Mac that sends audio to both BlackHole and your speakers/headphones. Here’s how to set it up:

      1. Open “Audio MIDI Setup” (you can find this by searching in Spotlight).
      2. Click the “+” button in the bottom left corner and choose “Create Multi-Output Device”.
      3. In the list on the right, check both your regular output (e.g., “MacBook Pro Speakers”) and “BlackHole 2ch”.
      4. Optionally, rename this new device to something like “BlackHole + Speakers”.
      5. Select the new multi-output device as your system output.

    Depending on the name you choose, you may need to modify your script so that it still finds the BlackHole input device.
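When the script picks the wrong device or finds none, it can help to print what PyAudio actually sees. The sketch below formats each device's index, name, and channel counts. The names describe_devices and print_devices are hypothetical helpers, and pyaudio is imported lazily so the formatting function can be used on its own.

```python
def describe_devices(devices):
    """Format PyAudio device-info dicts as one readable line per device."""
    return [
        f"[{info['index']}] {info['name']} "
        f"(in: {info['maxInputChannels']}, out: {info['maxOutputChannels']})"
        for info in devices
    ]

def print_devices():
    """Print every audio device PyAudio can see (requires pyaudio)."""
    import pyaudio  # imported here so describe_devices works without it

    p = pyaudio.PyAudio()
    try:
        devices = [
            p.get_device_info_by_index(i) for i in range(p.get_device_count())
        ]
    finally:
        p.terminate()
    for line in describe_devices(devices):
        print(line)
```

Calling print_devices() before starting the transcriber shows which entries can be used for recording: the device to capture from should report at least one input channel.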