Apply Noise Reduction to Audio for Streaming Speech-to-Text

This guide demonstrates how to implement a noise reduction system for real-time audio transcription using AssemblyAI’s Streaming STT and the noisereduce library. You’ll learn how to create a custom audio stream that preprocesses incoming audio to remove background noise before it reaches the transcription service.

This solution is particularly valuable for:

  • Voice assistants operating in noisy environments
  • Customer service applications processing calls
  • Meeting transcription tools
  • Voice-enabled applications requiring high accuracy

The implementation uses Python and combines proven audio processing techniques with AssemblyAI’s powerful transcription capabilities. While our example focuses on microphone input, the principles can be applied to any real-time audio stream.

Quickstart

import assemblyai as aai
import noisereduce as nr
import numpy as np

aai.settings.api_key = "YOUR-API-KEY"

def on_open(session_opened: aai.RealtimeSessionOpened):
    print("Session ID:", session_opened.session_id)

def on_data(transcript: aai.RealtimeTranscript):
    if not transcript.text:
        return
    if isinstance(transcript, aai.RealtimeFinalTranscript):
        print(transcript.text, end="\r\n")
    else:
        print(transcript.text, end="\r")

def on_error(error: aai.RealtimeError):
    print("An error occurred:", error)

def on_close():
    print("Closing Session")

class NoiseReducedMicrophoneStream:
    def __init__(self, sample_rate):
        self.microphone_stream = aai.extras.MicrophoneStream(sample_rate=sample_rate)
        self.sample_rate = sample_rate
        self.buffer = np.array([], dtype=np.int16)
        self.buffer_size = int(sample_rate * 0.5)  # 0.5 seconds buffer

    def __iter__(self):
        return self

    def __next__(self):
        # Get audio chunk from microphone
        audio_chunk = next(self.microphone_stream)

        # Convert bytes to numpy array
        audio_data = np.frombuffer(audio_chunk, dtype=np.int16)

        # Add to buffer
        self.buffer = np.append(self.buffer, audio_data)

        # Process when buffer is full
        if len(self.buffer) >= self.buffer_size:
            # Convert to float32 for noise reduction
            float_buffer = self.buffer.astype(np.float32) / 32768.0

            # Apply noise reduction
            # You can tweak these parameters to change the aggressiveness of the noise reduction
            reduced_noise = nr.reduce_noise(
                y=float_buffer,
                sr=self.sample_rate,
                prop_decrease=0.75,
                n_fft=1024
            )

            # Convert back to int16, clipping so out-of-range samples do not wrap around
            processed_chunk = np.clip(reduced_noise * 32768.0, -32768, 32767).astype(np.int16)

            # Clear buffer but keep a small overlap
            overlap = 1024
            self.buffer = self.buffer[-overlap:] if len(self.buffer) > overlap else np.array([], dtype=np.int16)

            # Convert back to bytes
            return processed_chunk.tobytes()

        # If buffer not full, return empty bytes
        return b''


transcriber = aai.RealtimeTranscriber(
    sample_rate=16_000,
    on_data=on_data,
    on_error=on_error,
    on_open=on_open,
    on_close=on_close,
)

transcriber.connect()

# Use the noise-reduced stream instead of the regular microphone stream
noise_reduced_stream = NoiseReducedMicrophoneStream(sample_rate=16_000)
transcriber.stream(noise_reduced_stream)

transcriber.close()

Step-by-step guide

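First, install the following packages: assemblyai, noisereduce, and numpy. The MicrophoneStream helper used later also relies on PyAudio; if it is missing, installing the SDK with its extras option (pip install "assemblyai[extras]") should provide it.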

$ pip install assemblyai noisereduce numpy

import assemblyai as aai
import noisereduce as nr
import numpy as np

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard. Streaming Speech-to-Text is available for upgraded accounts only, so if you're on the free plan, you'll need to upgrade your account by adding a credit card.

import assemblyai as aai

aai.settings.api_key = "YOUR-API-KEY"

Do not share this API key with anyone. It is a private key tied uniquely to your account.
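One common way to keep the key out of source code is to read it from an environment variable. The following is a minimal sketch, not part of the original guide: the helper name load_api_key and the variable name ASSEMBLYAI_API_KEY are assumptions chosen for illustration.

```python
import os


def load_api_key(env_var: str = "ASSEMBLYAI_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    Both the function name and the default variable name are
    hypothetical; use whatever naming fits your deployment.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your AssemblyAI API key."
        )
    return key
```

With this helper in place, the assignment above could become aai.settings.api_key = load_api_key(), so the key never appears in the script itself.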

Create functions to handle different events during transcription.

def on_open(session_opened: aai.RealtimeSessionOpened):
    print("Session ID:", session_opened.session_id)

def on_data(transcript: aai.RealtimeTranscript):
    if not transcript.text:
        return
    if isinstance(transcript, aai.RealtimeFinalTranscript):
        print(transcript.text, end="\r\n")
    else:
        print(transcript.text, end="\r")

def on_error(error: aai.RealtimeError):
    print("An error occurred:", error)

def on_close():
    print("Closing Session")

Create a custom stream class that includes noise reduction.

class NoiseReducedMicrophoneStream:
    def __init__(self, sample_rate):
        self.microphone_stream = aai.extras.MicrophoneStream(sample_rate=sample_rate)
        self.sample_rate = sample_rate
        self.buffer = np.array([], dtype=np.int16)
        self.buffer_size = int(sample_rate * 0.5)  # 0.5 seconds buffer

    def __iter__(self):
        return self

    def __next__(self):
        # Get audio chunk from microphone
        audio_chunk = next(self.microphone_stream)

        # Convert bytes to numpy array
        audio_data = np.frombuffer(audio_chunk, dtype=np.int16)

        # Add to buffer
        self.buffer = np.append(self.buffer, audio_data)

        # Process when buffer is full
        if len(self.buffer) >= self.buffer_size:
            # Convert to float32 for noise reduction
            float_buffer = self.buffer.astype(np.float32) / 32768.0

            # Apply noise reduction
            # You can tweak these parameters to change the aggressiveness of the noise reduction
            reduced_noise = nr.reduce_noise(
                y=float_buffer,
                sr=self.sample_rate,
                prop_decrease=0.75,
                n_fft=1024
            )

            # Convert back to int16, clipping so out-of-range samples do not wrap around
            processed_chunk = np.clip(reduced_noise * 32768.0, -32768, 32767).astype(np.int16)

            # Clear buffer but keep a small overlap
            overlap = 1024
            self.buffer = self.buffer[-overlap:] if len(self.buffer) > overlap else np.array([], dtype=np.int16)

            # Convert back to bytes
            return processed_chunk.tobytes()

        # If buffer not full, return empty bytes
        return b''
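
The class converts 16-bit PCM bytes to floats in the range [-1, 1) before noise reduction and back to 16-bit PCM afterwards. The sketch below isolates that round trip so it can be checked without a microphone. The helper names pcm16_bytes_to_float and float_to_pcm16_bytes are illustrative, not part of any library, and the clipping step is a precaution for processed samples that land at or beyond full scale.

```python
import numpy as np

INT16_SCALE = 32768.0  # magnitude of the most negative int16 value


def pcm16_bytes_to_float(chunk: bytes) -> np.ndarray:
    """Interpret raw 16-bit PCM bytes as float32 samples in [-1, 1)."""
    samples = np.frombuffer(chunk, dtype=np.int16)
    return samples.astype(np.float32) / INT16_SCALE


def float_to_pcm16_bytes(signal: np.ndarray) -> bytes:
    """Scale float samples back to int16, clipping instead of wrapping."""
    scaled = np.clip(signal * INT16_SCALE, -32768, 32767)
    return scaled.astype(np.int16).tobytes()


# Round trip: unprocessed audio should come back byte-for-byte identical.
original = np.array([0, 1, -1, 32767, -32768, 12345], dtype=np.int16)
restored = float_to_pcm16_bytes(pcm16_bytes_to_float(original.tobytes()))
```

Without the clip, a processed sample of exactly 1.0 would scale to 32768, which does not fit in int16.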

Now we create our transcriber and NoiseReducedMicrophoneStream.

transcriber = aai.RealtimeTranscriber(
    sample_rate=16_000,
    on_data=on_data,
    on_error=on_error,
    on_open=on_open,
    on_close=on_close,
)

transcriber.connect()

# Use the noise-reduced stream instead of the regular microphone stream
noise_reduced_stream = NoiseReducedMicrophoneStream(sample_rate=16_000)
transcriber.stream(noise_reduced_stream)

transcriber.close()
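
The buffer length and overlap chosen in the stream class translate directly into latency. The sketch below works through that arithmetic for the 16 kHz sample rate used here. The function buffer_stats is a hypothetical helper, and it assumes 16-bit mono audio (2 bytes per sample).

```python
def buffer_stats(sample_rate, buffer_seconds=0.5, overlap_samples=1024,
                 bytes_per_sample=2):
    """Summarize the size and timing of one processing block.

    Assumes mono audio; bytes_per_sample=2 corresponds to 16-bit PCM.
    """
    buffer_samples = int(sample_rate * buffer_seconds)
    return {
        "buffer_samples": buffer_samples,
        "buffer_bytes": buffer_samples * bytes_per_sample,
        # Audio is held back until a full block has been collected.
        "added_latency_ms": 1000 * buffer_samples / sample_rate,
        # Duration of the samples carried over into the next block.
        "overlap_ms": 1000 * overlap_samples / sample_rate,
    }


stats = buffer_stats(16_000)
# stats == {"buffer_samples": 8000, "buffer_bytes": 16000,
#           "added_latency_ms": 500.0, "overlap_ms": 64.0}
```

In other words, transcripts lag the speaker by at least half a second on top of network and model latency. As the class is written, the 1024 retained overlap samples are also part of the next processed chunk, so roughly 64 ms of audio is sent twice per block. A shorter buffer lowers latency but gives the noise reducer less signal to estimate the noise from.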

You can press Ctrl+C to stop the transcription.
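
Since the same principle applies to any real-time audio source, the buffering logic can be separated from the microphone. The following is a sketch under stated assumptions rather than the guide's implementation: BufferedProcessedStream is a hypothetical class that wraps any iterable of raw PCM byte chunks, and process is any callable that returns exactly as many bytes as it receives. Unlike the class above, it never yields empty chunks, it emits each piece of audio only once by using the retained tail purely as context, and it flushes the remaining audio when the source ends.

```python
from typing import Callable, Iterable


class BufferedProcessedStream:
    """Process an iterable of PCM byte chunks in fixed-size blocks."""

    def __init__(self, chunks: Iterable[bytes],
                 process: Callable[[bytes], bytes],
                 block_bytes: int, context_bytes: int = 0):
        self._chunks = iter(chunks)
        self._process = process        # must preserve the byte length
        self._block_bytes = block_bytes
        self._context_bytes = context_bytes
        self._context = b""            # tail of the previous block
        self._pending = bytearray()    # new audio not yet emitted

    def __iter__(self):
        return self

    def __next__(self) -> bytes:
        # Collect chunks until a full block of new audio is available.
        while len(self._pending) < self._block_bytes:
            try:
                self._pending.extend(next(self._chunks))
            except StopIteration:
                if not self._pending:
                    raise              # source exhausted, nothing left
                break                  # flush the final partial block
        new_audio = bytes(self._pending)
        self._pending.clear()

        # Process the new audio together with the previous tail as context.
        processed = self._process(self._context + new_audio)

        # Drop the context portion so no audio is emitted twice.
        emitted = processed[len(self._context):]
        if self._context_bytes:
            self._context = new_audio[-self._context_bytes:]
        return emitted
```

To use it for noise reduction, process would convert the bytes to float32, call nr.reduce_noise, and convert the result back to 16-bit PCM, as the microphone class does. With an identity function as process, the concatenated output equals the concatenated input, which makes the buffering easy to test on its own.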
