Terminate Streaming Session After Inactivity

An often-overlooked aspect of implementing AssemblyAI’s Streaming Speech-to-Text (STT) service is terminating transcription sessions efficiently. In this cookbook, you will learn how to terminate a Streaming session after a fixed duration of silence.

For the full code, refer to this GitHub gist.

Quickstart

import assemblyai as aai
from datetime import datetime

aai.settings.api_key = "YOUR_API_KEY"

def on_open(session_opened: aai.RealtimeSessionOpened):
    # This function is called when the connection has been established.
    print("Session ID:", session_opened.session_id)

def on_error(error: aai.RealtimeError):
    # This function is called when an error occurs.
    print("An error occurred:", error)

# Time of the last non-empty transcript, and whether the session has ended
last_transcript_received = datetime.now()
terminated = False

def on_data(transcript: aai.RealtimeTranscript):
    global last_transcript_received
    global terminated

    if terminated:
        return

    if transcript.text == "":
        # You can set the total_seconds of inactivity to be higher or lower
        if (datetime.now() - last_transcript_received).total_seconds() > 5:
            print("5 seconds without new transcription, terminating...")
            terminate_transcription()
        return

    if isinstance(transcript, aai.RealtimeFinalTranscript):
        print(transcript.text, end="\r\n")
    else:
        print(transcript.text, end="\r")

    # Only non-empty transcripts reset the inactivity timer
    last_transcript_received = datetime.now()

def on_close():
    global terminated
    if not terminated:
        print("Closing Session")
        terminated = True

def terminate_transcription():
    global terminated
    if not terminated:
        transcriber.close()
        terminated = True

# Create the Streaming STT transcriber
transcriber = aai.RealtimeTranscriber(
    on_data=on_data,
    on_error=on_error,
    sample_rate=44_100,
    on_open=on_open,  # optional
    on_close=on_close,  # optional
)

# Start the connection
transcriber.connect()

# Open a microphone stream
microphone_stream = aai.extras.MicrophoneStream()

# Press CTRL+C to abort
transcriber.stream(microphone_stream)

Get Started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for an AssemblyAI account and get your API key from your dashboard.

Step-by-step instructions

First, install AssemblyAI’s Python SDK.

$ pip install assemblyai

Then import the SDK and set your API key.

import assemblyai as aai
from datetime import datetime

aai.settings.api_key = "YOUR_API_KEY"

Handling inactivity

Empty transcripts

As long as a session is open, our Streaming STT service will continue sending empty PartialTranscripts that look like this:

Message 1:

{"message_type":"PartialTranscript", "created":"2023-11-10T16:10:22.754985",
"text":"", ...}

Message 2:

{"message_type":"PartialTranscript", "created":"2023-11-10T16:10:25.297511",
"text":"", ...}

Thus, we can treat a sustained run of empty partial transcripts as a sign that the user has stopped speaking.

Note: Other keys in the payload have been omitted for brevity but can be seen here in our Streaming API Reference.
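To make the shape of these messages concrete, here is a minimal sketch that parses the two sample payloads above and checks whether each one is an empty partial transcript. The helper name is_empty_partial is hypothetical and the payloads are trimmed to the keys shown above. When you use the Python SDK, the parsing is done for you and you only see transcript.text, so this is purely illustrative.

```python
import json

# Sample payloads, trimmed to the keys shown in the messages above.
raw_messages = [
    '{"message_type": "PartialTranscript", "created": "2023-11-10T16:10:22.754985", "text": ""}',
    '{"message_type": "PartialTranscript", "created": "2023-11-10T16:10:25.297511", "text": ""}',
]

def is_empty_partial(raw: str) -> bool:
    """Hypothetical helper: True if the payload is a PartialTranscript with no text."""
    message = json.loads(raw)
    is_partial = message.get("message_type") == "PartialTranscript"
    return is_partial and message.get("text", "") == ""

# Flag each sample message as empty or not.
empty_flags = [is_empty_partial(raw) for raw in raw_messages]
print(empty_flags)
```

Both sample messages are flagged as empty, which is the signal the rest of this cookbook builds on.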

Implementing Partial Transcript Checks

Let’s consider a code example to track if the PartialTranscripts have been empty for a duration of time.

Define the on_open and on_error callbacks as usual.

def on_open(session_opened: aai.RealtimeSessionOpened):
    # This function is called when the connection has been established.
    print("Session ID:", session_opened.session_id)

def on_error(error: aai.RealtimeError):
    # This function is called when an error occurs.
    print("An error occurred:", error)

Then, define the variable last_transcript_received, initialized to datetime.now(), and set a flag terminated to False.

We will use these variables later on.

last_transcript_received = datetime.now()
terminated = False

Next, define your on_data function:

  • Access the global variables last_transcript_received and terminated.
  • If the Streaming STT transcriber has already been terminated, return immediately.
  • If transcript.text is empty, check whether more than 5 seconds have passed since the last non-empty transcript. If so, terminate the transcriber. Either way, return without updating the timestamp.
  • Otherwise, print the text in the terminal as usual, and set the time of the last transcript received to now.

def on_data(transcript: aai.RealtimeTranscript):
    global last_transcript_received
    global terminated

    if terminated:
        return

    if transcript.text == "":
        # You can set the total_seconds of inactivity to be higher or lower
        if (datetime.now() - last_transcript_received).total_seconds() > 5:
            print("5 seconds without new transcription, terminating...")
            terminate_transcription()
        return

    if isinstance(transcript, aai.RealtimeFinalTranscript):
        print(transcript.text, end="\r\n")
    else:
        print(transcript.text, end="\r")

    # Only non-empty transcripts reset the inactivity timer
    last_transcript_received = datetime.now()
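The timing logic in on_data can be hard to test against a live microphone. As a rough sketch, the same check can be pulled into a small class with an injectable clock, so the timeout can be exercised without waiting in real time. InactivityTracker and FakeClock are hypothetical names introduced here for illustration; they are not part of the AssemblyAI SDK.

```python
from datetime import datetime, timedelta
from typing import Callable

class InactivityTracker:
    """Hypothetical helper: tracks time since the last non-empty transcript."""

    def __init__(self, timeout_seconds: float,
                 now: Callable[[], datetime] = datetime.now):
        self.timeout_seconds = timeout_seconds
        self._now = now
        self.last_activity = now()

    def observe(self, text: str) -> bool:
        """Record one transcript; return True once silence exceeds the timeout."""
        if text == "":
            elapsed = (self._now() - self.last_activity).total_seconds()
            return elapsed > self.timeout_seconds
        # Non-empty text counts as activity and resets the timer.
        self.last_activity = self._now()
        return False

class FakeClock:
    """Hypothetical test clock that only moves when advance() is called."""

    def __init__(self, start: datetime):
        self.current = start

    def advance(self, seconds: float) -> None:
        self.current += timedelta(seconds=seconds)

    def __call__(self) -> datetime:
        return self.current

clock = FakeClock(datetime(2023, 11, 10, 16, 10, 0))
tracker = InactivityTracker(timeout_seconds=5, now=clock)

clock.advance(1)
after_speech = tracker.observe("hello")   # activity, timer resets
clock.advance(2)
short_silence = tracker.observe("")       # 2 seconds of silence
clock.advance(4)
long_silence = tracker.observe("")        # 6 seconds of silence

results = [after_speech, short_silence, long_silence]
print(results)
```

Inside on_data, you could call tracker.observe(transcript.text) and terminate when it returns True. Keeping the threshold as a constructor argument also makes the 5-second value easy to change.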

Lastly, we define our on_close and terminate_transcription functions. on_close runs when the WebSocket connection closes; if the session has not already been marked as terminated, it prints a message and sets terminated to True.

terminate_transcription accesses the global transcriber and closes the session when called by on_data. The terminated check ensures the session is closed only once.

def on_close():
    global terminated
    if not terminated:
        print("Closing Session")
        terminated = True

def terminate_transcription():
    global terminated
    if not terminated:
        transcriber.close()
        terminated = True
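The global terminated flag works for a simple script. If you are unsure which thread your callbacks run on, one possible alternative is to wrap the close call in a small guard that runs it at most once. The sketch below is an assumption-laden illustration: TerminateOnce is a hypothetical name, and a plain list stands in for the real transcriber so the example runs on its own.

```python
import threading

class TerminateOnce:
    """Hypothetical guard: runs a close callback at most once."""

    def __init__(self, close_fn):
        self._close_fn = close_fn
        self._lock = threading.Lock()
        self._terminated = False

    @property
    def terminated(self) -> bool:
        return self._terminated

    def terminate(self) -> bool:
        """Call the close callback on the first invocation only.

        Returns True if this call performed the close, False otherwise.
        """
        with self._lock:
            if self._terminated:
                return False
            self._terminated = True
        # Call outside the lock so a callback that re-enters terminate()
        # returns False instead of deadlocking.
        self._close_fn()
        return True

# Stand-in for transcriber.close, used here so the sketch is self-contained.
calls = []
guard = TerminateOnce(lambda: calls.append("closed"))

first = guard.terminate()
second = guard.terminate()
print(first, second, calls)
```

In the cookbook code, you would pass transcriber.close as the callback and check guard.terminated at the top of on_data in place of the global flag.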

Create your Streaming STT transcriber and start your transcription.

# Create the Streaming STT transcriber
transcriber = aai.RealtimeTranscriber(
    on_data=on_data,
    on_error=on_error,
    sample_rate=44_100,
    on_open=on_open,  # optional
    on_close=on_close,  # optional
)

# Start the connection
transcriber.connect()

# Open a microphone stream
microphone_stream = aai.extras.MicrophoneStream()

# Press CTRL+C to abort
transcriber.stream(microphone_stream)

You should observe that transcription works in real time and that the session terminates automatically after 5 seconds of silence.