Terminate Streaming Session After Inactivity | AssemblyAI

An often-overlooked aspect of implementing AssemblyAI’s Streaming Speech-to-Text (STT) service is terminating transcription sessions efficiently. In this cookbook, you will learn how to terminate a Streaming session after a fixed duration of silence.

For the full code, refer to this GitHub gist.

Quickstart

import assemblyai as aai
from datetime import datetime

aai.settings.api_key = "YOUR_API_KEY"

def on_open(session_opened: aai.RealtimeSessionOpened):
    # This function is called when the connection has been established.

    print("Session ID:", session_opened.session_id)

def on_error(error: aai.RealtimeError):
    # This function is called when an error occurs.

    print("An error occurred:", error)

# Time of the last non-empty transcript, and whether the session has ended
last_transcript_received = datetime.now()
terminated = False

def on_data(transcript: aai.RealtimeTranscript):
    global last_transcript_received
    global terminated

    if terminated:
        return

    if transcript.text == "":
        # You can set the total_seconds of inactivity to be higher or lower
        if (datetime.now() - last_transcript_received).total_seconds() > 5:
            print("5 seconds without new transcription, terminating...")
            terminate_transcription()
        return

    if isinstance(transcript, aai.RealtimeFinalTranscript):
        print(transcript.text, end="\r\n")
    else:
        print(transcript.text, end="\r")

    last_transcript_received = datetime.now()

def on_close():
    global terminated
    if not terminated:
        print("Closing Session")
        terminated = True

def terminate_transcription():
    global terminated
    if not terminated:
        transcriber.close()
        terminated = True

# Create the Streaming STT transcriber
transcriber = aai.RealtimeTranscriber(
    on_data=on_data,
    on_error=on_error,
    sample_rate=44_100,
    on_open=on_open, # optional
    on_close=on_close, # optional
)

# Start the connection
transcriber.connect()

# Open a microphone stream
microphone_stream = aai.extras.MicrophoneStream()

# Press CTRL+C to abort
transcriber.stream(microphone_stream)

Get Started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for an AssemblyAI account and get your API key from your dashboard.

Step-by-step instructions

First, install AssemblyAI’s Python SDK.

$ pip install assemblyai

Next, import the assemblyai package and set your API key.

import assemblyai as aai
from datetime import datetime

aai.settings.api_key = "YOUR_API_KEY"

Handling inactivity

Empty transcripts

As long as a session is open, our Streaming STT service will continue sending empty PartialTranscripts that look like this:

Message 1:

{"message_type":"PartialTranscript", "created":"2023-11-10T16:10:22.754985",
"text":"", ...}

Message 2:

{"message_type":"PartialTranscript", "created":"2023-11-10T16:10:25.297511",
"text":"", ...}

Thus, we can treat a sustained run of empty partial transcripts as a sign that the user has stopped speaking.

Note: Other keys in the payload have been omitted for brevity but can be seen here in our Streaming API Reference.
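
To make the idea concrete, here is a minimal sketch that inspects raw messages of the shape shown above and flags empty partial transcripts. It uses only Python's standard json module. The helper name is_empty_partial is hypothetical and not part of the SDK, and the sample payloads are trimmed to the keys shown above (the "hello" text in the second one is made up for illustration).

```python
import json

def is_empty_partial(raw_message: str) -> bool:
    """Return True if the raw JSON message is a PartialTranscript with no text."""
    message = json.loads(raw_message)
    return (
        message.get("message_type") == "PartialTranscript"
        and message.get("text", "") == ""
    )

# Illustrative payloads, trimmed to the keys shown in this section.
messages = [
    '{"message_type": "PartialTranscript", "created": "2023-11-10T16:10:22.754985", "text": ""}',
    '{"message_type": "PartialTranscript", "created": "2023-11-10T16:10:25.297511", "text": "hello"}',
]

for raw in messages:
    print(is_empty_partial(raw))
```

When you use the Python SDK, you do not need to parse JSON yourself: the same check is simply transcript.text == "" inside on_data, as shown in the next section.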

Implementing Partial Transcript Checks

Let’s consider a code example to track if the PartialTranscripts have been empty for a duration of time.

Define your Streaming callback functions as usual.

def on_open(session_opened: aai.RealtimeSessionOpened):
    # This function is called when the connection has been established.

    print("Session ID:", session_opened.session_id)

def on_error(error: aai.RealtimeError):
    # This function is called when an error occurs.

    print("An error occurred:", error)

Then, define the variable last_transcript_received = datetime.now(), and set a flag terminated to False.

We will use these variables later on.

last_transcript_received = datetime.now()
terminated = False

Next, define your on_data function:

Access the global variables last_transcript_received and terminated.
If the Streaming STT transcriber has already been terminated, return immediately.
If transcript.text is empty, check whether more than 5 seconds have passed since the last non-empty transcript. If so, terminate the transcriber.
Otherwise, print the text in the terminal as usual, and set the time of the last transcript received to now.

def on_data(transcript: aai.RealtimeTranscript):
    global last_transcript_received
    global terminated

    if terminated:
        return

    if transcript.text == "":
        # You can set the total_seconds of inactivity to be higher or lower
        if (datetime.now() - last_transcript_received).total_seconds() > 5:
            print("5 seconds without new transcription, terminating...")
            terminate_transcription()
        return

    if isinstance(transcript, aai.RealtimeFinalTranscript):
        print(transcript.text, end="\r\n")
    else:
        print(transcript.text, end="\r")

    last_transcript_received = datetime.now()
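
The same inactivity check can also be written without global variables, which makes it easier to test. The sketch below is one possible variation, not part of the AssemblyAI SDK: the SilenceTracker class and its observe method are hypothetical names, and it swaps datetime.now() for time.monotonic() so that changes to the system clock do not affect the timeout. The clock is injectable so the logic can be exercised without waiting in real time.

```python
import time
from typing import Callable

class SilenceTracker:
    """Tracks how long it has been since the last non-empty transcript."""

    def __init__(
        self,
        timeout_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._last_speech = clock()

    def observe(self, text: str) -> bool:
        """Record one transcript text; return True once silence exceeds the timeout."""
        now = self._clock()
        if text:
            # Non-empty text counts as speech and resets the silence window.
            self._last_speech = now
            return False
        return (now - self._last_speech) > self.timeout_seconds

# Demonstration with a fake clock (a one-element list we advance by hand).
fake_now = [0.0]
tracker = SilenceTracker(timeout_seconds=5.0, clock=lambda: fake_now[0])

fake_now[0] = 2.0
print(tracker.observe("hello"))  # speech at t=2 resets the window -> False

fake_now[0] = 6.0
print(tracker.observe(""))       # 4 seconds of silence -> False

fake_now[0] = 7.5
print(tracker.observe(""))       # 5.5 seconds of silence -> True
```

Inside on_data, you could call tracker.observe(transcript.text) and terminate the session when it returns True. Note that, as in the original code, the check only runs when a message arrives, so the session ends on the first empty transcript received after the timeout has elapsed.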

Lastly, we define our on_close and terminate_transcription functions. on_close simply sets terminated to True when the WebSocket connection closes.

terminate_transcription just accesses the global transcriber and closes the session when the function is called by on_data.

def on_close():
    global terminated
    if not terminated:
        print("Closing Session")
        terminated = True

def terminate_transcription():
    global terminated
    if not terminated:
        transcriber.close()
        terminated = True
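
If your callbacks might run on a different thread from the rest of your program (an assumption here, so check how your SDK version dispatches them), a plain boolean flag can in principle be read and written concurrently. The sketch below shows one way to guarantee that the close logic runs only once. OnceCloser is a hypothetical helper, not part of the SDK, and unlike the code above it marks the session as closed before invoking the close function.

```python
import threading
from typing import Callable

class OnceCloser:
    """Runs a close function at most once, even when called from several threads."""

    def __init__(self, close_fn: Callable[[], None]):
        self._close_fn = close_fn
        self._lock = threading.Lock()
        self._closed = False

    @property
    def closed(self) -> bool:
        with self._lock:
            return self._closed

    def close(self) -> bool:
        """Return True if this call performed the close, False if already closed."""
        with self._lock:
            if self._closed:
                return False
            self._closed = True
        # Call outside the lock so a slow close does not block other callers.
        self._close_fn()
        return True

# Demonstration: eight threads race to close, but the function runs once.
calls = []
closer = OnceCloser(lambda: calls.append("closed"))

threads = [threading.Thread(target=closer.close) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(len(calls))
```

In this cookbook's setting, you could build the helper as OnceCloser(transcriber.close) and call its close method from on_data in place of terminate_transcription.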

Create your Streaming STT transcriber and start your transcription.

# Create the Streaming STT transcriber
transcriber = aai.RealtimeTranscriber(
    on_data=on_data,
    on_error=on_error,
    sample_rate=44_100,
    on_open=on_open, # optional
    on_close=on_close, # optional
)

# Start the connection
transcriber.connect()

# Open a microphone stream
microphone_stream = aai.extras.MicrophoneStream()

# Press CTRL+C to abort
transcriber.stream(microphone_stream)

You should see that transcription works in real time and that the session terminates automatically after 5 seconds of silence.