Capture Complete Sentences as Partial Transcriptions with Streaming Speech-To-Text | AssemblyAI

To use partial transcripts from AssemblyAI’s Streaming Speech-to-Text (STT) API, particularly when you don’t need Final Transcripts (which add punctuation and casing), you need to understand how partial transcripts work and how to handle them in your application. This guide shows how to detect when a partial transcript stops changing, so that you can treat it as a complete sentence.

Quickstart

import assemblyai as aai

aai.settings.api_key = "YOUR-API-KEY"


def on_open(session_opened: aai.RealtimeSessionOpened):
    "This function is called when the connection has been established."
    print("Session ID:", session_opened.session_id)


def on_data(transcript: aai.RealtimeTranscript):
    "This function is called when a new transcript has been received."
    global partial_transcript

    if not transcript.text:
        return

    if isinstance(transcript, aai.RealtimeFinalTranscript):
        # A Final Transcript closes the utterance: reset for the next one
        partial_transcript = ""
    elif partial_transcript == transcript.text:
        # The same partial text arrived twice in a row: treat it as complete
        print(transcript.text, end="\r\n")
    else:
        # The utterance is still changing: remember the latest partial text
        partial_transcript = transcript.text


def on_error(error: aai.RealtimeError):
    "This function is called when an error occurs."
    print("An error occurred:", error)


def on_close():
    "This function is called when the connection has been closed."
    print("Closing Session")


# Create the Real-Time transcriber
transcriber = aai.RealtimeTranscriber(
    on_data=on_data,
    on_error=on_error,
    sample_rate=44_100,
    on_open=on_open,  # optional
    on_close=on_close,  # optional
)

partial_transcript = ""

# Start the connection
transcriber.connect()

# Open a microphone stream
microphone_stream = aai.extras.MicrophoneStream()

try:
    # Press CTRL+C to abort
    transcriber.stream(microphone_stream)
finally:
    # Close the session even if streaming is interrupted
    transcriber.close()

Step-by-step guide

First, install AssemblyAI’s Python SDK.

$ pip install "assemblyai[all]"

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard. Note that Streaming Speech-to-Text is available on paid accounts only; if you’re on the free plan, you’ll need to upgrade.

import assemblyai as aai

aai.settings.api_key = "YOUR-API-KEY"
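Hardcoding the key works for a quick test, but you may prefer to keep it out of your source code. The sketch below reads it from an environment variable instead. The variable name `ASSEMBLYAI_API_KEY` and the helper `load_api_key()` are hypothetical choices for this example and are not part of the SDK.

```python
import os


def load_api_key(var_name: str = "ASSEMBLYAI_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    Hypothetical helper: the default variable name is an assumption for this sketch.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Set {var_name} to your AssemblyAI API key.")
    return key
```

You could then write `aai.settings.api_key = load_api_key()` in place of the hardcoded string.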

Understanding Partial Transcripts

What are Partial Transcripts?

Partial transcripts are interim transcriptions of an audio stream that are updated continuously as audio arrives. They provide a near real-time text representation of spoken words before the speaker has finished.

They are useful in scenarios where immediate text feedback is more important than the complete accuracy or formatting of the final transcript.

Example Use Cases where Partial Transcripts suffice

Chatbots powered by LLMs
Voice Command Recognition
Real-time Translations

What do Partial Transcripts look like?

For a sentence such as “What is the capital of New Zealand”, these are the messages you would receive from our API.

Message 1:

{"message_type":"PartialTranscript", "created":"2023-11-10T16:10:22.754985",
"text":"what is the", ...}

Message 2:

{"message_type":"PartialTranscript", "created":"2023-11-10T16:10:23.297511",
"text":"what is the capital of", ...}

Message 3:

{"message_type":"PartialTranscript", "created":"2023-11-10T16:10:24.113527",
"text":"what is the capital of new zealand", ...}

Message 4 (notice that the text is exactly the same as in Message 3):

{"message_type":"PartialTranscript", "created":"2023-11-10T16:10:24.67045",
"text":"what is the capital of new zealand", ...}

Message 5:

{"message_type":"FinalTranscript", "created":"2023-11-10T16:10:24.9708",
"text":"What is the capital of New Zealand?", ...}

Notice that the Final Transcript arrives only after Messages 3 and 4 contain identical text. Rather than waiting for the Final Transcript, we can programmatically compare each Partial Transcript with the previous one and treat a repeated text as a sign that the utterance is complete.

Note: Other keys in the payload have been omitted for brevity; see our Streaming API Reference for the full message format.
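To make the check concrete, the following sketch replays the five example messages offline and applies the comparison described above. It uses only the standard library. The SDK normally parses these messages for you, so the raw JSON handling and the `replay()` helper are illustrative assumptions. The sketch also uses the `created` timestamps to measure how much earlier the repeated partial arrives than the Final Transcript in this particular example.

```python
import json
from datetime import datetime

# The five example messages, limited to the keys shown above
MESSAGES = [
    '{"message_type": "PartialTranscript", "created": "2023-11-10T16:10:22.754985", "text": "what is the"}',
    '{"message_type": "PartialTranscript", "created": "2023-11-10T16:10:23.297511", "text": "what is the capital of"}',
    '{"message_type": "PartialTranscript", "created": "2023-11-10T16:10:24.113527", "text": "what is the capital of new zealand"}',
    '{"message_type": "PartialTranscript", "created": "2023-11-10T16:10:24.67045", "text": "what is the capital of new zealand"}',
    '{"message_type": "FinalTranscript", "created": "2023-11-10T16:10:24.9708", "text": "What is the capital of New Zealand?"}',
]


def parse_created(value: str) -> datetime:
    # %f accepts 1 to 6 fractional digits, so "24.9708" parses correctly
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f")


def replay(messages):
    """Return (sentence, detected_at, final_at) for the first completed utterance."""
    previous = ""
    sentence = detected_at = None
    for raw in messages:
        msg = json.loads(raw)
        text = msg["text"]
        if not text:
            continue
        created = parse_created(msg["created"])
        if msg["message_type"] == "FinalTranscript":
            return sentence, detected_at, created
        if sentence is None and text == previous:
            # The same partial text twice in a row: treat the utterance as complete
            sentence, detected_at = text, created
        previous = text
    return sentence, detected_at, None


sentence, detected_at, final_at = replay(MESSAGES)
print(sentence)                                  # what is the capital of new zealand
print((final_at - detected_at).total_seconds())  # 0.30035
```

In this example the repeated partial arrives about 0.3 seconds before the Final Transcript; the gap in your own sessions will vary.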

Implementing Partial Transcript Checks

Let’s walk through a code example that checks whether the partial transcript received from AssemblyAI matches the previous partial transcript.

Define your streaming callback functions as usual.

def on_open(session_opened: aai.RealtimeSessionOpened):
    "This function is called when the connection has been established."
    print("Session ID:", session_opened.session_id)


def on_error(error: aai.RealtimeError):
    "This function is called when an error occurs."
    print("An error occurred:", error)


def on_close():
    "This function is called when the connection has been closed."
    print("Closing Session")

Then, initialize partial_transcript as an empty string. In on_data(), we do the following:

Access the global string partial_transcript and ignore any message with empty text.
If the data received is a Final Transcript, reset partial_transcript.
Else, if transcript.text matches the previous partial_transcript, print it to the terminal.
Otherwise, set partial_transcript to the Partial Transcript received from AssemblyAI.

partial_transcript = ""


def on_data(transcript: aai.RealtimeTranscript):
    "This function is called when a new transcript has been received."
    global partial_transcript

    if not transcript.text:
        return

    if isinstance(transcript, aai.RealtimeFinalTranscript):
        # A Final Transcript closes the utterance: reset for the next one
        partial_transcript = ""
    elif partial_transcript == transcript.text:
        # The same partial text arrived twice in a row: treat it as complete
        print(transcript.text, end="\r\n")
    else:
        # The utterance is still changing: remember the latest partial text
        partial_transcript = transcript.text
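If you would rather avoid the global variable, the same logic can live on a small object. The `SentenceDetector` class below is a hypothetical helper, not part of the SDK. It follows the on_data() steps above with one addition: it remembers the last sentence it reported, so a third identical partial does not report the same sentence twice (the global version above would print it again).

```python
from typing import Callable, Optional


class SentenceDetector:
    """Report an utterance once the same partial text arrives twice in a row.

    Hypothetical helper: mirrors the on_data() logic but keeps state on an
    instance and reports each sentence at most once.
    """

    def __init__(self, on_sentence: Callable[[str], None]):
        self.on_sentence = on_sentence
        self.previous = ""
        self.emitted: Optional[str] = None

    def feed(self, text: str, is_final: bool) -> None:
        if not text:
            return
        if is_final:
            # A Final Transcript closes the utterance; start fresh for the next one
            self.previous = ""
            self.emitted = None
        elif text == self.previous:
            if text != self.emitted:
                # Repeated partial text: report it once, even if it repeats again
                self.emitted = text
                self.on_sentence(text)
        else:
            # Still changing: remember the latest partial text
            self.previous = text
```

With the SDK, you could create `detector = SentenceDetector(print)` and have on_data() call `detector.feed(transcript.text, isinstance(transcript, aai.RealtimeFinalTranscript))`.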

Create your Streaming transcriber and start your transcription.

# Create the Streaming transcriber
transcriber = aai.RealtimeTranscriber(
    on_data=on_data,
    on_error=on_error,
    sample_rate=44_100,
    on_open=on_open,  # optional
    on_close=on_close,  # optional
)

# Start the connection
transcriber.connect()

# Open a microphone stream
microphone_stream = aai.extras.MicrophoneStream()

try:
    # Press CTRL+C to abort
    transcriber.stream(microphone_stream)
finally:
    # Close the session even if streaming is interrupted
    transcriber.close()

You should see each complete sentence printed to the terminal within about 500 ms of being spoken, typically before the corresponding Final Transcript arrives. By handling Partial Transcripts this way, you can integrate AssemblyAI’s Streaming STT into applications where immediate text feedback matters more than the punctuation and casing of Final Transcripts.
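For use cases such as LLM-powered chatbots, you will likely want to pass each completed sentence to other code rather than print it. The sketch below shows one common pattern. It assumes that slow downstream work should not run inside the transcription callback. Sentences are put on a standard `queue.Queue` and processed in a separate worker thread. `start_consumer()` and the `None` shutdown sentinel are choices made for this example, not SDK features.

```python
import queue
import threading
from typing import Callable, Optional


def start_consumer(
    q: "queue.Queue[Optional[str]]", process: Callable[[str], None]
) -> threading.Thread:
    """Call `process` on each sentence taken from `q`, in a background thread.

    Putting None on the queue acts as a sentinel that stops the worker.
    """

    def worker() -> None:
        while True:
            text = q.get()
            if text is None:
                break
            process(text)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


# Demo without a live stream: in practice, on_data() would call
# sentences.put(transcript.text) where the guide calls print()
sentences: "queue.Queue[Optional[str]]" = queue.Queue()
consumer = start_consumer(sentences, lambda s: print("Sentence:", s))
sentences.put("what is the capital of new zealand")
sentences.put(None)  # stop the consumer
consumer.join()
```

Because `queue.Queue` is thread-safe, the callback only has to enqueue the text and return quickly. The worker handles the slower processing at its own pace.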