Capture Complete Sentences as Partial Transcriptions with Streaming Speech-To-Text

AssemblyAI’s Streaming Speech-to-Text (STT) API emits partial transcripts while audio is still being spoken. If your application does not need the punctuation and casing that final transcripts add, you can work with partial transcripts alone. This guide explains how partial transcripts behave and how to handle them in your application.

Quickstart

import assemblyai as aai

aai.settings.api_key = "YOUR-API-KEY"


def on_open(session_opened: aai.RealtimeSessionOpened):
    "This function is called when the connection has been established."

    print("Session ID:", session_opened.session_id)


def on_data(transcript: aai.RealtimeTranscript):
    "This function is called when a new transcript has been received."

    global partial_transcript

    # Ignore empty transcripts
    if not transcript.text:
        return

    if isinstance(transcript, aai.RealtimeFinalTranscript):
        # The utterance is over: reset for the next one
        partial_transcript = ""
    elif partial_transcript == transcript.text:
        # Same text twice in a row: treat the sentence as complete
        print(transcript.text, end="\r\n")
    else:
        # The text is still changing: remember the latest version
        partial_transcript = transcript.text


def on_error(error: aai.RealtimeError):
    "This function is called when an error occurs."

    print("An error occurred:", error)


def on_close():
    "This function is called when the connection has been closed."

    print("Closing Session")


# Create the Real-Time transcriber
transcriber = aai.RealtimeTranscriber(
    on_data=on_data,
    on_error=on_error,
    sample_rate=44_100,
    on_open=on_open,  # optional
    on_close=on_close,  # optional
)

# Holds the most recent partial transcript seen by on_data()
partial_transcript = ""

# Start the connection
transcriber.connect()

# Open a microphone stream
microphone_stream = aai.extras.MicrophoneStream()

# Press CTRL+C to abort
transcriber.stream(microphone_stream)

transcriber.close()

Step-by-step guide

First, install AssemblyAI’s Python SDK.

$ pip install "assemblyai[all]"

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard. Please note that this feature is available for paid accounts only. If you’re on the free plan, you’ll need to upgrade.

import assemblyai as aai

aai.settings.api_key = "YOUR-API-KEY"

Understanding Partial Transcripts

What are Partial Transcripts?

Partial transcripts are incomplete and ongoing transcriptions of an audio stream. They provide a near real-time text representation of spoken words before the entire speech is finished.

They are useful in scenarios where immediate text feedback is more important than the complete accuracy or formatting of the final transcript.

Example Use Cases where Partial Transcripts suffice

  • Chatbots whose input is processed by LLMs
  • Voice Command Recognition
  • Real-time Translations

What do Partial Transcripts look like?

For a sentence such as “What is the capital of New Zealand”, these are the messages you would receive from our API.

Message 1:

{"message_type":"PartialTranscript", "created":"2023-11-10T16:10:22.754985",
"text":"what is the", ...}

Message 2:

{"message_type":"PartialTranscript", "created":"2023-11-10T16:10:23.297511",
"text":"what is the capital of", ...}

Message 3:

{"message_type":"PartialTranscript", "created":"2023-11-10T16:10:24.113527",
"text":"what is the capital of new zealand", ...}

Message 4 (Notice how the text is the exact same as in Message 3!):

{"message_type":"PartialTranscript", "created":"2023-11-10T16:10:24.67045",
"text":"what is the capital of new zealand", ...}

Message 5:

{"message_type":"FinalTranscript", "created":"2023-11-10T16:10:24.9708",
"text":"What is the capital of New Zealand?", ...}

Notice that a Final Transcript follows once the text in Messages 3 and 4 is identical. Instead of waiting for the Final Transcript, we can check programmatically whether the text in a given message matches the text from the previous message, and treat a match as a sign that the sentence is complete.

Note: Other keys in the payload have been omitted for brevity. See our Streaming API Reference for the full payload.
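
The check described above can be tried offline before wiring it into a live stream. The following is a minimal sketch that replays the five example messages, trimmed to the keys shown in this guide. The helper names first_stable_partial and parse_created are hypothetical and are not part of the AssemblyAI SDK.

```python
import json
from datetime import datetime

# The five example payloads from this guide, trimmed to the keys shown above.
RAW_MESSAGES = [
    '{"message_type": "PartialTranscript", "created": "2023-11-10T16:10:22.754985", "text": "what is the"}',
    '{"message_type": "PartialTranscript", "created": "2023-11-10T16:10:23.297511", "text": "what is the capital of"}',
    '{"message_type": "PartialTranscript", "created": "2023-11-10T16:10:24.113527", "text": "what is the capital of new zealand"}',
    '{"message_type": "PartialTranscript", "created": "2023-11-10T16:10:24.67045", "text": "what is the capital of new zealand"}',
    '{"message_type": "FinalTranscript", "created": "2023-11-10T16:10:24.9708", "text": "What is the capital of New Zealand?"}',
]


def first_stable_partial(raw_messages):
    """Return (index, message) for the first partial that repeats the previous partial's text."""
    previous_text = None
    for index, raw in enumerate(raw_messages):
        message = json.loads(raw)
        if message["message_type"] != "PartialTranscript":
            # A non-partial message ends the utterance, so start over.
            previous_text = None
            continue
        if message["text"] and message["text"] == previous_text:
            return index, message
        previous_text = message["text"]
    return None


def parse_created(value):
    """Parse the 'created' timestamp; %f accepts one to six fractional digits."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f")


index, stable = first_stable_partial(RAW_MESSAGES)
final = json.loads(RAW_MESSAGES[-1])
head_start = (parse_created(final["created"]) - parse_created(stable["created"])).total_seconds()

print(f"Stable partial at Message {index + 1}: {stable['text']!r}")
print(f"Head start over the Final Transcript: {head_start:.3f}s")
```

For this sample data, the repeated text is detected at Message 4, roughly 0.3 seconds before the Final Transcript was created. Real streams will show different gaps.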

Implementing Partial Transcript Checks

Let’s consider a code example to check if the partial transcript received from AssemblyAI matches the previous partial transcript.

Define your Streaming callback functions as usual.

def on_open(session_opened: aai.RealtimeSessionOpened):
    "This function is called when the connection has been established."

    print("Session ID:", session_opened.session_id)


def on_error(error: aai.RealtimeError):
    "This function is called when an error occurs."

    print("An error occurred:", error)


def on_close():
    "This function is called when the connection has been closed."

    print("Closing Session")

Then, define an empty string for partial_transcript. In on_data(), we will do the following:

  • Access the global string partial_transcript.
  • If the data received is a Final Transcript, reset partial_transcript.
  • Else, if transcript.text matches the previous partial_transcript, print it to the terminal.
  • Otherwise, set partial_transcript to the Partial Transcript received from AssemblyAI.

partial_transcript = ''


def on_data(transcript: aai.RealtimeTranscript):
    "This function is called when a new transcript has been received."

    global partial_transcript

    # Ignore empty transcripts
    if not transcript.text:
        return

    if isinstance(transcript, aai.RealtimeFinalTranscript):
        # The utterance is over: reset for the next one
        partial_transcript = ""
    elif partial_transcript == transcript.text:
        # Same text twice in a row: treat the sentence as complete
        print(transcript.text, end="\r\n")
    else:
        # The text is still changing: remember the latest version
        partial_transcript = transcript.text
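
The on_data() logic above keeps its state in a global variable and prints again each time the same partial text repeats. If either of those is a concern, the same comparison can be wrapped in a small class. This is only a sketch: StablePartialDetector and its feed() method are hypothetical names, not part of the AssemblyAI SDK, and the class is written so it can be exercised without a live connection.

```python
from typing import Optional


class StablePartialDetector:
    """Hypothetical helper: reports a partial transcript once it repeats unchanged."""

    def __init__(self) -> None:
        self._last_text = ""    # most recent partial text seen
        self._emitted = False   # whether the current stable text was already reported

    def feed(self, text: str, is_final: bool) -> Optional[str]:
        """Return the text the first time it repeats; otherwise return None."""
        if not text:
            return None
        if is_final:
            # The utterance is over: reset for the next one.
            self._last_text = ""
            self._emitted = False
            return None
        if text == self._last_text:
            if self._emitted:
                return None  # already reported this sentence
            self._emitted = True
            return text
        # The text is still changing: remember the latest version.
        self._last_text = text
        self._emitted = False
        return None


# Replay the example sentence, with the stable partial arriving three times.
detector = StablePartialDetector()
events = [
    ("what is the", False),
    ("what is the capital of", False),
    ("what is the capital of new zealand", False),
    ("what is the capital of new zealand", False),
    ("what is the capital of new zealand", False),
    ("What is the capital of New Zealand?", True),
]

emitted = []
for text, is_final in events:
    sentence = detector.feed(text, is_final)
    if sentence is not None:
        emitted.append(sentence)

print(emitted)
```

Inside on_data(), such a detector could be fed with transcript.text and isinstance(transcript, aai.RealtimeFinalTranscript), printing whenever feed() returns a string.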

Create your Streaming transcriber and start your transcription.

# Create the Streaming transcriber
transcriber = aai.RealtimeTranscriber(
    on_data=on_data,
    on_error=on_error,
    sample_rate=44_100,
    on_open=on_open,  # optional
    on_close=on_close,  # optional
)

# Start the connection
transcriber.connect()

# Open a microphone stream
microphone_stream = aai.extras.MicrophoneStream()

# Press CTRL+C to abort
transcriber.stream(microphone_stream)

transcriber.close()

You should see partial transcripts printed to the terminal within 500ms of being spoken. By handling Partial Transcripts this way, you can integrate AssemblyAI’s Streaming STT into applications where immediate text feedback matters more than the punctuation and casing of Final Transcripts.
