Capture Complete Sentences as Partial Transcriptions with Streaming Speech-To-Text
To use partial transcripts from AssemblyAI’s Streaming Speech-to-Text (STT) API effectively, particularly when you don’t need final transcripts (which add punctuation and casing), you need to understand how partial transcripts work and how to handle them in your application. Here’s a guide to help you get started.
Quickstart
Step-by-step guide
Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard. Please note that this feature is available for paid accounts only; if you’re on the free plan, you’ll need to upgrade.
Then install AssemblyAI’s Python SDK, for example with pip install "assemblyai[extras]" (the extras add the dependencies needed to stream audio from a microphone).
Understanding Partial Transcripts
What are Partial Transcripts?
Partial transcripts are incomplete and ongoing transcriptions of an audio stream. They provide a near real-time text representation of spoken words before the entire speech is finished.
They are useful in scenarios where immediate text feedback is more important than the complete accuracy or formatting of the final transcript.
Example Use Cases where Partial Transcripts suffice
- Chatbots powered by LLMs
- Voice Command Recognition
- Real-time Translations
What do Partial Transcripts look like?
For a sentence such as “What is the capital of New Zealand”, these are the messages you would receive from our API.
Message 1:
Message 2:
Message 3:
Message 4 (notice that the text is exactly the same as in Message 3):
Message 5:
Notice that once Messages 3 and 4 contain exactly the same text, a Final Transcript follows. Rather than waiting for it, we can programmatically check whether the text in a given message matches the text of the previous message, and use that match to deduce that the sentence is complete.
Note: Other keys in the payload have been omitted for brevity; you can find them in our Streaming API Reference.
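To see the check in isolation, here is a small, SDK-free sketch that replays a list of raw messages. It assumes each message is a JSON object with message_type and text fields, using "PartialTranscript" and "FinalTranscript" as type values. The message texts are hypothetical stand-ins for the sentence above, not captured API output, and complete_sentences is a helper name chosen for this example.

```python
import json
from typing import Iterable, Iterator


def complete_sentences(raw_messages: Iterable[str]) -> Iterator[str]:
    """Yield a partial transcript's text as soon as it repeats unchanged.

    Hypothetical helper: assumes each raw message is a JSON object with
    "message_type" and "text" keys.
    """
    previous = ""
    for raw in raw_messages:
        message = json.loads(raw)
        text = message.get("text", "")
        if message.get("message_type") == "FinalTranscript":
            # The utterance is finished; start tracking from scratch.
            previous = ""
        elif text and text == previous:
            # Same non-empty text twice in a row: treat it as complete.
            yield text
        elif text:
            # New or longer text: remember it for the next comparison.
            previous = text


# Hypothetical message stream for "What is the capital of New Zealand".
messages = [
    '{"message_type": "PartialTranscript", "text": "what is"}',
    '{"message_type": "PartialTranscript", "text": "what is the capital"}',
    '{"message_type": "PartialTranscript", "text": "what is the capital of new zealand"}',
    '{"message_type": "PartialTranscript", "text": "what is the capital of new zealand"}',
    '{"message_type": "FinalTranscript", "text": "What is the capital of New Zealand?"}',
]

for sentence in complete_sentences(messages):
    print(sentence)  # prints: what is the capital of new zealand
```

Empty partials are skipped, so two blank messages in a row are never reported as a finished sentence.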
Implementing Partial Transcript Checks
Let’s walk through code that checks whether the partial transcript received from AssemblyAI matches the previous partial transcript.
Define your streaming callback functions as usual.
Then, define an empty string, partial_transcript. In on_data(), we will:
- Access the global string partial_transcript.
- If the data received is a Final Transcript, reset partial_transcript.
- Else, if transcript.text matches the previous partial_transcript, print it to our terminal.
- Otherwise, set partial_transcript to the Partial Transcript received from AssemblyAI.
Create your Streaming transcriber and start your transcription.
You should observe partial transcripts printed to the terminal within 500ms of being spoken. By following these guidelines and understanding how to handle Partial Transcripts, you can integrate AssemblyAI’s Streaming STT into applications where immediate text feedback is crucial, even without the punctuation and casing of Final Transcripts.