March 9, 2022

Auto-Tweet Your Words Using Speech Recognition in Python

We say the funniest things when no one is listening. But what if someone did, all the time? In this article, we will learn how to make an app that will listen to you all the time and Tweet the funniest, smartest or most relatable things you say out loud.

Mısra Turp

Developer Educator

The app will work by listening to you and transcribing your sentences. After you say something you would like to tweet, you can say the keyword "tweet" and it will post your latest sentence on Twitter.

We will make the app with Python. The main libraries will be:

PyAudio for listening to the input source
Twython for easy use of the Twitter API
AssemblyAI for Speech-to-Text transcription

#Setting up the dependencies

Before writing any code, we need Twitter and AssemblyAI credentials. Getting an AssemblyAI API token is simple: sign up for AssemblyAI and log in to find your token. If you have never used AssemblyAI before, you can get a free API token.

To use the Twitter API, go to the Twitter Developer Portal and create an account. After providing some information to Twitter, you need to create a project and get the necessary credentials. For this project, you need read and write permissions.

There will be two files in this project: the main Python script and a configuration file named configure.py. Fill in the configuration file with the authentication key from AssemblyAI and the credentials from Twitter like so:

auth_key = ''
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

In the main script, we start by importing all the libraries we need.

import websockets
import asyncio
import base64
import json
import pyaudio
from twython import Twython
from configure import *

#Listening with the microphone

Next up is setting up the parameters of PyAudio and starting a stream.

FRAMES_PER_BUFFER = 3200
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
p = pyaudio.PyAudio()

# starts recording
stream = p.open(
    format=FORMAT,
    channels=CHANNELS,
    rate=RATE,
    input=True,
    frames_per_buffer=FRAMES_PER_BUFFER
)

The main point to pay attention to here is the RATE parameter. When setting up the connection to an AssemblyAI endpoint, the same sample rate needs to be specified. Since in this project sentences will be transcribed in real-time, we will use AssemblyAI's real-time endpoint.

URL = "wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000"
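Because the sample rate has to agree in two places, one option is to derive the URL from the RATE constant instead of typing the number twice. The following is a minimal sketch of that idea; the helper names realtime_url and chunk_duration_ms are hypothetical and not part of the original script. It also shows how much audio each chunk holds with the settings above.

```python
from urllib.parse import urlencode

RATE = 16000              # samples per second, as in the PyAudio setup
FRAMES_PER_BUFFER = 3200  # frames read from the microphone per chunk

BASE_URL = "wss://api.assemblyai.com/v2/realtime/ws"


def realtime_url(sample_rate: int) -> str:
    """Build the endpoint URL from the sample rate used for recording."""
    return f"{BASE_URL}?{urlencode({'sample_rate': sample_rate})}"


def chunk_duration_ms(frames: int, sample_rate: int) -> float:
    """Length of one audio chunk in milliseconds."""
    return 1000 * frames / sample_rate


URL = realtime_url(RATE)
print(URL)
print(chunk_duration_ms(FRAMES_PER_BUFFER, RATE))
```

With these values, every read from the microphone covers 200 milliseconds of audio, so changing RATE or FRAMES_PER_BUFFER changes how often data is sent.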

#Connecting to Twitter

As mentioned above, we are using Twython to set up a connection to the Twitter API. For this, only the credentials from the configure.py file are needed.

twitter = Twython(
    consumer_key,
    consumer_secret,
    access_token,
    access_token_secret
)

The next step in building the auto-tweeter app is to set up the asynchronous behavior of constantly listening and having sentences transcribed. For this, Python's asyncio library will be used.

#Sending audio to AssemblyAI

We have two functions: one for listening and one for transcribing. The listening function is called send. Its main goal is to capture audio and send it to AssemblyAI's Speech-to-Text API. The transcribing function is called receive, and its primary goal is to keep listening to AssemblyAI's endpoint for transcription results.

Let's take a look at the listening function first. It runs indefinitely so that whatever is spoken into the microphone is sent to AssemblyAI, which is why it starts with a while True loop. The body of the loop is wrapped in try and except blocks to catch any potential errors. The four lines inside the try block do the actual work.

async def send():
    while True:
        try:
            # read one chunk of audio from the microphone
            data = stream.read(FRAMES_PER_BUFFER)
            # encode the raw bytes as base64 text and wrap them in JSON
            data = base64.b64encode(data).decode("utf-8")
            json_data = json.dumps({"audio_data": str(data)})
            r = await _ws.send(json_data)
        except websockets.exceptions.ConnectionClosedError as e:
            print(e)
            assert e.code == 4008
            # the connection is closed, so stop sending
            break
        except Exception as e:
            print(e)
            assert False, "Not a websocket 4008 error"
        r = await asyncio.sleep(0.01)

What the send function does is quite simple. After capturing the audio, it encodes the raw bytes as base64, wraps them in a JSON message, and sends that message to AssemblyAI.
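To see the encoding step without a microphone or a network connection, here is a small sketch that isolates it. The function name encode_chunk and the silent test chunk are made up for illustration; only the base64 and JSON steps mirror the send function.

```python
import base64
import json


def encode_chunk(raw_audio: bytes) -> str:
    """Wrap raw PCM bytes in the JSON message that send() transmits."""
    encoded = base64.b64encode(raw_audio).decode("utf-8")
    return json.dumps({"audio_data": encoded})


# Stand-in for stream.read(FRAMES_PER_BUFFER): 3200 frames of 16-bit silence.
fake_chunk = b"\x00\x00" * 3200
message = encode_chunk(fake_chunk)

# Decoding the message gives back the original bytes.
decoded = base64.b64decode(json.loads(message)["audio_data"])
print(len(fake_chunk), decoded == fake_chunk)
```

Base64 is used here because JSON cannot carry raw bytes directly, so the audio has to travel as text.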

#Interpreting the transcription

The receive function, on the other hand, receives the results from AssemblyAI, as the name suggests.

async def receive():
    # nothing has been said yet, so there is nothing to tweet
    previous_result = ''
    while True:
        try:
            result_str = await _ws.recv()
            result = json.loads(result_str)['text']
            if json.loads(result_str)['message_type']=='FinalTranscript':
                print(result)
                if result == 'Tweet.' and previous_result != '':
                    twitter.update_status(status=previous_result)
                    print("Tweeted: %s" % previous_result)
                # remember this sentence in case the next one is "Tweet."
                previous_result = result
        except websockets.exceptions.ConnectionClosedError as e:
            print(e)
            assert e.code == 4008
            # the connection is closed, so stop receiving
            break
        except Exception as e:
            print(e)
            assert False, "Not a websocket 4008 error"

Let's take it step by step and decipher this block of code. The "result_str" variable holds each response from AssemblyAI as a JSON string.

The response has complete information on this transcription, including the audio start and end timestamps, the confidence of the transcription and the resulting text. One critical attribute that is very helpful is "message_type" all the way at the end of the response.

Because we are using the real-time endpoint, AssemblyAI sends back unfinished sentences, or "PartialTranscripts", while you are still speaking. Once AssemblyAI detects the end of a sentence, it adds punctuation, capitalizes letters where necessary, and sends the result as a "FinalTranscript". We want to post only full sentences to Twitter and not partial transcripts. That's why the responses are filtered using the line:

if json.loads(result_str)['message_type']=='FinalTranscript':
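The filter can be tried out on its own with a couple of hand-written responses. The sketch below is illustrative only: the payloads and their values are invented, the key names other than "text" and "message_type" are assumptions based on the fields described above, and final_text is a hypothetical helper that is not in the original script.

```python
import json
from typing import Optional


def final_text(result_str: str) -> Optional[str]:
    """Return the transcript text for final transcripts, otherwise None."""
    message = json.loads(result_str)
    if message.get("message_type") == "FinalTranscript":
        return message.get("text")
    return None


# Hypothetical responses with invented values, for illustration only.
partial = json.dumps({
    "audio_start": 0,
    "audio_end": 900,
    "confidence": 0.91,
    "text": "we say the funniest",
    "message_type": "PartialTranscript",
})
final = json.dumps({
    "audio_start": 0,
    "audio_end": 1800,
    "confidence": 0.95,
    "text": "We say the funniest things.",
    "message_type": "FinalTranscript",
})

print(final_text(partial))
print(final_text(final))
```

Parsing the JSON once and reading both keys from the same dictionary avoids decoding every response twice.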

#Tweeting transcribed sentences

The rest of the lines deal with tweeting the sentences. The previous final transcript is stored in the "previous_result" variable. Whenever the current final transcript is "Tweet.", we post the previous sentence with the line:

twitter.update_status(status=previous_result)
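The keyword logic can also be separated from the network code, which makes it easy to try out without posting anything. This is a minimal sketch under that assumption; the TweetTrigger class is hypothetical and simply mirrors the previous_result bookkeeping described above.

```python
from typing import Optional


class TweetTrigger:
    """Remember the last final sentence and release it when the keyword is heard."""

    def __init__(self, keyword: str = "Tweet.") -> None:
        self.keyword = keyword
        self.previous_result = ""

    def feed(self, final_transcript: str) -> Optional[str]:
        """Take one final transcript; return the sentence to post, if any."""
        to_post = None
        if final_transcript == self.keyword and self.previous_result != "":
            to_post = self.previous_result
        self.previous_result = final_transcript
        return to_post


trigger = TweetTrigger()
posted = []
for sentence in ["I love Python.", "Tweet.", "Nothing to see here."]:
    to_post = trigger.feed(sentence)
    if to_post is not None:
        # The real app would call twitter.update_status(status=to_post) here.
        posted.append(to_post)

print(posted)
```

Note that, as in the original logic, saying the keyword twice in a row would post the keyword itself, because the first "Tweet." becomes the previous sentence.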

#Establishing asynchronous behavior

These two functions (send and receive) will be wrapped in another function to be able to run them asynchronously.

async def send_receive():
    print(f'Connecting to url {URL}')
    # authenticate with the AssemblyAI key from configure.py
    async with websockets.connect(
        URL,
        extra_headers=(("Authorization", auth_key),),
        ping_interval=5,
        ping_timeout=20
    ) as _ws:
        r = await asyncio.sleep(0.1)
        print("Receiving SessionBegins ...")
        session_begins = await _ws.recv()
        print(session_begins)
        print("Sending messages ...")
        result = ''

        async def send():
            while True:
                ...

        async def receive():
            while True:
                ...

        # run both loops concurrently on the same connection
        send_result, receive_result = await asyncio.gather(send(), receive())

Besides wrapping the send and receive functions, this function also opens the websocket connection to AssemblyAI. Because send and receive are defined inside it, they can both use the same connection. After defining them, it calls asyncio.gather to run the two functions concurrently.
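To get a feel for how asyncio.gather runs two loops side by side, here is a small sketch that needs neither a microphone nor a network connection. The producer and consumer coroutines are hypothetical stand-ins for send and receive, and an asyncio.Queue takes the place of the websocket.

```python
import asyncio


async def producer(queue: asyncio.Queue, chunks: list) -> None:
    """Stand-in for send(): push chunks, yielding control between them."""
    for chunk in chunks:
        await queue.put(chunk)
        await asyncio.sleep(0.01)
    await queue.put(None)  # sentinel: no more data


async def consumer(queue: asyncio.Queue) -> list:
    """Stand-in for receive(): collect items until the sentinel arrives."""
    received = []
    while True:
        item = await queue.get()
        if item is None:
            return received
        received.append(item.upper())


async def main() -> list:
    queue = asyncio.Queue()
    # gather() runs both coroutines concurrently and returns
    # their results in the order the coroutines were passed in.
    _, received = await asyncio.gather(
        producer(queue, ["a", "b", "c"]),
        consumer(queue),
    )
    return received


result = asyncio.run(main())
print(result)
```

Each await is a point where one coroutine pauses and the other gets a turn, which is the same reason the send function ends each iteration with a short asyncio.sleep.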

Of course, after defining this function, we need to call it at the end of the script. Here is the line to do that:

asyncio.run(send_receive())

You can find the code on GitHub.
