May 22, 2024

Filter profanity from audio files using Python

Learn how to filter profanity out of audio and video files with fewer than 10 lines of code in this tutorial

Tutorial

Profanity Filtering

Python

Ryan O'Connor

Senior Developer Educator

With a greater amount of online interaction happening every day, it’s become increasingly difficult to ensure that these interactions are safe and constructive. Profanity filtering is a common technique used for this purpose across various applications, from social media to customer support. Profanity detection artificial intelligence models now enable developers to automatically and efficiently filter out offensive language at scale, facilitating the development of safe and welcoming digital environments.

In this tutorial, we’ll learn how to use Python to filter profanity from audio files. By the end of this guide, you'll be equipped to implement this functionality in just a few lines of code, enhancing both user experience and content compliance.

Here is the audio file we will be running profanity filtering on, along with the filtered output, where the asterisks represent harmful speech that has automatically been filtered:

Profanity filtering

Filtering profanity from audio and video files is easy as s*** with AssemblyAI.

#Step 1: Set up your environment

First, make sure Python is installed on your system. Then install the assemblyai package, which makes it easy to use AssemblyAI’s API from Python.

pip install assemblyai

Next, get a free AssemblyAI API key here; or, if you already have one, you can copy it from your Dashboard. Once you’ve copied your API key, set it as an environment variable on your machine, which allows your requests to be automatically authorized when you use the assemblyai package:

# Mac/Linux:
export ASSEMBLYAI_API_KEY=<YOUR_KEY>
# Windows:
set ASSEMBLYAI_API_KEY=<YOUR_KEY>
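If you want to confirm that the variable is visible to Python before making any requests, a small check like the one below can help. This is only a sketch using the standard library; the helper name get_api_key is our own and is not part of the assemblyai package.

```python
import os


def get_api_key(env=os.environ):
    """Return the AssemblyAI API key from the environment, or raise a clear error.

    `get_api_key` is a hypothetical helper for this tutorial, not an SDK function.
    """
    key = env.get("ASSEMBLYAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ASSEMBLYAI_API_KEY is not set; export it before running the script."
        )
    return key
```

Calling get_api_key() at the top of your script fails early with a readable message instead of an authorization error later on.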

#Step 2: Transcribe and filter the audio file

Now that our environment is set up, we can submit an audio file for transcription with profanity filtering. For this tutorial, we’ll be using this example file. If you want to use your own file, you can use either a local file on your system or a remote file, as long as the remote file is available at a publicly accessible download URL (when you click the link, it should start downloading in your browser). You can use either an audio or a video file.
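If you are unsure whether the value you plan to pass is a usable local path or a remote URL, a quick standard-library check along these lines may help. The function name describe_source and its three return values are assumptions made for illustration; the check only inspects the string and the local filesystem, and does not verify that a URL is actually downloadable.

```python
from pathlib import Path
from urllib.parse import urlparse


def describe_source(source: str) -> str:
    """Classify a source string as 'remote', 'local', or 'missing'.

    Hypothetical helper: it does not contact the network.
    """
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return "remote"
    if Path(source).is_file():
        return "local"
    return "missing"


print(describe_source(
    "https://storage.googleapis.com/aai-web-samples/profanity-filtering.mp3"
))
```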

Create a file called main.py, and then import the assemblyai package and specify the path to the audio file you want to filter profanity from:

import assemblyai as aai

# replace with local filepath or your remote file
audio_url = "https://storage.googleapis.com/aai-web-samples/profanity-filtering.mp3"

Next, we create an aai.TranscriptionConfig object, in which we specify the settings for our transcription. In this case, we enable profanity filtering via filter_profanity=True. Then we create an aai.Transcriber object, which actually performs transcription. Passing this config into the aai.Transcriber causes it to apply profanity filtering to any file it transcribes.

config = aai.TranscriptionConfig(filter_profanity=True)
transcriber = aai.Transcriber(config=config)

Finally, we use the transcribe method of the Transcriber object to transcribe the audio file with profanity filtering:

transcript = transcriber.transcribe(audio_url)

#Step 3: Print the filtered text

We can print the profanity-filtered text as follows:

if not transcript.error:
    print(transcript.text)
else:
    raise RuntimeError(f"There was an error transcribing the file: {transcript.error}")

Save your file and run it with python main.py in the project directory. The profanity-filtered transcript is printed to the terminal; if you used the example file from above, you’ll see the following output:

Filtering profanity from audio and video files is easy as s*** with AssemblyAI.
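If you also want to know whether anything was filtered, you can search the returned text for masked words. The sketch below assumes that masked words look like the one in the sample output, that is, one or more leading characters followed by asterisks; other masking formats would need a different pattern. The name find_masked_words is our own.

```python
import re

# Assumed format, based on the sample output: leading word characters
# followed by one or more asterisks, e.g. "s***".
MASKED_WORD = re.compile(r"\b\w+\*+")


def find_masked_words(text: str) -> list:
    """Return every substring of `text` that matches the assumed masking format."""
    return MASKED_WORD.findall(text)


sample = "Filtering profanity from audio and video files is easy as s*** with AssemblyAI."
print(find_masked_words(sample))  # ['s***']
```

In main.py you could pass transcript.text to this function instead of the hard-coded sample string.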

The transcript object contains a wealth of additional information about the transcribed audio file, such as word-level timestamps, which you can access through the object’s attributes. Check out our docs to learn more about Transcript objects and the other information you can get back from our API.
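As one illustration of using word-level timestamps, the sketch below finds where masked words occur in the audio. It assumes each word object exposes text, start, and end attributes with times in milliseconds, and that masked words keep their asterisks at the word level; check the docs to confirm both before relying on it. The stand-in objects and their timings are made up so the example runs without an API call.

```python
from types import SimpleNamespace


def masked_word_spans(words):
    """Return (text, start_seconds, end_seconds) for each word containing an asterisk.

    Assumes `start` and `end` are in milliseconds (an assumption to verify in the docs).
    """
    return [(w.text, w.start / 1000, w.end / 1000) for w in words if "*" in w.text]


# Stand-in data for illustration only; these timings are invented.
demo_words = [
    SimpleNamespace(text="easy", start=3000, end=3250),
    SimpleNamespace(text="s***", start=3500, end=3900),
]
print(masked_word_spans(demo_words))  # [('s***', 3.5, 3.9)]
```

With a real transcript, you would pass transcript.words in place of demo_words.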

Alternatively, feel free to check out our blog for more learning resources and tutorials, such as our video on how to build a talking AI with LLaMa 3.
