Identifying hate speech in audio or video files

Our Content Moderation model can help you ensure that your content is safe and appropriate for all audiences.

The model pinpoints sensitive discussions in spoken data and provides information on their severity.

In this guide, we’ll learn how to use the Content Moderation model, and look at an example response to understand its structure.

Get started

Before we begin, make sure you have an AssemblyAI account and an API key. You can sign up for a free account and get your API key from your dashboard.

The complete source code for this guide can be viewed here.

Here is an audio example for this guide:

https://assembly.ai/wildfires.mp3

Step-by-step instructions

1. Install the SDK.

pip install -U assemblyai
2. Import the assemblyai package and set your API key.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"
3. Create a TranscriptionConfig with content_safety set to True.

config = aai.TranscriptionConfig(content_safety=True)
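If you want to control how aggressively segments are flagged, the configuration also accepts a confidence threshold. Here's a minimal sketch, assuming the content_safety_confidence parameter described in the API reference (values from 25 to 100, default 50):

# Only return flagged segments the model is at least 80% confident about.
# content_safety_confidence is an assumption based on the API reference.
config = aai.TranscriptionConfig(
    content_safety=True,
    content_safety_confidence=80,
)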
4. Create a Transcriber object and pass in the configuration.

transcriber = aai.Transcriber(config=config)
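If you prefer not to bind the configuration to the Transcriber, the SDK also lets you pass it per request. A small sketch of that alternative:

# Per-request alternative (a sketch): pass the config to transcribe()
# instead of binding it to the Transcriber.
transcriber = aai.Transcriber()
transcript = transcriber.transcribe(
    "https://assembly.ai/wildfires.mp3",  # same sample file used below
    config=config,
)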
5. Call the Transcriber object's transcribe method and pass in the audio file's path or URL as a parameter. The transcribe method returns a Transcript object containing the results.

FILE_URL = "https://assembly.ai/wildfires.mp3"

transcript = transcriber.transcribe(FILE_URL)
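Transcription can fail, for example if the file URL is unreachable, so it's worth checking the transcript's status before reading the results. A minimal sketch using the SDK's status enum:

# Abort early if the transcription failed rather than completed
if transcript.status == aai.TranscriptStatus.error:
    raise RuntimeError(f"Transcription failed: {transcript.error}")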
6. You can access the content moderation results through the Transcript object's content_safety attribute.

# Get the parts of the transcript which were flagged as sensitive
for result in transcript.content_safety.results:
    print(result.text)  # sensitive text snippet
    print(result.timestamp.start)
    print(result.timestamp.end)

    for label in result.labels:
        print(label.label)  # content safety category
        print(label.confidence)  # model's confidence that the text is in this category
        print(label.severity)  # severity of the text in relation to the category

# Get the confidence of the most common labels in relation to the entire audio file
for label, confidence in transcript.content_safety.summary.items():
    print(f"{confidence * 100}% confident that the audio contains {label}")

# Get the overall severity of the most common labels in relation to the entire audio file
for label, severity_confidence in transcript.content_safety.severity_score_summary.items():
    print(f"{severity_confidence.low * 100}% confident that the audio contains low-severity {label}")
    print(f"{severity_confidence.medium * 100}% confident that the audio contains mid-severity {label}")
    print(f"{severity_confidence.high * 100}% confident that the audio contains high-severity {label}")

Understanding the response

In the JSON response, there’ll be an additional key called content_safety_labels that contains information about any sensitive content detected. The full text is contained in the text key, and each problematic utterance has its own labels and timestamp. The entire audio is assigned a summary and a severity_score_summary for each category of unsafe content. Each label is returned with a confidence score and a severity score.
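Here is an abbreviated sketch of what that part of the response can look like. The values are illustrative, not taken from the sample file:

{
  "content_safety_labels": {
    "status": "success",
    "results": [
      {
        "text": "...flagged utterance from the transcript...",
        "labels": [
          { "label": "disasters", "confidence": 0.81, "severity": 0.21 }
        ],
        "timestamp": { "start": 250, "end": 28920 }
      }
    ],
    "summary": { "disasters": 0.92 },
    "severity_score_summary": {
      "disasters": { "low": 0.58, "medium": 0.42, "high": 0.0 }
    }
  }
}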

For more information, see Content Moderation model documentation and API reference.

Conclusion

The AssemblyAI API supports many different content safety labels. Identifying hate speech is only a single, important use case for automated content moderation, and you can learn about others on the AssemblyAI blog.
