Redact PII Entities in a Transcript with Entity Detection | AssemblyAI

This guide will walk you through using AssemblyAI’s Entity Detection model to redact specific entities from an audio transcription.

While AssemblyAI offers a PII Redaction model for automatic redaction, this method is ideal for scenarios where you need both a redacted and a non-redacted version of the transcript.

We’ll use the AssemblyAI Python SDK to demonstrate this. By the end of this guide, you’ll be able to effectively redact sensitive information from your transcriptions while preserving the original text.

Quickstart

1 import assemblyai as aai
2 aai.settings.api_key = "YOUR_API_KEY"
3 
4 transcriber = aai.Transcriber()
5 
6 audio_url = (
7     "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"
8 )
9 
10 config = aai.TranscriptionConfig(entity_detection=True)
11 
12 transcript = transcriber.transcribe(audio_url, config)
13 redacted_transcript = transcript.text
14 
15 # redact ALL entities
16 for entity in transcript.entities:
17     redacted_transcript = redacted_transcript.replace(entity.text, f"[{entity.entity_type.upper()}]")
18 
19 print(redacted_transcript[:500])

Before you begin

To complete this tutorial, you need:

Python installed.
An AssemblyAI account.

Step-by-Step Guide

Install the AssemblyAI SDK:

$ pip install assemblyai

Import the assemblyai package and set your API key:

1 import assemblyai as aai
2 
3 aai.settings.api_key = "YOUR-API-KEY"

Define a Transcriber and a TranscriptionConfig with entity_detection set to True, and then create a transcript.

1 transcriber = aai.Transcriber()
2 
3 audio_url = (
4     "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"
5 )
6 
7 config = aai.TranscriptionConfig(entity_detection=True)
8 
9 transcript = transcriber.transcribe(audio_url, config)

To redact all detected entities, iterate through the entities in the transcript and replace their text with their entity type:

1 redacted_transcript = transcript.text
2 
3 # redact ALL entities
4 for entity in transcript.entities:
5     redacted_transcript = redacted_transcript.replace(entity.text, f"[{entity.entity_type.upper()}]")
6 
7 print(redacted_transcript[:500])

Output

Smoke from hundreds of wildfires in [LOCATION] is triggering air quality alerts throughout [LOCATION]. Skylines from [LOCATION] to [LOCATION] to [LOCATION] are gray and smoggy. And in some places, the air quality warnings include the warning to stay inside. We wanted to better understand what's happening here and why. So he called [PERSON_NAME], an [OCCUPATION] in the [ORGANIZATION] at [ORGANIZATION]...

If you want to redact only certain types of entities (e.g., locations), filter them using a list of entity types:

1 # filter for some entities
2 redacted_transcript2 = transcript.text
3 
4 pii_policies = ["location"]
5 
6 for entity in transcript.entities:
7     if entity.entity_type in pii_policies:
8         redacted_transcript2 = redacted_transcript2.replace(entity.text, f"[{entity.entity_type.upper()}]")
9 
10 print(redacted_transcript2[:500])

Output

Smoke from hundreds of wildfires in [LOCATION] is triggering air quality alerts throughout [LOCATION]. Skylines from [LOCATION] to [LOCATION] to [LOCATION] are gray and smoggy. And in some places, the air quality warnings include the warning to stay inside. We wanted to better understand what's happening here and why. So he called Peter DiCarlo, an associate professor in the department of Environmental Health and Engineering at Johns Hopkins University. Good morning. Professor. Good morning. So

Conclusion

Disclaimer: This method only creates a local redacted copy of the text. If you make a GET request for the transcript again, the text field will remain unredacted.

This tutorial demonstrated how to use the AssemblyAI Python SDK to redact sensitive information from your transcriptions using our Entity Detection model. If you have any further questions or need additional assistance, feel free to reach out to the AssemblyAI Support team at support@assemblyai.com!