Redact PII Entities in a Transcript with Entity Detection
This guide will walk you through using AssemblyAI’s Entity Detection model to redact specific entities from an audio transcription.
While AssemblyAI offers a PII Redaction model for automatic redaction, this method is ideal for scenarios where you need both a redacted and a non-redacted version of the transcript.
We’ll use the AssemblyAI Python SDK to demonstrate this. By the end of this guide, you’ll be able to effectively redact sensitive information from your transcriptions while preserving the original text.
Quickstart
Before you begin
To complete this tutorial, you need:
- Python installed.
- An AssemblyAI account.
Step-by-Step Guide
Install the AssemblyAI SDK:
Import the assemblyai package and set your API key:
Define a Transcriber and a TranscriptionConfig with entity_detection set to True, and then create a transcript.
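The setup steps above (install, import, API key, config, transcription) can be sketched together as follows. This is a minimal sketch, not the guide's original listing: the helper name `transcribe_with_entities` and its parameters are hypothetical, and the SDK import is placed inside the function so the snippet can be loaded even where the package is not installed.

```python
def transcribe_with_entities(audio_source, api_key):
    """Transcribe audio with Entity Detection enabled and return the transcript.

    Hypothetical helper for illustration; `audio_source` is assumed to be a
    local file path or a publicly reachable URL.
    """
    # Requires the SDK to be installed first: pip install -U assemblyai
    import assemblyai as aai

    # Set the API key used by all subsequent SDK calls.
    aai.settings.api_key = api_key

    # Enable Entity Detection for this transcription.
    config = aai.TranscriptionConfig(entity_detection=True)

    # Create the transcriber and submit the audio; this call waits for completion.
    transcriber = aai.Transcriber()
    transcript = transcriber.transcribe(audio_source, config=config)

    # Surface transcription failures instead of returning an unusable transcript.
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)

    return transcript
```

Calling `transcribe_with_entities("YOUR_AUDIO_URL", "YOUR_API_KEY")` should return a transcript whose `text` and `entities` attributes are used in the steps that follow.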
To redact all detected entities, iterate through the entities in the transcript and replace their text with their entity type:
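One way to write this loop is sketched below. The `Entity` dataclass and the sample sentence are hypothetical stand-ins so the example runs without an API call; with the SDK you would pass `transcript.text` and `transcript.entities` instead. The `[ENTITY_TYPE]` label format is an assumption, not something the API prescribes.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    # Hypothetical stand-in for an SDK entity: only the two attributes
    # this guide relies on (text and entity_type) are modeled.
    text: str
    entity_type: str


def entity_label(entity):
    # entity_type may be an enum (with .value) or a plain string; handle both.
    raw = getattr(entity.entity_type, "value", entity.entity_type)
    return f"[{str(raw).upper()}]"


def redact_all(text, entities):
    # Work on a copy so the original transcript text stays available.
    redacted = text
    for entity in entities:
        # str.replace swaps every occurrence of the entity text.
        redacted = redacted.replace(entity.text, entity_label(entity))
    return redacted


sample_text = "My name is Ada and I live in Paris."
sample_entities = [Entity("Ada", "person_name"), Entity("Paris", "location")]

redacted_text = redact_all(sample_text, sample_entities)
print(redacted_text)  # My name is [PERSON_NAME] and I live in [LOCATION].
```

Because the replacement is a plain substring match, a short entity text can also match inside longer words. If that matters for your data, replacing by position using each entity's timestamps or a word-boundary match is a possible refinement.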
If you want to redact only certain types of entities (e.g., locations), filter them using a list of entity types:
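The filtered variant can be sketched the same way. As before, `Entity`, `redact_selected`, and the sample data are hypothetical names used for illustration; the entity type strings (`location`, `person_name`) are assumed to match the values reported by Entity Detection.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    # Hypothetical stand-in for an SDK entity (text and entity_type only).
    text: str
    entity_type: str


def redact_selected(text, entities, types_to_redact):
    # Normalize the requested types once for case-insensitive comparison.
    wanted = {t.lower() for t in types_to_redact}
    redacted = text
    for entity in entities:
        # entity_type may be an enum (with .value) or a plain string.
        raw = getattr(entity.entity_type, "value", entity.entity_type)
        type_name = str(raw).lower()
        if type_name in wanted:
            redacted = redacted.replace(entity.text, f"[{type_name.upper()}]")
    return redacted


sample_text = "My name is Ada and I live in Paris."
sample_entities = [Entity("Ada", "person_name"), Entity("Paris", "location")]

# Redact locations only; the person name is left untouched.
locations_redacted = redact_selected(sample_text, sample_entities, ["location"])
print(locations_redacted)  # My name is Ada and I live in [LOCATION].
```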
Conclusion
Disclaimer: This method only creates a local redacted copy of the text; the transcript itself is not changed. If you make a GET request for the transcript again, its text field will still be unredacted.
This tutorial demonstrated how to use the AssemblyAI Python SDK to redact sensitive information from your transcriptions using our Entity Detection model. If you have any further questions or need additional assistance, feel free to reach out to the AssemblyAI Support team at support@assemblyai.com!