PII Redaction
The PII Redaction model lets you minimize sensitive information about individuals by automatically identifying and removing it from your transcript.
Personally Identifiable Information (PII) is any information that can be used to identify a person, such as a name, email address, or phone number.
When you enable the PII Redaction model, your transcript will look like this:
- With hash substitution: Hi, my name is ####!
- With entity_name substitution: Hi, my name is [PERSON_NAME]!
You can also create redacted audio files, in which sensitive information is replaced with a beeping sound. See Create redacted audio files.
Supported languages
PII Redaction is available in multiple languages. See Supported languages.
Redacted properties
PII Redaction only redacts words in the text property. Properties from other features may still include PII, such as entities from Entity Detection or summary from Summarization.
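Because only the text property is redacted, one cautious approach is to drop every other property before storing or sharing a response. The sketch below assumes the transcript response has already been parsed into a Python dict. The sample values and the choice of which keys to keep are illustrative assumptions, not part of the API.

```python
# Keys kept when exporting a redacted transcript. Only "text" is documented
# as redacted; "id" and "status" are assumed here to carry no PII.
SAFE_KEYS = {"id", "status", "text"}


def keep_redacted_fields(response: dict) -> dict:
    """Return a copy of the response containing only the allow-listed keys."""
    return {key: value for key, value in response.items() if key in SAFE_KEYS}


# Hypothetical parsed response, shortened for illustration.
sample = {
    "id": "abc123",
    "status": "completed",
    "text": "Hi, my name is ####!",
    "summary": "A caller introduces herself as Jane.",
    "entities": [{"entity_type": "person_name", "text": "Jane"}],
}

print(keep_redacted_fields(sample))
```

An allow-list is used rather than a block-list so that properties added by other features are excluded by default.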
Quickstart
In the Python SDK, enable PII Redaction on the TranscriptionConfig using the set_redact_pii() method.
Set policies to specify the information you want to redact. For the full list of policies, see PII policies.
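For readers calling the REST API directly rather than an SDK, the same settings can be sketched as a JSON request body. This is a minimal sketch under stated assumptions: redact_pii_policies is named elsewhere on this page, while the audio_url, redact_pii, and redact_pii_sub fields and the policy names person_name and phone_number are assumptions to verify against the API reference and the PII policies list.

```python
import json


def build_redaction_request(audio_url, policies, substitution="hash"):
    """Build a transcript request body with PII Redaction enabled."""
    if not policies:
        raise ValueError("Specify at least one PII policy")
    if substitution not in ("hash", "entity_name"):
        raise ValueError("substitution must be 'hash' or 'entity_name'")
    return {
        "audio_url": audio_url,                 # assumed field name
        "redact_pii": True,                     # assumed field name
        "redact_pii_policies": list(policies),
        "redact_pii_sub": substitution,         # assumed field name
    }


body = build_redaction_request(
    "https://example.com/audio.mp3",            # placeholder URL
    ["person_name", "phone_number"],            # assumed policy names
    substitution="entity_name",
)
print(json.dumps(body, indent=2))
```

The helper rejects an empty policy list up front, since a request without policies has nothing to redact.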
Example output
Create redacted audio files
In addition to redacting sensitive information from the transcription text, you can also generate a version of the original audio file with the PII “beeped” out.
To create a redacted version of the audio file, use the set_redact_pii() method on the TranscriptionConfig with redact_audio set to True.
Use get_redacted_audio_url() on the transcript to get the URL to the redacted audio file.
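At the REST level, requesting redacted audio might look like the sketch below. The redact_pii and redact_pii_audio fields, the base URL, and the /v2/transcript/{id}/redacted-audio path are assumptions drawn from the public API reference rather than from this page, and TRANSCRIPT_ID is a placeholder.

```python
API_BASE = "https://api.assemblyai.com"  # assumed base URL


def with_redacted_audio(request_body: dict) -> dict:
    """Return a copy of a request body that also asks for redacted audio."""
    updated = dict(request_body)
    updated["redact_pii"] = True
    updated["redact_pii_audio"] = True   # assumed field name
    return updated


def redacted_audio_endpoint(transcript_id: str) -> str:
    """Build the (assumed) endpoint that reports the redacted audio URL."""
    return f"{API_BASE}/v2/transcript/{transcript_id}/redacted-audio"


body = with_redacted_audio({
    "audio_url": "https://example.com/audio.mp3",   # placeholder URL
    "redact_pii_policies": ["person_name"],         # assumed policy name
})
print(body)
print(redacted_audio_endpoint("TRANSCRIPT_ID"))
```

The helper copies the input dict, so the original request body can be reused for a text-only redaction request.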
Supported languages
You can only create redacted audio files for transcriptions in English and Spanish.
Maximum audio file size
You can only create redacted versions of audio files if the original file is smaller than 1 GB.
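Both limits can be checked locally before a request is submitted. The following is a pre-flight sketch, not an API call. It assumes that 1 GB means 10^9 bytes and that languages are identified by codes beginning with en or es; both are assumptions.

```python
MAX_REDACTED_AUDIO_BYTES = 1_000_000_000  # assumes 1 GB = 10**9 bytes
REDACTED_AUDIO_LANGUAGES = ("en", "es")   # English and Spanish


def redacted_audio_problems(language_code: str, file_size_bytes: int) -> list:
    """List the reasons a file would not qualify for redacted audio."""
    problems = []
    # Reduce codes such as "en_us" or "es-MX" to their base language.
    base = language_code.lower().replace("-", "_").split("_")[0]
    if base not in REDACTED_AUDIO_LANGUAGES:
        problems.append(f"language '{language_code}' is not English or Spanish")
    # The file must be strictly smaller than the limit.
    if file_size_bytes >= MAX_REDACTED_AUDIO_BYTES:
        problems.append("file is not smaller than 1 GB")
    return problems


print(redacted_audio_problems("en_us", 5_000_000))
print(redacted_audio_problems("fr", 2_000_000_000))
```

An empty list means neither documented limit is violated.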
Example output
API reference
Request
Response
The response also includes the request parameters used to generate the transcript.
PII policies
Troubleshooting
Why is the PII not redacted in my transcription?
Make sure that at least one PII policy has been specified in your request, using the redact_pii_policies parameter. If you’re still experiencing issues, please reach out to our support team for assistance.
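A quick local check for this condition could look like the following. It assumes the request body is a Python dict and that redaction is switched on with a redact_pii flag, which is an assumed field name.

```python
def missing_policy_warning(request_body: dict):
    """Return a warning if redaction is enabled but no policy is set."""
    if not request_body.get("redact_pii"):   # assumed field name
        return None
    if not request_body.get("redact_pii_policies"):
        return "redact_pii is enabled but redact_pii_policies is empty or missing"
    return None


print(missing_policy_warning({"redact_pii": True, "redact_pii_policies": []}))
```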
Why is my webhook not being sent?
There could be several reasons why your webhook isn’t being sent, such as a misconfigured URL, an unreachable endpoint, or an issue with the authentication headers. Double-check your request and ensure that the webhook_url parameter is included with a valid URL that can be reached by AssemblyAI’s API. If you’re using custom authentication headers, ensure that the webhook_auth_header_name and webhook_auth_header_value parameters are included and are correct. If you’re still having issues, please contact our support team for assistance.
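Some of these checks can be automated on the client side. The sketch below only validates the shape of the request; it cannot tell whether the endpoint is reachable from AssemblyAI’s servers. Treating only http and https URLs as valid, and requiring both header parameters together, are assumptions.

```python
from urllib.parse import urlparse


def webhook_config_problems(request_body: dict) -> list:
    """List obvious webhook misconfigurations in a request body."""
    problems = []
    url = request_body.get("webhook_url")
    if not url:
        problems.append("webhook_url is missing")
    else:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("webhook_url is not a valid http(s) URL")
    # The two custom auth header parameters are assumed to go together.
    name = request_body.get("webhook_auth_header_name")
    value = request_body.get("webhook_auth_header_value")
    if bool(name) != bool(value):
        problems.append(
            "webhook_auth_header_name and webhook_auth_header_value "
            "must be set together"
        )
    return problems


print(webhook_config_problems({"webhook_url": "example.com/hook"}))
```

A URL without a scheme, as in the usage line, is reported as invalid because the parser finds no host.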
Why does my redacted audio file sound worse than the original?
By default, the API returns redacted audio files in MP3 format, a lossy format. Lossy formats remove audio information to reduce file size, which may cause a reduction in quality. The difference may be particularly noticeable if the submitted audio is in a lossless file format. To retain as much quality as possible, you can instead return your redacted audio files in a lossless format, by setting redact_pii_audio_quality to wav.
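In a raw request body, this amounts to one extra field. The sketch below assumes the request is a Python dict and that mp3 and wav, the two formats mentioned above, are the only accepted values. The redact_pii_audio field in the usage lines is an assumed name.

```python
def set_redacted_audio_quality(request_body: dict, quality: str = "wav") -> dict:
    """Return a copy of the request body with the redacted audio format set."""
    if quality not in ("mp3", "wav"):   # assumed to be the only valid values
        raise ValueError("quality must be 'mp3' or 'wav'")
    updated = dict(request_body)
    updated["redact_pii_audio_quality"] = quality
    return updated


request = set_redacted_audio_quality({
    "redact_pii_audio": True,           # assumed field name
})
print(request)
```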