PII Redaction
The PII Redaction model lets you minimize sensitive information about individuals by automatically identifying and removing it from your transcript.
Personal Identifiable Information (PII) is any information that can be used to identify a person, such as a name, email address, or phone number.
When you enable the PII Redaction model, your transcript will look like this:
- With
hash
substitution:Hi, my name is ####!
- With
entity_name
substitution:Hi, my name is [PERSON_NAME]!
You can also Create redacted audio files to replace sensitive information with a beeping sound.
PII Redaction is available in multiple languages. See Supported languages.
PII only redacts words in the text
property. Properties from other features may still include PII, such as entities
from Entity Detection or summary
from Summarization.
Quickstart
Enable PII Redaction by setting redact_pii
to true
in the transcription
config.
Use redact_pii_policies
to specify the information you want to
redact. For the full list of policies, see PII policies.
Example output
Smoke from hundreds of wildfires in Canada is triggering air quality alerts
throughout the US. Skylines from Maine to Maryland to Minnesota are gray and
smoggy. And in some places, the air quality warnings include the warning to stay
inside. We wanted to better understand what's happening here and why, so we
called ##### #######, an ######### ######### in the ########## ## #############
###### ### ########### at ##### ####### ##########. Good morning, #########.
Good morning. So what is it about the conditions right now that have caused this
round of wildfires to affect so many people so far away? Well, there's a couple
of things. The season has been pretty dry already, and then the fact that we're
getting hit in the US. Is because there's a couple of weather systems that ...
Create redacted audio files
In addition to redacting sensitive information from the transcription text, you can also generate a version of the original audio file with the PII "beeped" out.
To create a redacted version of the audio file, use the set_redact_pii()
method on the TranscriptionConfig
with redact_audio
to True
.
Use get_redacted_audio_url()
on the transcript to get the URL to the redacted audio file.
You can only create redacted audio files for transcriptions in English and Spanish.
You can only create redacted versions of audio files if the original file is smaller than 1 GB.
Example output
https://s3.us-west-2.amazonaws.com/api.assembly.ai.usw2/redacted-audio/ac06721c-d1ea-41a7-95f7-a9463421e6b1.mp3?AWSAccessKeyId=...
API reference
Request
curl https://api.assemblyai.com/v2/transcript \
--header "Authorization: YOUR_API_KEY" \
--header "Content-Type: application/json" \
--data '{
"audio_url": "YOUR_AUDIO_URL",
"redact_pii": true,
"redact_pii_policies": ["us_social_security_number", "credit_card_number"],
"redact_pii_sub": "hash",
"redact_pii_audio": true,
"redact_pii_audio_quality": "mp3"
}'
Key | Type | Description |
---|---|---|
redact_pii | boolean | Enable PII Redaction. |
redact_pii_policies | array | PII policies for what information to redact. |
redact_pii_sub | string | Method used to substitute PII in the transcript. Can be entity_name or hash . |
redact_pii_audio | boolean | Create a redacted version of the audio file. |
redact_pii_audio_quality | string | Quality of the redacted PII audio file. Can be mp3 or wav . |
Response
Key | Type | Description |
---|---|---|
text | string | Transcript with redacted PII. |
The response also includes the request parameters used to generate the transcript.
PII policies
account_number | Customer account or membership identification number | Policy No. 10042992; Member ID: HZ-5235-001 |
banking_information | Banking information, including account and routing numbers | |
blood_type | Blood type | O-, AB positive |
credit_card_cvv | Credit card verification code | CVV: 080 |
credit_card_expiration | Expiration date of a credit card | |
credit_card_number | Credit card number | |
date | Specific calendar date | December 18 |
date_of_birth | Date of birth | Date of Birth: March 7,1961 |
drivers_license | Driver's license number. | DL# 356933-540 |
drug | Medications, vitamins, or supplements | Advil, Acetaminophen, Panadol |
email_address | Email address | support@assemblyai.com |
event | Name of an event or holiday | Olympics, Yom Kippur |
gender_sexuality | Terms indicating gender identity or sexual orientation, including slang terms | female; bisexual; trans |
healthcare_number | Healthcare numbers and health plan beneficiary numbers | Policy No.: 5584-486-674-YM |
injury | Bodily injury | I broke my arm, I have a sprained wrist |
ip_address | Internet IP address, including IPv4 and IPv6 formats | 192.168.0.1 |
language | Name of a natural language | Spanish, French |
location | Any Location reference including mailing address, postal code, city, state, province, country, or coordinates. | Lake Victoria, 145 Windsor St., 90210 |
medical_condition | Name of a medical condition, disease, syndrome, deficit, or disorder | chronic fatigue syndrome, arrhythmia, depression |
medical_process | Medical process, including treatments, procedures, and tests | heart surgery, CT scan |
money_amount | Name and/or amount of currency | 15 pesos, $94.50 |
nationality | Terms indicating nationality, ethnicity, or race | American, Asian, Caucasian |
number_sequence | Numerical PII (including alphanumeric strings) that doesn't fall under other categories | |
occupation | Job title or profession | professor, actors, engineer, CPA |
organization | Name of an organization | CNN, McDonalds, University of Alaska, Northwest General Hospital |
passport_number | Passport numbers, issued by any country | PA4568332; NU3C6L86S12 |
password | Account passwords, PINs, access keys, or verification answers | 27%alfalfa, temp1234, My mother's maiden name is Smith |
person_age | Number associated with an age | 27, 75 |
person_name | Name of a person | Bob, Doug Jones, Dr. Kay Martinez, MD |
phone_number | Telephone or fax number | |
political_affiliation | Terms referring to a political party, movement, or ideology | Republican, Liberal |
religion | Terms indicating religious affiliation | Hindu, Catholic |
url | Internet addresses | https://www.assemblyai.com/ |
us_social_security_number | Social Security Number or equivalent | |
username | Usernames, login names, or handles | @AssemblyAI |
vehicle_id | Vehicle identification numbers (VINs), vehicle serial numbers, and license plate numbers | 5FNRL38918B111818; BIF7547 |
Troubleshooting
redact_pii_policies
parameter. If you're still experiencing issues, please reach out to our support team for assistance.webhook_url
parameter is included with a valid URL that can be reached by AssemblyAI's API. If you're using custom authentication headers, ensure that the webhook_auth_header_name
and webhook_auth_header_value
parameters are included and are correct. If you're still having issues, please contact our support team for assistance.redact_pii_audio_quality
to wav
.