Last month, we introduced PII Redaction Policies as part of a major overhaul of our PII Redaction feature, giving you more flexible and powerful control over exactly what is redacted from your transcriptions.
Enhanced PII Redaction: More Policies and Customization
We’ve now expanded the list of Redaction Policies available to 20, with more on the way before the end of the year. Some of these new policies include credit_card_cvv, credit_card_expiration, organization, nationality, event, and location - the full list is shown below:
Using these new policies, you can take even greater control over how you safely redact the transcriptions produced by our API, in order to comply with your, and your customers’, security standards.
Customize How Redacted PII is Replaced
By default, any PII that is detected is replaced with a hash - #. For example, the credit card number 1111-2222-3333-4444 is replaced with ####-####-####-####. To make the redaction more user-friendly and readable, the redacted text can now be replaced with the policy name. For example, the credit card number 1111-2222-3333-4444 is replaced with [CREDIT_CARD_NUMBER], and the social security number 111-11-1111 would be replaced with [US_SOCIAL_SECURITY_NUMBER].
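To make the difference between the two replacement styles concrete, here is a small local sketch in Python. It does not call the API or detect any PII. The `redact_span` helper and its `style` argument are hypothetical names used only for illustration, the lowercase policy names are assumed from the bracketed forms above, and the hash branch assumes that letters and digits are masked while separators such as hyphens are kept, as in the credit card example.

```python
def redact_span(value: str, policy: str, style: str = "hash") -> str:
    """Illustrate the two replacement styles for one already-detected PII value.

    Hypothetical helper: real detection and replacement happen on the API side.
    """
    if style == "hash":
        # Mask every letter or digit with '#', keeping separators like '-'.
        return "".join("#" if ch.isalnum() else ch for ch in value)
    if style == "policy_name":
        # Replace the whole value with the upper-cased policy name in brackets.
        return f"[{policy.upper()}]"
    raise ValueError(f"unknown style: {style!r}")


print(redact_span("1111-2222-3333-4444", "credit_card_number"))
print(redact_span("1111-2222-3333-4444", "credit_card_number", "policy_name"))
print(redact_span("111-11-1111", "us_social_security_number", "policy_name"))
```

The three calls reproduce the examples above: the hashed card number, then the two bracketed policy names.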
When you have many redaction policies enabled, this keeps your transcriptions more readable for your end users than replacing all sensitive information with hash characters. To enable it, include the new redact_pii_sub parameter in your POST request.
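As a rough sketch, the request body might be assembled like this. The post names the `redact_pii_sub` parameter but not the values it accepts, so the `"entity_name"` value used here is an assumption to check against the API Docs. `build_redaction_request` is a hypothetical helper, not part of any SDK.

```python
def build_redaction_request(audio_url, policies, substitution=None):
    """Build a JSON body for a transcript request with PII Redaction enabled.

    Hypothetical helper. `substitution` is passed through as `redact_pii_sub`
    only when given, so the default (hash) behavior is left untouched.
    """
    body = {
        "audio_url": audio_url,
        "redact_pii": True,
        "redact_pii_policies": list(policies),
    }
    if substitution is not None:
        # Assumed value: "entity_name" is not stated in this post.
        body["redact_pii_sub"] = substitution
    return body


body = build_redaction_request(
    "https://app.assemblyai.com/static/media/phone_demo_clip_1.wav",
    ["credit_card_cvv", "credit_card_expiration", "location"],
    substitution="entity_name",
)
print(body)
```

Leaving `substitution` unset produces the same body shape as a request without the new parameter, which makes it easy to compare both outputs on the same file.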
How PII Redaction Works in AssemblyAI
Testing these policies using AssemblyAI’s API only takes a couple of minutes to set up. Using the code sample below, you can enable PII Redaction on your own audio or video files (for more code samples, check out our API Docs).
import requests

# Transcript endpoint of the AssemblyAI API.
endpoint = "https://api.assemblyai.com/v2/transcript"

# Request body: enable PII Redaction and apply every available policy.
payload = {
    "audio_url": "https://app.assemblyai.com/static/media/phone_demo_clip_1.wav",
    "redact_pii": True,
    "redact_pii_policies": ["all"],
}

# Replace YOUR-API-TOKEN with your own API token.
headers = {
    "authorization": "YOUR-API-TOKEN",
    "content-type": "application/json",
}

response = requests.post(endpoint, json=payload, headers=headers)
print(response.json())
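The request above submits the file for transcription; the redacted text generally is not in that first response. A hedged sketch of waiting for the result follows. It assumes the POST response contains an `id`, that `GET /v2/transcript/{id}` reports a `status` of `completed` or `error`, and that the finished transcript is in a `text` field; please check these against the API Docs. `fetch_transcript`, `wait_for_transcript`, and the `fetch` parameter are hypothetical names.

```python
import time


def fetch_transcript(transcript_id, api_token):
    """GET the transcript resource (endpoint path is an assumption, see above)."""
    import requests  # imported lazily so the polling logic can run offline

    url = f"https://api.assemblyai.com/v2/transcript/{transcript_id}"
    response = requests.get(url, headers={"authorization": api_token}, timeout=30)
    response.raise_for_status()
    return response.json()


def wait_for_transcript(transcript_id, fetch, interval=3.0, max_attempts=100):
    """Poll fetch(transcript_id) until the job completes, fails, or times out."""
    for _ in range(max_attempts):
        result = fetch(transcript_id)
        status = result.get("status")  # assumed status values: completed / error
        if status == "completed":
            return result
        if status == "error":
            raise RuntimeError(f"transcription failed: {result.get('error')}")
        time.sleep(interval)
    raise TimeoutError(
        f"transcript {transcript_id} not ready after {max_attempts} polls"
    )
```

With a real token, this could be used as `wait_for_transcript(transcript_id, lambda tid: fetch_transcript(tid, "YOUR-API-TOKEN"))`, where `transcript_id` is the `id` from the POST response. Passing `fetch` in as a callable keeps the polling loop separate from the HTTP call, so it can be exercised without network access.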
PII Redaction for Audio Files
These same PII Redaction Policies also apply to audio redaction. We will mute the parts of your audio where PII is spoken, and make a downloadable URL available for the redacted audio file. To test audio redaction on your files, follow our guide here.
Improved Accuracy
We released another set of accuracy updates to our neural network, including significant improvements on call, video, and podcast content. To help benchmark our current model’s accuracy, we’ve included a comparison against common providers such as Google Cloud’s Speech-to-Text (Premium Video Model) and AWS Transcribe below.
These sample video podcast transcripts (from Joe Rogan’s Podcast) are shown alongside their Word Error Rate (WER), which measures the accuracy of automated speech recognition against a human transcription (lower is better). As you can see in the table, we still consistently outperform big tech providers like Google and AWS. Other providers, such as Microsoft and IBM, were also included in our analysis but recorded the lowest accuracy.
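For reference, Word Error Rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between the automated transcript and the human reference, divided by the number of words in the reference. The sketch below follows that standard definition with simple lowercasing and whitespace tokenization. It is not necessarily the exact normalization used for the benchmark here, and `word_error_rate` and the sample sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


human = "what marketing works for your business"
machine = "what marketing work for business"
print(f"WER: {word_error_rate(human, machine):.2f}")  # prints "WER: 0.33"
```

In this made-up pair, the machine transcript has one substitution ("work" for "works") and one deletion ("your") against a six-word reference, giving 2 / 6.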
AssemblyAI | WhatConverts Case Study
WhatConverts is call tracking software (SaaS) that helps customers answer the question, “What marketing works?”
Their platform integrates with all marketing channels (e.g. Google Ads, Facebook Ads, Intercom, etc.) and makes it simple to understand which campaigns are working and which campaigns aren’t delivering leads. Leads are then prioritized, ranked, and managed within the WhatConverts software. Read the full case study here.
With the switch to AssemblyAI, they experienced a significant accuracy improvement, improved security through PII (PCI) Redaction, and more affordable pricing. WhatConverts also covered their switch to AssemblyAI; you can read their update on the WhatConverts Blog.