Hey 👋, this weekly update contains the latest info on our new product features, tutorials, and our community.
🔥 PII Redaction: Now Available in 47 Languages
Our latest update expands PII Text Redaction support to 47 languages, so you can protect personally identifiable information (PII) in transcripts from many more regions. This allows you to:
- Identify and remove personal data such as addresses, phone numbers, and credit card details from your transcripts.
- Generate transcripts with PII removed, or "beep out" sensitive information in audio files.
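AssemblyAI generates the beeped-out audio for you. To illustrate what "beeping out" means, here is a minimal, hypothetical local sketch. It assumes you already have mono audio as a list of float samples, plus the start and end times (in milliseconds) of each span to redact, and it overwrites those spans with a 1 kHz tone. `beep_out` and its parameters are illustrative names, not part of the AssemblyAI SDK.

```python
import math


def beep_out(samples, sample_rate, intervals_ms, freq_hz=1000.0, amplitude=0.3):
    """Return a copy of `samples` with each (start_ms, end_ms) span replaced by a sine tone.

    Hypothetical helper for illustration only; assumes mono float samples
    and interval bounds given in milliseconds.
    """
    out = list(samples)
    for start_ms, end_ms in intervals_ms:
        # Convert milliseconds to sample indices, clamped to the signal length.
        start = max(0, int(start_ms * sample_rate / 1000))
        end = min(len(out), int(end_ms * sample_rate / 1000))
        for i in range(start, end):
            out[i] = amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
    return out


# One second of constant "audio" at 16 kHz, with 250-500 ms beeped out.
sample_rate = 16_000
audio = [0.5] * sample_rate
beeped = beep_out(audio, sample_rate, [(250, 500)])
```

The helper copies the input rather than modifying it in place, so the original audio stays available for comparison.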
Check out our docs for more detailed examples, or read our blog for an in-depth look at these updates.
Here's an example of how to use our API for PII redaction:
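```python
import assemblyai as aai

aai.settings.api_key = "YOUR API KEY"

audio_url = "https://github.com/AssemblyAI-Community/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

# Enable speaker labels and redact names, organizations, and occupations,
# replacing each redacted entity with hash characters.
config = aai.TranscriptionConfig(speaker_labels=True).set_redact_pii(
    policies=[
        aai.PIIRedactionPolicy.person_name,
        aai.PIIRedactionPolicy.organization,
        aai.PIIRedactionPolicy.occupation,
    ],
    substitution=aai.PIISubstitutionPolicy.hash,
)

transcript = aai.Transcriber().transcribe(audio_url, config)
if transcript.status == aai.TranscriptStatus.error:
    raise RuntimeError(transcript.error)

# Print the redacted transcript one speaker turn at a time...
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")

# ...and as a single block of text.
print(transcript.text)
```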
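With the `hash` substitution policy, redacted entities show up in the returned text as runs of `#` characters. Assuming that format, a small helper like the hypothetical `find_redacted_spans` below can locate those runs, for example to audit how much of a transcript was redacted. It works on any string, so you could pass it `transcript.text` from the example above; the sample sentence here is made up.

```python
import re


def find_redacted_spans(text):
    """Return (start, end) character offsets of each run of '#' characters."""
    return [(m.start(), m.end()) for m in re.finditer(r"#+", text)]


def redaction_ratio(text):
    """Fraction of non-whitespace characters that are '#' placeholders."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c == "#" for c in chars) / len(chars)


sample = "Hi, I'm #### ##### from #########."
spans = find_redacted_spans(sample)
ratio = redaction_ratio(sample)
print(spans)
print(f"{ratio:.0%} of characters redacted")
```

Note that this simple approach would also count any `#` that legitimately appears in speech, such as a spoken "number one" transcribed as "#1".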
Entity Detection Upgraded
We've added 16 new entity types to our Entity Detection model, bringing the total to 44. The model automatically identifies and categorizes key information in your transcripts, such as names, organizations, and addresses, with 99% accuracy in major languages.
Here's an example of how to use our API for Entity Detection:
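```python
import assemblyai as aai

aai.settings.api_key = "YOUR API KEY"
audio_url = "https://github.com/AssemblyAI-Community/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

# Turn on Entity Detection for this transcription.
config = aai.TranscriptionConfig(entity_detection=True)
transcript = aai.Transcriber().transcribe(audio_url, config)
if transcript.status == aai.TranscriptStatus.error:
    raise RuntimeError(transcript.error)

# Print each detected entity, its type, and when it was spoken (in milliseconds).
for entity in transcript.entities:
    print(entity.text)
    print(entity.entity_type)
    print(f"Timestamp: {entity.start} - {entity.end}\n")
```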
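The loop above prints entities in the order they occur. If you'd rather see them grouped by type with readable timestamps, a sketch like the one below works on any objects exposing `text`, `entity_type`, and `start` attributes, as the entities in the example above do. It assumes `start` is in milliseconds. The `Entity` dataclass and the sample values are hypothetical stand-ins so the snippet runs without an API call; with real results you would pass `transcript.entities` instead.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical stand-in for a detected entity (not the SDK class)."""
    text: str
    entity_type: str
    start: int  # assumed to be milliseconds
    end: int


def format_ms(ms):
    """Format a millisecond offset as m:ss."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes}:{seconds:02d}"


def group_entities(entities):
    """Group entities by type, preserving the order in which they occur."""
    groups = defaultdict(list)
    for entity in entities:
        groups[entity.entity_type].append(entity)
    return dict(groups)


entities = [
    Entity("Jane Doe", "person_name", 3_000, 3_900),
    Entity("Toronto", "location", 65_000, 65_600),
    Entity("Ottawa", "location", 90_500, 91_000),
]

grouped = group_entities(entities)
for entity_type, items in grouped.items():
    mentions = ", ".join(f"{e.text} ({format_ms(e.start)})" for e in items)
    print(f"{entity_type}: {mentions}")
```

With the SDK, `entity_type` may be an enum member rather than a plain string. Grouping still works because enum members are hashable, though the printed label may look slightly different.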
Fresh From Our Blog
Get started using Claude 3.5 Sonnet with audio data: Learn how to use Anthropic's Claude models, including Claude 3.5 Sonnet, with audio and video data in Python. Read more>>
Florence-2: How it works and how to use it: Microsoft's Florence-2 is a vision foundation model that can handle most common computer vision tasks. Learn how Florence-2 works and how to use it in this guide. Read more>>
How to Create a Real-Time Language Translation Service with AssemblyAI and DeepL in JavaScript: Translate speech in real time in JavaScript with AssemblyAI and DeepL. Read more>>