Audio Intelligence

Auto Chapters

The Auto Chapters model summarizes audio data over time into chapters. Chapters make it easy for users to navigate and find specific information.

Each chapter contains the following:

  • Summary
  • One-line gist
  • Headline
  • Start and end timestamps

Auto Chapters and Summarization

You can only enable one of the Auto Chapters and Summarization models in the same transcription.

Quickstart

Enable Auto Chapters by setting auto_chapters to true in the transcription config. punctuate must be enabled to use Auto Chapters (punctuate is enabled by default).

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(auto_chapters=True)

transcript = aai.Transcriber().transcribe(audio_file, config)

for chapter in transcript.chapters:
    print(f"{chapter.start}-{chapter.end}: {chapter.headline}")

Example output

250-28840: Smoke from hundreds of wildfires in Canada is triggering air quality alerts across US
29610-280340: High particulate matter in wildfire smoke can lead to serious health problems
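
Each chapter also carries the one-line gist and a longer summary. A minimal sketch of reading them from the same transcript object as in the quickstart:

for chapter in transcript.chapters:
    print(chapter.gist)     # one-line gist
    print(chapter.summary)  # paragraph-length summary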

Auto Chapters Using LeMUR

Check out this cookbook Creating Chapter Summaries for an example of how to leverage LeMUR’s custom text input parameter for chapter summaries.
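
A rough sketch of the approach, assuming LeMUR's task endpoint and its input_text parameter (the cookbook has the full recipe):

# Build custom input text: each sentence prefixed with its timestamps.
text_with_timestamps = "\n".join(
    f"[{sentence.start}-{sentence.end}] {sentence.text}"
    for sentence in transcript.get_sentences()
)

result = aai.Lemur().task(
    prompt="Split this transcript into chapters and write a short summary of each.",
    input_text=text_with_timestamps,  # assumed parameter; see the cookbook
)
print(result.response)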

For the full API reference, see the API reference section on the Auto Chapters page.

Content Moderation

The Content Moderation model lets you detect inappropriate content in audio files to ensure that your content is safe for all audiences.

The model pinpoints sensitive discussions in spoken data and flags the severity of each.

Quickstart

Enable Content Moderation by setting content_safety to true in the transcription config.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(content_safety=True)

transcript = aai.Transcriber().transcribe(audio_file, config)

# Get the parts of the transcript which were flagged as sensitive.
for result in transcript.content_safety.results:
    print(result.text)
    print(f"Timestamp: {result.timestamp.start} - {result.timestamp.end}")

    # Get category, confidence, and severity.
    for label in result.labels:
        print(f"{label.label} - {label.confidence} - {label.severity}")  # content safety category
    print()

# Get the confidence of the most common labels in relation to the entire audio file.
for label, confidence in transcript.content_safety.summary.items():
    print(f"{confidence * 100}% confident that the audio contains {label}")

print()

# Get the overall severity of the most common labels in relation to the entire audio file.
for label, severity_confidence in transcript.content_safety.severity_score_summary.items():
    print(f"{severity_confidence.low * 100}% confident that the audio contains low-severity {label}")
    print(f"{severity_confidence.medium * 100}% confident that the audio contains medium-severity {label}")
    print(f"{severity_confidence.high * 100}% confident that the audio contains high-severity {label}")

Example output

Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US. Skylines...
Timestamp: 250 - 28920
disasters - 0.8141 - 0.4014

So what is it about the conditions right now that have caused this round of wildfires to...
Timestamp: 29290 - 56190
disasters - 0.9217 - 0.5665

So what is it in this haze that makes it harmful? And I'm assuming it is...
Timestamp: 56340 - 88034
health_issues - 0.9358 - 0.8906

...

99.42% confident that the audio contains disasters
92.70% confident that the audio contains health_issues

57.43% confident that the audio contains low-severity disasters
42.56% confident that the audio contains medium-severity disasters
0.0% confident that the audio contains high-severity disasters
23.57% confident that the audio contains low-severity health_issues
30.22% confident that the audio contains medium-severity health_issues
46.19% confident that the audio contains high-severity health_issues

Adjust the confidence threshold

The confidence threshold determines how likely something is to be flagged as inappropriate content. A threshold of 50% (which is the default) means any label with a confidence score of 50% or greater is flagged.

To adjust the confidence threshold for your transcription, include content_safety_confidence in the transcription config.

# Setting the content safety confidence threshold to 60%.
config = aai.TranscriptionConfig(
    content_safety=True,
    content_safety_confidence=60
)

For the full API reference, as well as the supported labels and FAQs, refer to the full Content Moderation page.

Entity Detection

The Entity Detection model lets you automatically identify and categorize key information in transcribed audio content.

Here are a few examples of what you can detect:

  • Names of people
  • Organizations
  • Addresses
  • Phone numbers
  • Medical data
  • Social security numbers

For the full list of entities that you can detect, see Supported entities.

Supported languages

Entity Detection is available in multiple languages. See Supported languages.

Quickstart

Enable Entity Detection by setting entity_detection to true in the transcription config.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(entity_detection=True)

transcript = aai.Transcriber().transcribe(audio_file, config)

for entity in transcript.entities:
    print(entity.text)
    print(entity.entity_type)
    print(f"Timestamp: {entity.start} - {entity.end}\n")

Example output

Canada
location
Timestamp: 2548 - 3130

the US
location
Timestamp: 5498 - 6350

...

For the full API reference, as well as the supported entities and FAQs, refer to the full Entity Detection page.

Key Phrases

The Key Phrases model identifies significant words and phrases in your transcript and lets you extract the most important concepts or highlights from your audio or video file.

Quickstart

Enable Key Phrases by setting auto_highlights to true in the transcription config.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(auto_highlights=True)

transcript = aai.Transcriber().transcribe(audio_file, config)

for result in transcript.auto_highlights.results:
    print(f"Highlight: {result.text}, Count: {result.count}, Rank: {result.rank}, Timestamps: {result.timestamps}")

Example output

Highlight: air quality alerts, Count: 1, Rank: 0.08, Timestamps: [Timestamp(start=3978, end=5114)]
Highlight: wide ranging air quality consequences, Count: 1, Rank: 0.08, Timestamps: [Timestamp(start=235388, end=238838)]
Highlight: more fires, Count: 1, Rank: 0.07, Timestamps: [Timestamp(start=184716, end=185186)]
...

For the full API reference and FAQs, refer to the full Key Phrases page.

PII Redaction

The PII Redaction model lets you minimize sensitive information about individuals by automatically identifying and removing it from your transcript.

Personally Identifiable Information (PII) is any information that can be used to identify a person, such as a name, email address, or phone number.

When you enable the PII Redaction model, your transcript will look like this:

  • With hash substitution: Hi, my name is ####!
  • With entity_name substitution: Hi, my name is [PERSON_NAME]!
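
For example, a minimal sketch of selecting the entity_name substitution instead of hashes (the quickstart below uses hash):

config = aai.TranscriptionConfig().set_redact_pii(
    policies=[aai.PIIRedactionPolicy.person_name],
    substitution=aai.PIISubstitutionPolicy.entity_name,
)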

You can also Create redacted audio files to replace sensitive information with a beeping sound.

Supported languages

PII Redaction is available in multiple languages. See Supported languages.

Redacted properties

PII Redaction only redacts words in the text property. Properties from other features may still include PII, such as entities from Entity Detection or summary from Summarization.

Quickstart

Enable PII Redaction on the TranscriptionConfig using the set_redact_pii() method.

Set policies to specify the information you want to redact. For the full list of policies, see PII policies.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig().set_redact_pii(
    policies=[
        aai.PIIRedactionPolicy.person_name,
        aai.PIIRedactionPolicy.organization,
        aai.PIIRedactionPolicy.occupation,
    ],
    substitution=aai.PIISubstitutionPolicy.hash,
)

transcript = aai.Transcriber().transcribe(audio_file, config)

print(transcript.text)

Example output

Smoke from hundreds of wildfires in Canada is triggering air quality alerts
throughout the US. Skylines from Maine to Maryland to Minnesota are gray and
smoggy. And in some places, the air quality warnings include the warning to stay
inside. We wanted to better understand what's happening here and why, so we
called ##### #######, an ######### ######### in the ########## ## #############
###### ### ########### at ##### ####### ##########. Good morning, #########.
Good morning. So what is it about the conditions right now that have caused this
round of wildfires to affect so many people so far away? Well, there's a couple
of things. The season has been pretty dry already, and then the fact that we're
getting hit in the US. Is because there's a couple of weather systems that ...

Create redacted audio files

In addition to redacting sensitive information from the transcription text, you can also generate a version of the original audio file with the PII “beeped” out.
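
A minimal sketch of the idea, assuming a redact_audio option on set_redact_pii and a get_redacted_audio_url() helper (check the full PII Redaction page for the exact parameters):

config = aai.TranscriptionConfig().set_redact_pii(
    policies=[aai.PIIRedactionPolicy.person_name],
    redact_audio=True,  # assumed option: also produce a "beeped" audio file
)

transcript = aai.Transcriber().transcribe(audio_file, config)

# Assumed helper: returns the URL of the redacted audio file.
print(transcript.get_redacted_audio_url())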

For the full API reference, as well as the supported policies and FAQs, refer to the full PII Redaction page.

Sentiment Analysis

The Sentiment Analysis model detects the sentiment of each spoken sentence in the transcript, labeling it as positive, neutral, or negative.

Quickstart

Enable Sentiment Analysis by setting sentiment_analysis to true in the transcription config.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(sentiment_analysis=True)

transcript = aai.Transcriber().transcribe(audio_file, config)

for sentiment_result in transcript.sentiment_analysis:
    print(sentiment_result.text)
    print(sentiment_result.sentiment)  # POSITIVE, NEUTRAL, or NEGATIVE
    print(sentiment_result.confidence)
    print(f"Timestamp: {sentiment_result.start} - {sentiment_result.end}")

Example output

Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US.
SentimentType.negative
0.8181032538414001
Timestamp: 250 - 6350
...

Sentiment Analysis Using LeMUR

Check out this cookbook LeMUR for Customer Call Sentiment Analysis for an example of how to leverage LeMUR’s QA feature for sentiment analysis.
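
A rough sketch of the QA approach, assuming LeMUR's question endpoint (the cookbook has the full walkthrough):

questions = [
    aai.LemurQuestion(
        question="What is the overall sentiment of the customer on this call?",
        answer_options=["positive", "neutral", "negative"],
    )
]

result = transcript.lemur.question(questions)
print(result.response[0].answer)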

Add speaker labels to sentiments

To add speaker labels to each sentiment analysis result using Speaker Diarization, enable speaker_labels in the transcription config.

Each sentiment result will then have a speaker field that contains the speaker label.

config = aai.TranscriptionConfig(
    sentiment_analysis=True,
    speaker_labels=True
)

# ...

for sentiment_result in transcript.sentiment_analysis:
    print(sentiment_result.speaker)

For the full API reference and FAQs, refer to the full Sentiment Analysis page.

Summarization

Distill important information by summarizing your audio files.

The Summarization model generates a summary of the resulting transcript. You can control the style and format of the summary using Summary models and Summary types.

Summarization and Auto Chapters

You can only enable one of the Summarization and Auto Chapters models in the same transcription.

Quickstart

Enable Summarization by setting summarization to true in the transcription config. Use summary_model and summary_type to change the summary format.

If you specify one of summary_model and summary_type, then you must specify the other.

The following example returns an informative summary in a bulleted list.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(
    summarization=True,
    summary_model=aai.SummarizationModel.informative,
    summary_type=aai.SummarizationType.bullets
)

transcript = aai.Transcriber().transcribe(audio_file, config)

print(transcript.summary)

Example output

- Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US. Skylines from Maine to Maryland to Minnesota are gray and smoggy. In some places, the air quality warnings include the warning to stay inside.
- Air pollution levels in Baltimore are considered unhealthy. Exposure to high levels can lead to a host of health problems. With climate change, we are seeing more wildfires. Will we be seeing more of these kinds of wide ranging air quality consequences?

Custom Summaries Using LeMUR

If you want more control of the output format, see how to generate a Custom summary using LeMUR.
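
A minimal sketch of the idea, assuming LeMUR's task endpoint with a free-form prompt:

prompt = "Summarize this transcript as three short bullet points for a general audience."
result = transcript.lemur.task(prompt)
print(result.response)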

For the full API reference, as well as the supported summary models/types and FAQs, refer to the full Summarization page.

Topic Detection

The Topic Detection model lets you identify different topics in the transcript. The model uses the IAB Content Taxonomy, a standardized language for content description which consists of 698 comprehensive topics.

Quickstart

Enable Topic Detection by setting iab_categories to true in the transcription config.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(iab_categories=True)

transcript = aai.Transcriber().transcribe(audio_file, config)

# Get the parts of the transcript that were tagged with topics
for result in transcript.iab_categories.results:
    print(result.text)
    print(f"Timestamp: {result.timestamp.start} - {result.timestamp.end}")
    for label in result.labels:
        print(f"{label.label} ({label.relevance})")

# Get a summary of all topics in the transcript
for topic, relevance in transcript.iab_categories.summary.items():
    print(f"Audio is {relevance * 100}% relevant to {topic}")

Example output

Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US. Skylines...
Timestamp: 250 - 28920
Home&Garden>IndoorEnvironmentalQuality (0.9881)
NewsAndPolitics>Weather (0.5561)
MedicalHealth>DiseasesAndConditions>LungAndRespiratoryHealth (0.0042)
...
Audio is 100.0% relevant to NewsAndPolitics>Weather
Audio is 93.78% relevant to Home&Garden>IndoorEnvironmentalQuality
...

Topic Detection Using LeMUR

Check out this cookbook Custom Topic Tags for an example of how to leverage LeMUR for custom topic detection.
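
A rough sketch of the idea, assuming LeMUR's task endpoint and a hypothetical custom tag list:

tags = ["wildfires", "air quality", "public health"]  # hypothetical tags
prompt = f"Pick the most relevant tags for this transcript from: {', '.join(tags)}"
result = transcript.lemur.task(prompt)
print(result.response)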

For the full API reference, as well as the full list of supported topics and FAQs, refer to the full Topic Detection page.