Audio Intelligence

Auto Chapters

The Auto Chapters model summarizes audio data over time into chapters. Chapters make it easy for users to navigate and find specific information.

Each chapter contains the following:

  • Summary
  • One-line gist
  • Headline
  • Start and end timestamps

Auto Chapters and Summarization

You can only enable one of the Auto Chapters and Summarization models in the same transcription.

Quickstart

Enable Auto Chapters by setting auto_chapters to true in the transcription config. punctuate must be enabled to use Auto Chapters (punctuate is enabled by default).

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(auto_chapters=True)

transcript = aai.Transcriber().transcribe(audio_file, config)

for chapter in transcript.chapters:
    print(f"{chapter.start}-{chapter.end}: {chapter.headline}")

Example output

250-28840: Smoke from hundreds of wildfires in Canada is triggering air quality alerts across US
29610-280340: High particulate matter in wildfire smoke can lead to serious health problems
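
Each chapter also carries the one-line gist and a longer summary. A minimal sketch of reading them from the same transcript object as in the quickstart:

for chapter in transcript.chapters:
    print(chapter.gist)     # one-line gist
    print(chapter.summary)  # paragraph-length summary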

Auto Chapters Using LeMUR

Check out this cookbook Creating Chapter Summaries for an example of how to leverage LeMUR’s custom text input parameter for chapter summaries.
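
A rough sketch of the approach, assuming LeMUR's task endpoint and its input_text parameter (the cookbook has the full recipe):

# Build custom input text: each sentence prefixed with its timestamps.
text_with_timestamps = "\n".join(
    f"[{sentence.start}-{sentence.end}] {sentence.text}"
    for sentence in transcript.get_sentences()
)

result = aai.Lemur().task(
    prompt="Split this transcript into chapters and write a short summary of each.",
    input_text=text_with_timestamps,  # assumed parameter; see the cookbook
)
print(result.response)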

For the full API reference, see the API reference section on the Auto Chapters page.

Content Moderation

The Content Moderation model lets you detect inappropriate content in audio files to ensure that your content is safe for all audiences.

The model pinpoints sensitive discussions in spoken data and flags the severity of each.

Quickstart

Enable Content Moderation by setting content_safety to true in the transcription config.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(content_safety=True)

transcript = aai.Transcriber().transcribe(audio_file, config)

# Get the parts of the transcript which were flagged as sensitive.
for result in transcript.content_safety.results:
    print(result.text)
    print(f"Timestamp: {result.timestamp.start} - {result.timestamp.end}")

    # Get category, confidence, and severity.
    for label in result.labels:
        print(f"{label.label} - {label.confidence} - {label.severity}")  # content safety category
    print()

# Get the confidence of the most common labels in relation to the entire audio file.
for label, confidence in transcript.content_safety.summary.items():
    print(f"{confidence * 100}% confident that the audio contains {label}")

print()

# Get the overall severity of the most common labels in relation to the entire audio file.
for label, severity_confidence in transcript.content_safety.severity_score_summary.items():
    print(f"{severity_confidence.low * 100}% confident that the audio contains low-severity {label}")
    print(f"{severity_confidence.medium * 100}% confident that the audio contains medium-severity {label}")
    print(f"{severity_confidence.high * 100}% confident that the audio contains high-severity {label}")

Example output

Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US. Skylines...
Timestamp: 250 - 28920
disasters - 0.8141 - 0.4014

So what is it about the conditions right now that have caused this round of wildfires to...
Timestamp: 29290 - 56190
disasters - 0.9217 - 0.5665

So what is it in this haze that makes it harmful? And I'm assuming it is...
Timestamp: 56340 - 88034
health_issues - 0.9358 - 0.8906

...

99.42% confident that the audio contains disasters
92.70% confident that the audio contains health_issues

57.43% confident that the audio contains low-severity disasters
42.56% confident that the audio contains medium-severity disasters
0.0% confident that the audio contains high-severity disasters
23.57% confident that the audio contains low-severity health_issues
30.22% confident that the audio contains medium-severity health_issues
46.19% confident that the audio contains high-severity health_issues

Adjust the confidence threshold

The confidence threshold determines how likely something is to be flagged as inappropriate content. A threshold of 50% (which is the default) means any label with a confidence score of 50% or greater is flagged.

To adjust the confidence threshold for your transcription, include content_safety_confidence in the transcription config.

# Setting the content safety confidence threshold to 60%.
config = aai.TranscriptionConfig(
    content_safety=True,
    content_safety_confidence=60
)

For the full API reference, as well as the supported labels and FAQs, refer to the full Content Moderation page.

Entity Detection

The Entity Detection model lets you automatically identify and categorize key information in transcribed audio content.

Here are a few examples of what you can detect:

  • Names of people
  • Organizations
  • Addresses
  • Phone numbers
  • Medical data
  • Social security numbers

For the full list of entities that you can detect, see Supported entities.

Supported languages

Entity Detection is available in multiple languages. See Supported languages.

Quickstart

Enable Entity Detection by setting entity_detection to true in the transcription config.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(entity_detection=True)

transcript = aai.Transcriber().transcribe(audio_file, config)

for entity in transcript.entities:
    print(entity.text)
    print(entity.entity_type)
    print(f"Timestamp: {entity.start} - {entity.end}\n")

Example output

Canada
location
Timestamp: 2548 - 3130

the US
location
Timestamp: 5498 - 6350

...

For the full API reference, as well as the supported entities and FAQs, refer to the full Entity Detection page.

Key Phrases

The Key Phrases model identifies significant words and phrases in your transcript and lets you extract the most important concepts or highlights from your audio or video file.

Quickstart

Enable Key Phrases by setting auto_highlights to true in the transcription config.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(auto_highlights=True)

transcript = aai.Transcriber().transcribe(audio_file, config)

for result in transcript.auto_highlights.results:
    print(f"Highlight: {result.text}, Count: {result.count}, Rank: {result.rank}, Timestamps: {result.timestamps}")

Example output

Highlight: air quality alerts, Count: 1, Rank: 0.08, Timestamps: [Timestamp(start=3978, end=5114)]
Highlight: wide ranging air quality consequences, Count: 1, Rank: 0.08, Timestamps: [Timestamp(start=235388, end=238838)]
Highlight: more fires, Count: 1, Rank: 0.07, Timestamps: [Timestamp(start=184716, end=185186)]
...

For the full API reference and FAQs, refer to the full Key Phrases page.

PII Redaction

The PII Redaction model lets you minimize sensitive information about individuals by automatically identifying and removing it from your transcript.

Personally Identifiable Information (PII) is any information that can be used to identify a person, such as a name, email address, or phone number.

When you enable the PII Redaction model, your transcript will look like this:

  • With hash substitution: Hi, my name is ####!
  • With entity_name substitution: Hi, my name is [PERSON_NAME]!
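
For example, a minimal sketch of selecting the entity_name substitution instead of hashes (the quickstart below uses hash):

config = aai.TranscriptionConfig().set_redact_pii(
    policies=[aai.PIIRedactionPolicy.person_name],
    substitution=aai.PIISubstitutionPolicy.entity_name,
)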

You can also Create redacted audio files to replace sensitive information with a beeping sound.

Supported languages

PII Redaction is available in multiple languages. See Supported languages.

Redacted properties

PII Redaction only redacts words in the text property. Properties from other features may still include PII, such as entities from Entity Detection or summary from Summarization.

Quickstart

Enable PII Redaction on the TranscriptionConfig using the set_redact_pii() method.

Set policies to specify the information you want to redact. For the full list of policies, see PII policies.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig().set_redact_pii(
    policies=[
        aai.PIIRedactionPolicy.person_name,
        aai.PIIRedactionPolicy.organization,
        aai.PIIRedactionPolicy.occupation,
    ],
    substitution=aai.PIISubstitutionPolicy.hash,
)

transcript = aai.Transcriber().transcribe(audio_file, config)

print(transcript.text)

Example output

Smoke from hundreds of wildfires in Canada is triggering air quality alerts
throughout the US. Skylines from Maine to Maryland to Minnesota are gray and
smoggy. And in some places, the air quality warnings include the warning to stay
inside. We wanted to better understand what's happening here and why, so we
called ##### #######, an ######### ######### in the ########## ## #############
###### ### ########### at ##### ####### ##########. Good morning, #########.
Good morning. So what is it about the conditions right now that have caused this
round of wildfires to affect so many people so far away? Well, there's a couple
of things. The season has been pretty dry already, and then the fact that we're
getting hit in the US. Is because there's a couple of weather systems that ...

Create redacted audio files

In addition to redacting sensitive information from the transcription text, you can also generate a version of the original audio file with the PII “beeped” out.
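
A minimal sketch of the idea, assuming a redact_audio option on set_redact_pii and a get_redacted_audio_url() helper (check the full PII Redaction page for the exact parameters):

config = aai.TranscriptionConfig().set_redact_pii(
    policies=[aai.PIIRedactionPolicy.person_name],
    redact_audio=True,  # assumed option: also produce a "beeped" audio file
)

transcript = aai.Transcriber().transcribe(audio_file, config)

# Assumed helper: returns the URL of the redacted audio file.
print(transcript.get_redacted_audio_url())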

For the full API reference, as well as the supported policies and FAQs, refer to the full PII Redaction page.

Sentiment Analysis

The Sentiment Analysis model detects the sentiment of each spoken sentence in the transcript, labeling it as positive, neutral, or negative.

Quickstart

Enable Sentiment Analysis by setting sentiment_analysis to true in the transcription config.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(sentiment_analysis=True)

transcript = aai.Transcriber().transcribe(audio_file, config)

for sentiment_result in transcript.sentiment_analysis:
    print(sentiment_result.text)
    print(sentiment_result.sentiment)  # POSITIVE, NEUTRAL, or NEGATIVE
    print(sentiment_result.confidence)
    print(f"Timestamp: {sentiment_result.start} - {sentiment_result.end}")

Example output

Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US.
SentimentType.negative
0.8181032538414001
Timestamp: 250 - 6350
...

Sentiment Analysis Using LeMUR

Check out this cookbook LeMUR for Customer Call Sentiment Analysis for an example of how to leverage LeMUR’s QA feature for sentiment analysis.
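
A rough sketch of the QA approach, assuming LeMUR's question endpoint (the cookbook has the full walkthrough):

questions = [
    aai.LemurQuestion(
        question="What is the overall sentiment of the customer on this call?",
        answer_options=["positive", "neutral", "negative"],
    )
]

result = transcript.lemur.question(questions)
print(result.response[0].answer)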

Add speaker labels to sentiments

To add speaker labels to each sentiment analysis result using Speaker Diarization, enable speaker_labels in the transcription config.

Each sentiment result will then have a speaker field that contains the speaker label.

config = aai.TranscriptionConfig(
    sentiment_analysis=True,
    speaker_labels=True
)

# ...

for sentiment_result in transcript.sentiment_analysis:
    print(sentiment_result.speaker)

For the full API reference and FAQs, refer to the full Sentiment Analysis page.

Summarization

Distill important information by summarizing your audio files.

The Summarization model generates a summary of the resulting transcript. You can control the style and format of the summary using Summary models and Summary types.

Summarization and Auto Chapters

You can only enable one of the Summarization and Auto Chapters models in the same transcription.

Quickstart

Enable Summarization by setting summarization to true in the transcription config. Use summary_model and summary_type to change the summary format.

If you specify one of summary_model and summary_type, then you must specify the other.

The following example returns an informative summary in a bulleted list.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(
    summarization=True,
    summary_model=aai.SummarizationModel.informative,
    summary_type=aai.SummarizationType.bullets
)

transcript = aai.Transcriber().transcribe(audio_file, config)

print(transcript.summary)

Example output

- Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US. Skylines from Maine to Maryland to Minnesota are gray and smoggy. In some places, the air quality warnings include the warning to stay inside.
- Air pollution levels in Baltimore are considered unhealthy. Exposure to high levels can lead to a host of health problems. With climate change, we are seeing more wildfires. Will we be seeing more of these kinds of wide ranging air quality consequences?

Custom Summaries Using LeMUR

If you want more control of the output format, see how to generate a Custom summary using LeMUR.
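
A minimal sketch of the idea, assuming LeMUR's task endpoint with a free-form prompt:

prompt = "Summarize this transcript as three short bullet points for a general audience."
result = transcript.lemur.task(prompt)
print(result.response)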

For the full API reference, as well as the supported summary models/types and FAQs, refer to the full Summarization page.

Topic Detection

The Topic Detection model lets you identify different topics in the transcript. The model uses the IAB Content Taxonomy, a standardized language for content description which consists of 698 comprehensive topics.

Quickstart

Enable Topic Detection by setting iab_categories to true in the transcription config.

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(iab_categories=True)

transcript = aai.Transcriber().transcribe(audio_file, config)

# Get the parts of the transcript that were tagged with topics
for result in transcript.iab_categories.results:
    print(result.text)
    print(f"Timestamp: {result.timestamp.start} - {result.timestamp.end}")
    for label in result.labels:
        print(f"{label.label} ({label.relevance})")

# Get a summary of all topics in the transcript
for topic, relevance in transcript.iab_categories.summary.items():
    print(f"Audio is {relevance * 100}% relevant to {topic}")

Example output

Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US. Skylines...
Timestamp: 250 - 28920
Home&Garden>IndoorEnvironmentalQuality (0.9881)
NewsAndPolitics>Weather (0.5561)
MedicalHealth>DiseasesAndConditions>LungAndRespiratoryHealth (0.0042)
...
Audio is 100.0% relevant to NewsAndPolitics>Weather
Audio is 93.78% relevant to Home&Garden>IndoorEnvironmentalQuality
...

Topic Detection Using LeMUR

Check out this cookbook Custom Topic Tags for an example of how to leverage LeMUR for custom topic detection.
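
A rough sketch of the idea, assuming LeMUR's task endpoint and a hypothetical custom tag list:

tags = ["wildfires", "air quality", "public health"]  # hypothetical tags
prompt = f"Pick the most relevant tags for this transcript from: {', '.join(tags)}"
result = transcript.lemur.task(prompt)
print(result.response)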

For the full API reference, as well as the full list of supported topics and FAQs, refer to the full Topic Detection page.