Introducing Auto Chapters - Summarize Audio and Video Files
Today, we're excited to officially announce our newest feature at AssemblyAI - Auto Chapters.
The Auto Chapters, or Text Summarization, feature provides a "summary over time" for audio content transcribed with AssemblyAI's Speech-to-Text API. It works by first breaking audio/video files into logical "chapters" as the topic of conversation changes, and then generating a summary for each "chapter" of content.

In the above graphic, we demonstrate the "chapters" that were extracted, and the summaries that were generated, from Joe Biden's State of the Union address. For each "chapter" that is detected, the API returns a JSON object like the one below:
"chapters": [
    {
        "start": 0,
        "end": 20000,
        "summary": "The American job plan is going to create millions of good paying jobs. jobs created in an American jobs plan do not require a College degree. 75% don't require an associate's degree.",
        "headline": "The American job plan is going to create millions of good paying jobs."
    },
    ...
]
As you can see above, the API responds with the start and end timestamps (in milliseconds) for each "chapter" that was detected, a summary, which is a few-sentence summary of the content spoken during that timeframe, and a short headline, which can be thought of as a "summary of the summary".
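Because the timestamps are in milliseconds, a small conversion step is usually needed before showing chapters to users. The sketch below is one possible way to turn a list of chapters into "minutes:seconds: headline" lines. The function names `format_timestamp` and `chapter_outline` are hypothetical helpers, not part of the AssemblyAI API; only the `start` and `headline` fields come from the response shown above.

```python
# Hypothetical helpers for illustration; only the chapter fields
# ("start", "end", "summary", "headline") come from the API response.

def format_timestamp(ms: int) -> str:
    """Convert milliseconds to a "minutes:seconds" label (minutes are not capped at 59)."""
    total_seconds = ms // 1000
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


def chapter_outline(chapters: list[dict]) -> list[str]:
    """Build one "M:SS: headline" line per chapter, in playback order."""
    ordered = sorted(chapters, key=lambda chapter: chapter["start"])
    return [f"{format_timestamp(c['start'])}: {c['headline']}" for c in ordered]


if __name__ == "__main__":
    # The sample chapter from the response above.
    sample = [
        {
            "start": 0,
            "end": 20000,
            "summary": "...",
            "headline": "The American job plan is going to create millions of good paying jobs.",
        }
    ]
    for line in chapter_outline(sample):
        print(line)
```

Minutes are deliberately left uncapped so that a chapter starting past the one-hour mark is labeled as, say, 61:58 rather than 1:01:58.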
#Auto Chapters In Action
Below is the entire 1 hour and 43 minute State of the Union address that Biden gave to Congress on April 28, 2021, and the "chapters" that were detected by the AssemblyAI API, along with their summaries.
1:45: I have the high privilege and distinct honor to present to you the President of the United States.
31:42: 90% of Americans now live within 5 miles of a vaccination site.
44:28: The American job plan is going to create millions of good paying jobs.
47:59: No one working 40 hours a week should live below the poverty line.
48:22: American jobs finally be the biggest increase in non defense research and development.
49:21: The National Institute of Health, the NIH, should create a similar advanced research Projects agency for Health.
50:31: It would have a singular purpose to develop breakthroughs to prevent, detect and treat diseases like Alzheimer's, diabetes and cancer.
51:29: I wanted to lay out before the Congress my plan.
52:19: When this nation made twelve years of public education universal in the last century, it made us the best educated, best prepared nation in the world.
54:25: The American Family's Plan guarantees four additional years of public education for every person in America, starting as early as we can.
57:08: American Family's Plan will provide access to quality, affordable childcare.
61:58: I will not impose any tax increase on people making less than $400,000.
67:34: He said the U.S. will become an Arsenal for vaccines for other countries.
74:12: After 20 years of value, Valor and sacrifice, it's time to bring those troops home.
76:01: We have to come together to heal the soul of this nation.
80:02: Gun violence has become an epidemic in America.
84:23: If you believe we need to secure the border, pass it.
85:00: Congress needs to pass legislation this year to finally secure protection for dreamers.
87:02: If we want to restore the soul of America, we need to protect the right to vote.
#How Auto Chapters Works
Behind the Auto Chapters feature is a pair of machine learning models. The first model segments an audio file into "chapters" (i.e., it detects when the topic changes), and the second model condenses each chapter into a bite-sized summary.

#Use Cases

Below are just some of the ways our customers are already using the Auto Chapters feature:
- Video Platforms - Automatically create "video chapters" to make videos easier for users to click around, and to jump to the content they're looking for.
- Podcast Players - Extract interesting segments of a podcast episode, and make podcast episodes more searchable so users can jump to key parts of an episode to "sample" an episode before listening to the entire thing.
- Virtual Meeting Platforms - Offer summaries of the key parts of a meeting, and make meeting recordings easier to consume after the fact.
- Telephony - Make phone calls easier to navigate, especially when doing QA within contact centers.
#Using the Auto Chapters Feature
When requesting a transcription with the AssemblyAI API, simply include the auto_chapters: true parameter in your POST requests. For example, in cURL:
curl --request POST \
    --url https://api.assemblyai.com/v2/transcript \
    --header 'authorization: YOUR-API-TOKEN' \
    --header 'content-type: application/json' \
    --data '{"audio_url": "https://foo.bar/7510.mp3", "auto_chapters": true}'
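For readers not working in a shell, the same request can be sketched in Python using only the standard library. The URL, headers, and body mirror the cURL example; everything else is an assumption made for illustration: the helper names (`build_transcript_request`, `send`, `wait_for_transcript`) are hypothetical, and the polling step assumes the transcript can be re-fetched with a GET to the same URL followed by the transcript's `id`, and that a failed job reports an "error" status.

```python
import json
import time
import urllib.request
from typing import Callable

API_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(api_token: str, audio_url: str) -> urllib.request.Request:
    """Build the same POST as the cURL example, with auto_chapters enabled."""
    payload = {"audio_url": audio_url, "auto_chapters": True}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_token, "content-type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON body (performs a real network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_transcript(
    fetch: Callable[[], dict], interval_s: float = 3.0, max_polls: int = 200
) -> dict:
    """Call fetch() repeatedly until the transcript's status is "completed".

    Assumption: a failed job reports status "error"; any other status is
    treated as "still in progress".
    """
    for _ in range(max_polls):
        transcript = fetch()
        status = transcript.get("status")
        if status == "completed":
            return transcript
        if status == "error":
            raise RuntimeError(f"transcription failed: {transcript.get('error')}")
        time.sleep(interval_s)
    raise TimeoutError("transcript did not complete in time")


# Usage (makes real network calls, so it is left commented out):
# token = "YOUR-API-TOKEN"
# created = send(build_transcript_request(token, "https://foo.bar/7510.mp3"))
# status_request = urllib.request.Request(
#     f"{API_URL}/{created['id']}", headers={"authorization": token}
# )
# transcript = wait_for_transcript(lambda: send(status_request))
```

Passing the fetch step in as a callable keeps the polling loop independent of the HTTP client, which also makes it easy to exercise without touching the network.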
When your transcription is completed, you'll see a chapters key in the JSON response, like below:
{
    "audio_duration": 12.0960090702948,
    "audio_url": "https://s3-us-west-2.amazonaws.com/blog.assemblyai.com/audio/8-7-2018-post/7510.mp3",
    "confidence": 0.956,
    "id": "5551722-f677-48a6-9287-39c0aafd9ac1",
    "status": "completed",
    "text": "The American job plan ...",
    # auto chapter results can be found in the JSON result here
    "chapters": [
        {
            "start": 0,
            "end": 20000,
            "summary": "The American job plan is going to create millions of good paying jobs. jobs created in an American jobs plan do not require a College degree. 75% don't require an associate's degree.",
            "headline": "The American job plan is going to create millions of good paying jobs."
        },
        ...
    ],
    "words": [
        {
            "confidence": 1.0,
            "end": 440,
            "start": 0,
            "text": "You"
        },
        ...
    ]
}
Isolating the chapters key for a moment, we can drill into the JSON response here:
"chapters": [
    {
        "start": 0,
        "end": 20000,
        "summary": "The American job plan is going to create millions of good paying jobs. jobs created in an American jobs plan do not require a College degree. 75% don't require an associate's degree.",
        "headline": "The American job plan is going to create millions of good paying jobs."
    },
    ...
]
For each chapter that was detected, the API will include the start and end timestamps (in milliseconds), a summary, which is a few-sentence summary of the content spoken during that timeframe, and a short headline, which can be thought of as a "summary of the summary".
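One thing the start and end timestamps make possible is looking up which chapter a given playback position falls in, for example to highlight the current chapter in a player. The following is a minimal sketch of that lookup, not an official client: `chapter_at` is a hypothetical helper, and it assumes each chapter covers the range from `start` up to, but not including, `end`.

```python
from bisect import bisect_right
from typing import Optional


def chapter_at(chapters: list[dict], position_ms: int) -> Optional[dict]:
    """Return the chapter containing position_ms, or None if no chapter does.

    Assumption: a chapter covers start <= position < end (end is exclusive).
    """
    ordered = sorted(chapters, key=lambda chapter: chapter["start"])
    starts = [chapter["start"] for chapter in ordered]
    # Index of the last chapter that starts at or before the position.
    index = bisect_right(starts, position_ms) - 1
    if index < 0:
        return None
    candidate = ordered[index]
    return candidate if position_ms < candidate["end"] else None


if __name__ == "__main__":
    # The sample chapter from the response above.
    chapters = [
        {
            "start": 0,
            "end": 20000,
            "summary": "...",
            "headline": "The American job plan is going to create millions of good paying jobs.",
        }
    ]
    current = chapter_at(chapters, 5000)
    print(current["headline"] if current else "no chapter at this position")
```

Sorting once and using a binary search keeps each lookup cheap even for long recordings with many chapters; for a handful of chapters a plain loop would work just as well.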