We’re excited to release a major accuracy improvement to our Topic Detection feature, which can accurately predict the topics spoken in audio/video files. This feature is especially useful for customers handling podcasts, video files, or other media where understanding the topics being discussed can help with advertising, recommendation, and search capabilities.
When predicting topics, we use the IAB Content Taxonomy, which defines a list of 698 topics that can be used as a “common language” to describe content.
This update marks the 3rd generation (v3) of our Topic Detection feature, which is powered by a more advanced deep learning neural network. For example, these are the topics predicted by our Topic Detection feature on the following text:
The other coach worth mentioning is Ed Robinson who got called into the
last six nations. He was there almost in replacement of Jason. That was a
weird appointment wasn't it but all accounts. He fitted in really well
and did a reasonable job. But going back to your point about Ali heater
and lots of attacking coaches in the Premiership. I think part of the
problem is we have an unbelievable League which is thriving and people
take real pride in the club and the cities which they're representing.
Sports>Rugby>RugbyLeague: 99.53%
Sports>Rugby>RugbyUnion: 95.912%
Sports>Rugby: 87.61%
Sports>AustralianRulesFootball: 84.55%
As you can see above, even though the term “rugby” is never spoken, our AI knows that this conversation is about the Sports>Rugby>RugbyLeague topic, because it understands that Ed Robinson is a Rugby coach.
How is this even possible? The AI model powering our Topic Detection feature is trained on such a large amount of text data that it’s able to learn and understand not just human language, but context as well.
Major Improvements
Our Topic Detection feature uses a powerful deep learning model to detect topics that are spoken in the audio/video file being transcribed with AssemblyAI’s Speech-to-Text API.
Legacy approaches detect a topic by looking for specific words being spoken. This is brittle, because people often discuss a topic without ever naming it. Consider the same example again:
The other coach worth mentioning is Ed Robinson who got called into the
last six nations. He was there almost in replacement of Jason. That was a
weird appointment wasn't it but all accounts. He fitted in really well
and did a reasonable job. But going back to your point about Ali heater
and lots of attacking coaches in the Premiership. I think part of the
problem is we have an unbelievable League which is thriving and people
take real pride in the club and the cities which they're representing.
Sports>Rugby>RugbyLeague: 99.53%
Sports>Rugby>RugbyUnion: 95.912%
Sports>Rugby: 87.61%
Sports>AustralianRulesFootball: 84.55%
Reiterating our point above, our AI model is able to detect that this is about Sports>Rugby>RugbyLeague, even though the term “rugby” is never spoken. That’s because our AI model knows that Ed Robinson is a Rugby coach and that “six nations” is a Rugby tournament.
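To make the brittleness of keyword matching concrete, here is a minimal sketch of a keyword-based detector run on the excerpt above. The keyword lists and the `keyword_topics` helper are hypothetical, written only for illustration; they do not describe any real legacy system or how our model works.

```python
import re

# Hypothetical keyword lists, invented for this illustration only.
KEYWORDS = {
    "Sports>Rugby": {"rugby", "scrum", "lineout"},
    "Sports>Soccer": {"soccer", "goalkeeper", "offside"},
}

# The excerpt quoted in this post.
TRANSCRIPT = (
    "The other coach worth mentioning is Ed Robinson who got called into the "
    "last six nations. He was there almost in replacement of Jason. That was a "
    "weird appointment wasn't it but all accounts. He fitted in really well "
    "and did a reasonable job. But going back to your point about Ali heater "
    "and lots of attacking coaches in the Premiership. I think part of the "
    "problem is we have an unbelievable League which is thriving and people "
    "take real pride in the club and the cities which they're representing."
)


def keyword_topics(text):
    """Return the topics whose keywords appear verbatim in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return [topic for topic, keywords in KEYWORDS.items() if words & keywords]


# No keyword is spoken, so the keyword matcher finds no topic at all.
print(keyword_topics(TRANSCRIPT))
# The matcher only succeeds when the topic is named explicitly.
print(keyword_topics("We talked about rugby all afternoon."))
```

Because none of the listed keywords occur in the excerpt, a matcher like this returns nothing, while adding ever more keywords only shifts the problem to the next conversation that phrases things differently.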
This is possible because our AI model maps words and sentences into a high-dimensional vector space. To demonstrate this, let’s look at the graph below:
When our AI model puts content about religions (Buddhism, Christianity, Judaism, etc.) into a high-dimensional vector space, that content gets clustered near each other, because our AI model knows that those pieces of content are all similar - being about religion.
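The idea of content being "near each other" in a vector space can be sketched with cosine similarity. The 3-dimensional vectors below are hand-made for illustration and are not output from our model; real embeddings are learned and have far more dimensions.

```python
import math

# Hand-made toy vectors (assumption): not produced by any real model.
EMBEDDINGS = {
    "Buddhism": [0.90, 0.10, 0.00],
    "Christianity": [0.80, 0.20, 0.10],
    "Judaism": [0.85, 0.15, 0.05],
    "Rugby": [0.10, 0.90, 0.20],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


same_theme = cosine_similarity(EMBEDDINGS["Buddhism"], EMBEDDINGS["Christianity"])
other_theme = cosine_similarity(EMBEDDINGS["Buddhism"], EMBEDDINGS["Rugby"])

# The two religion vectors point in nearly the same direction,
# while the religion and rugby vectors do not.
print(f"Buddhism vs Christianity: {same_theme:.2f}")
print(f"Buddhism vs Rugby:        {other_theme:.2f}")
```

In this toy setup, items on the same theme score close to 1 and unrelated items score much lower, which is the property that makes clusters visible when embeddings are plotted.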
To demonstrate how much better our AI model is at detecting topics, let’s look at the graphs below:
The left graph shows how well our v2 Topic Detection model clusters the 698 different topics it can predict. The right graph shows that the v3 Topic Detection model is much better at clustering the 698 different topics, which demonstrates its stronger understanding of human language and text.
Using AssemblyAI’s Topic Detection Feature
When predicting topics, we use the IAB Content Taxonomy, which defines a list of 698 topics that can be used as a “common language” to describe content. This is especially useful for customers who are building Contextual Targeting solutions for advertisers. The full list of 698 topics our model predicts can be found below:
For more information on how this feature works, check out our API Documentation, or book a call with our solutions team!
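As a rough starting point, the sketch below shows what enabling Topic Detection on a transcription request and reading the results might look like. The endpoint URL, the `iab_categories` request field, and the `iab_categories_result` / `summary` response fields are assumptions not stated in this post, so please confirm them in the API Documentation. The sample response is hand-written from the scores quoted earlier, expressed as fractions, and no network call is made.

```python
import json

# Assumed endpoint and field names; verify against the API Documentation.
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def build_request_body(audio_url):
    """JSON body asking for a transcript with Topic Detection enabled."""
    return json.dumps({"audio_url": audio_url, "iab_categories": True})


def top_topics(transcript, min_relevance=0.5):
    """Return (label, relevance) pairs from the summary, highest first."""
    summary = transcript.get("iab_categories_result", {}).get("summary", {})
    ranked = sorted(summary.items(), key=lambda item: item[1], reverse=True)
    return [(label, score) for label, score in ranked if score >= min_relevance]


# Hand-written sample shaped like the assumed response; not real API output.
SAMPLE_TRANSCRIPT = {
    "iab_categories_result": {
        "summary": {
            "Sports>Rugby": 0.8761,
            "Sports>Rugby>RugbyLeague": 0.9953,
            "Sports>AustralianRulesFootball": 0.8455,
            "Sports>Rugby>RugbyUnion": 0.95912,
        }
    }
}

print(build_request_body("https://example.com/podcast.mp3"))
for label, score in top_topics(SAMPLE_TRANSCRIPT, min_relevance=0.9):
    # Each label is a path through the taxonomy, split on ">".
    print(label.split(">"), f"{score:.2%}")
```

Filtering on a relevance threshold is one simple way to keep only the confident predictions; the right cutoff depends on the use case.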