
A company’s unstructured data is often overlooked and definitely underutilized. But it doesn’t have to be that way. What if product teams could instead apply the power of cutting-edge AI research to automatically sift through this unstructured data and extract meaningful information?
Recently, significant advances in Deep Learning have produced far more capable AI models for understanding language. As a result, the models serving as the “brains” behind Natural Language Processing APIs are more accurate, and thus more useful, than ever before. Instead of leaving piles of data untouched, product teams are building intelligent tools that automatically identify patterns across massive datasets, driving significant ROI for their product roadmaps.
The principles behind these models are the same as those powering some of today’s most innovative technology, such as self-driving cars, personalized recommendation systems, and automatic fraud detection.
One of these AI-powered APIs is Topic Detection. In this article, we’ll look at what exactly Topic Detection is, how the models behind Topic Detection work, the best Topic Detection APIs to consider based on the task at hand, and some of its most valuable use cases.
What is Topic Detection?
Topic Detection, also sometimes referred to as Topic Analysis, uses AI models to detect and label topics in a body of text. Some Topic Detection APIs can also detect topics in audio and video streams that are transcribed with a Speech-to-Text API.
Topic Detection can be applied at multiple levels of scope. For example, topics can be extracted from an entire document or text (document level), from a single sentence (sentence level) or from phrases or parts of sentences (sub-sentence level). When applied to transcriptions, topics can also be extracted based on timestamps from the original audio or video file.
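To make the difference between sentence-level and document-level scope concrete, here is a minimal sketch. It assumes a hypothetical per-sentence labeler (the toy `keyword_labeler` below, not any real product's model), runs it on each sentence, and then rolls the sentence-level labels up into a document-level score. The names `split_sentences`, `detect_topics`, and `keyword_labeler` are illustrative only.

```python
import re
from collections import Counter
from typing import Callable

# Hypothetical per-sentence labeler: any function that maps a sentence to a set of topic labels.
Labeler = Callable[[str], set[str]]


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter on ., ! or ? followed by whitespace (illustrative only)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def detect_topics(text: str, labeler: Labeler) -> dict:
    """Label each sentence, then aggregate those labels into a document-level view."""
    sentence_level = [(s, labeler(s)) for s in split_sentences(text)]
    counts = Counter(label for _, labels in sentence_level for label in labels)
    total = len(sentence_level) or 1
    # Document-level score: the fraction of sentences in which each topic was detected.
    document_level = {label: n / total for label, n in counts.items()}
    return {"sentences": sentence_level, "document": document_level}


def keyword_labeler(sentence: str) -> set[str]:
    """Toy labeler based on hand-picked keywords (hypothetical, stands in for a real model)."""
    keywords = {"Weather": {"rain", "snow", "storm"}, "Sports": {"game", "score", "team"}}
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return {topic for topic, kws in keywords.items() if words & kws}
```

For transcripts, each sentence would typically also carry the start and end timestamps from the original audio, so the same roll-up can be done over time ranges instead of sentences.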
Topic Detection models are especially useful for performing text analysis at scale on large datasets.
How Does Topic Detection Work?
Most commonly, Topic Detection models use one of two approaches: Topic Modeling or Topic Classification.
Topic Modeling uses a generative model that takes an input text and generates a prediction for the topic discussed in the text; the topic label itself may or may not appear in the text. For example, a generative model could take the input `Heavy Rain Expected in New York Today` and generate the topic `Weather`, even though the term `Weather` is not explicitly mentioned in the headline.
Topic Classification uses a classifier model that takes an input text and outputs a probability that the text belongs to each topic in a predetermined list. For example, a simple classifier model could be designed with three possible outputs: `baseball`, `football`, or `soccer`. When fed the text `Barry Bonds ran around the bases after hitting a home run`, the model would output a probability for each of these three topics, with `baseball` expected to score highest. Topic Classification models can also be designed for multi-label classification, in which a single text can be assigned more than one topic.
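The classification idea can be sketched with a toy model. This is not a trained classifier: the keyword lists in `TOPIC_KEYWORDS` are hypothetical stand-ins for what a real model would learn from data. What it does show is the shape of a classifier's output, one probability per label in a fixed set, produced here by a softmax over keyword counts.

```python
import math
import re

# Hypothetical keyword lists standing in for what a trained model would learn.
TOPIC_KEYWORDS = {
    "baseball": {"bases", "home", "run", "pitcher", "inning", "bat"},
    "football": {"touchdown", "quarterback", "yard", "tackle"},
    "soccer": {"goal", "penalty", "striker", "midfielder"},
}


def classify(text: str) -> dict[str, float]:
    """Return a probability for every topic in the fixed label set."""
    words = re.findall(r"[a-z]+", text.lower())
    # Raw score: how many words in the text match each topic's keywords.
    scores = {topic: sum(w in kws for w in words) for topic, kws in TOPIC_KEYWORDS.items()}
    # Softmax turns raw scores into probabilities that sum to 1 across the labels.
    exp_scores = {topic: math.exp(s) for topic, s in scores.items()}
    total = sum(exp_scores.values())
    return {topic: v / total for topic, v in exp_scores.items()}
```

Calling `classify("Barry Bonds ran around the bases after hitting a home run")` gives `baseball` the highest probability. Because a softmax makes the labels compete for a single unit of probability, a multi-label variant would instead score each label independently (for example with a sigmoid per label) so that one text can receive several topics.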
Though Topic Modeling and Topic Classification make up the bulk of Topic Detection models, a Topic Detection model could also be designed using an extractive approach. Extractive models take an input and extract a topic based on the words included in the input text. If we take our first example, `Heavy Rain Expected in New York Today`, the topic extracted could be `rain` but not `weather`, since `rain` is explicitly included in the text but `weather` is not. As this example shows, an extractive approach makes it difficult to infer the most appropriate topics, since it can only return words that appear explicitly in the text.
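A minimal sketch of the extractive limitation, assuming a hypothetical `TOPIC_VOCABULARY` of terms the system is allowed to report. Even though `weather` is in the vocabulary, it can never be returned for the headline because it does not appear in the text.

```python
import re

# Hypothetical vocabulary of terms the system is allowed to report as topics.
TOPIC_VOCABULARY = {"rain", "snow", "weather", "baseball", "election"}


def extract_topics(text: str) -> list[str]:
    """Return vocabulary terms that appear verbatim in the text, in order of appearance."""
    words = re.findall(r"[a-z]+", text.lower())
    found = [w for w in words if w in TOPIC_VOCABULARY]
    # dict.fromkeys removes duplicates while preserving first-appearance order.
    return list(dict.fromkeys(found))
```

For `Heavy Rain Expected in New York Today` this returns only `rain`; producing `weather` would require a generative or classification model that can map text to labels it does not contain.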
All three approaches can also be applied to analyze audio and video stream transcriptions, in addition to the static text examples described above.
Best APIs for Topic Detection
Now that we’ve looked at what Topic Detection is and how Topic Detection models work, let’s examine a few of the best Topic Detection APIs on the market today. Note that some of the APIs perform Topic Detection solely on static texts, like a research document, while others work in tandem with Speech-to-Text APIs to perform Topic Detection on audio or video streams as well.
1. AssemblyAI’s Topic Detection API
AssemblyAI is a Deep Learning company known for its top-rated accessibility, affordability, and utility across its wide range of Audio Intelligence APIs. These include Topic Detection, Content Moderation, Text Summarization, Sentiment Analysis, Entity Detection, and more.
AssemblyAI designed its Topic Detection model around the IAB Taxonomy, an exhaustive list of 698 topics, organized into categories and subcategories, compiled by the Interactive Advertising Bureau (IAB).
With its Topic Detection API, product teams and developers can determine which topics are mentioned in an audio or video file with high confidence. For each topic label, the API also returns a relevance score between 0 and 1 that indicates how relevant that label is to the particular portion of the text.
For example, the AssemblyAI Topic Detection API found `baseball` to be the main topic of the following transcription text:
In my mind, I was basically done with Robbie Ray. He had shown flashes in the past, particularly with strikeouts, but it was just too efficient, walked too many guys, and got hit too hard, too. And it all changed this year, especially the walks to the point he’s now. He’s probably going to be the Cy Young winner in the AL.
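Once a transcript with topic detection comes back, the relevance scores can be filtered and ranked client-side. The sketch below assumes the response shape described in AssemblyAI's documentation as we understand it (topic detection enabled with `iab_categories` set to `true`, results under `iab_categories_result` with per-segment `results` and a document-level `summary`); verify the field names against the current API reference before relying on them. The helper names and the 0.5 threshold are hypothetical choices, not part of the API.

```python
# Assumed response shape (check AssemblyAI's current API reference):
# {"iab_categories_result": {
#     "results": [{"text": "...", "labels": [{"label": "Sports>Baseball", "relevance": 0.97}]}],
#     "summary": {"Sports>Baseball": 1.0}}}


def top_topics(transcript: dict, min_relevance: float = 0.5) -> list[tuple[str, float]]:
    """Document-level topics from the summary, highest relevance first."""
    summary = transcript["iab_categories_result"]["summary"]
    ranked = sorted(summary.items(), key=lambda item: item[1], reverse=True)
    return [(label, score) for label, score in ranked if score >= min_relevance]


def segment_topics(transcript: dict, min_relevance: float = 0.5) -> list[tuple[str, list[str]]]:
    """Per-segment labels whose relevance score clears the (hypothetical) threshold."""
    segments = []
    for result in transcript["iab_categories_result"]["results"]:
        labels = [lab["label"] for lab in result["labels"] if lab["relevance"] >= min_relevance]
        segments.append((result["text"], labels))
    return segments


def split_iab_label(label: str) -> list[str]:
    """IAB labels are hierarchical, e.g. 'Sports>Baseball' -> ['Sports', 'Baseball']."""
    return label.split(">")
```

Splitting the hierarchical label lets you group results by top-level IAB category (for example, everything under `Sports`) as well as by the specific subcategory.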
Here is the full list of topics that can be detected by the AssemblyAI Topic Detection API:
Product teams or developers interested in testing the AssemblyAI Topic Detection API, or any of its other APIs, can sign up for free here.
2. Amazon Comprehend
Amazon Comprehend offers a host of NLU/NLP APIs, including Topic Detection. Developers can use Amazon’s pre-trained models or train their own models for custom classification needs. Topics are determined via Topic Modeling; for the most accurate results, Amazon recommends a corpus of at least 1,000 documents, each at least three sentences long.
Pricing for Amazon Comprehend is usage-based and depends on which features you use.
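Because Amazon's guidance is about corpus size and document length, it can be worth checking a corpus before submitting it. The sketch below is a hypothetical pre-flight helper, not part of any AWS SDK; the job itself would be started separately (for example with boto3's Comprehend `start_topics_detection_job`). The sentence counter is a rough punctuation-based heuristic, not a real tokenizer.

```python
import re

MIN_DOCUMENTS = 1_000  # recommended minimum number of documents
MIN_SENTENCES = 3      # recommended minimum sentences per document


def count_sentences(doc: str) -> int:
    """Rough sentence count: non-empty pieces between ., ! and ? (heuristic only)."""
    return len([s for s in re.split(r"[.!?]+", doc) if s.strip()])


def check_corpus(docs: list[str]) -> list[str]:
    """Return warnings if the corpus falls short of the recommended size or document length."""
    warnings = []
    if len(docs) < MIN_DOCUMENTS:
        warnings.append(f"only {len(docs)} documents; at least {MIN_DOCUMENTS} recommended")
    short = sum(1 for d in docs if count_sentences(d) < MIN_SENTENCES)
    if short:
        warnings.append(f"{short} documents have fewer than {MIN_SENTENCES} sentences")
    return warnings
```

An empty list means the corpus meets both recommendations; otherwise the warnings say which one it misses.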
3. TextRazor
TextRazor’s Topic Detection API is called Topic Tagging and provides automatic identification of topics in an unstructured text. TextRazor’s Topic Tagging model is trained on Wikipedia pages to assign high-level topics without the need for custom training. Topics are classified based on the criteria laid out here.
Developers or product teams interested in TextRazor can use its services for free for up to 500 requests per day. Additional requests start at $200 per month.
4. Azure Cognitive Services
Microsoft Azure also offers Text Analytics as part of its Cognitive Services product series. Text Analytics lets users mine insights from unstructured texts to find common topics and trends, and Microsoft offers the same services for medical texts and terminology. To help developers get started quickly, Microsoft also provides documentation and code samples on its Cognitive Services page.
Pricing for Text Analytics depends on which API you use and your monthly usage.
5. MeaningCloud
MeaningCloud offers a Topic Extraction API for Topic Detection. Topic Extraction pulls relevant information and topics out of unstructured texts in many languages. The API is also highly configurable, so users can adapt it to a wide range of operating scenarios.
Developers looking to get started can follow the guides laid out in the documentation.
6. uClassify
Finally, uClassify is a free Machine Learning web service that applies V2 of the IAB Taxonomy to static texts. uClassify also has additional APIs for Sentiment, Gender, Language, Age, Mood, Tonality, and more.
Product teams or developers can use the APIs for free for up to 500 calls per day, with paid plans for additional usage needs.
Topic Detection Use Cases
Topic Detection APIs power a diverse range of use cases.
Conversation Intelligence Platforms use Topic Detection, in conjunction with speech transcription, to help automate the time-consuming process of transcribing and analyzing meetings for end users. With Topic Detection, these platforms can make every conversation, whether voice-based or text-based, both searchable and indexable, helping end users gain useful insights from previously unstructured data.
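Making conversations searchable by topic usually comes down to an inverted index from topic label to conversation. Here is a minimal in-memory sketch; the conversation IDs and labels are hypothetical, and a production platform would typically store this in a search engine or database instead.

```python
from collections import defaultdict


def build_topic_index(conversations: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert {conversation_id: [topic labels]} into {topic label: {conversation_ids}}."""
    index: dict[str, set[str]] = defaultdict(set)
    for conv_id, topics in conversations.items():
        for topic in topics:
            # Normalize case so "Pricing" and "pricing" land in the same bucket.
            index[topic.lower()].add(conv_id)
    return dict(index)


def search(index: dict[str, set[str]], topic: str) -> set[str]:
    """Return every conversation tagged with the topic (case-insensitive)."""
    return index.get(topic.lower(), set())
```

For example, indexing `{"call-1": ["Pricing", "Onboarding"], "call-2": ["pricing"]}` and searching for `"pricing"` returns both calls.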
Customer Research Platforms use Topic Detection APIs to generate key highlights from quantitative and qualitative customer feedback, providing an in-depth look into customer behavior and opinion. As with Conversation Intelligence Platforms, Topic Detection can also help Customer Research Platforms categorize, tag, and search through their databases of responses.
Revenue Intelligence Platforms apply Topic Detection to automatically identify and flag key sections of calls or other interactions with customers and leads. They also combine Topic Detection with other Audio Intelligence APIs, like Sentiment Analysis, to map topics directly to closed sales or to the questions and objections customers raise. Aggregate data from Topic Detection can even be used to improve employee sales coaching.
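Flagging key sections of a call can be sketched as a filter over timestamped segments. This assumes each segment already carries topic labels with relevance scores (as a Topic Detection API might return); the `Segment` type, the watchlist labels, and the 0.7 threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_ms: int              # segment start time in the call recording
    end_ms: int                # segment end time in the call recording
    topics: dict[str, float]   # topic label -> relevance score between 0 and 1


def flag_segments(
    segments: list[Segment], watchlist: set[str], threshold: float = 0.7
) -> list[tuple[int, int, list[str]]]:
    """Return (start, end, matched topics) for segments that hit a watched topic."""
    flagged = []
    for seg in segments:
        hits = sorted(t for t, score in seg.topics.items() if t in watchlist and score >= threshold)
        if hits:
            flagged.append((seg.start_ms, seg.end_ms, hits))
    return flagged
```

A sales team might watch for labels such as pricing or competitor mentions and jump straight to those time ranges when reviewing a call.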
Podcast platforms use Topic Detection APIs, in conjunction with speech transcription, to automatically identify and label recurring topics in podcasts. These topics can then be used to categorize podcasts or to help advertisers identify the most appropriate podcasts in which to place advertisements.