Customer Spotlight: Kapwing drives user adoption by building AI-first features
Joshua Grossberg, CTO at Kapwing, discusses Kapwing’s secret to building successful AI-first features, and how that included partnering with AssemblyAI.



Kapwing is a collaborative, browser- and cloud-based video editor, designed to help teams edit videos faster. Kapwing recently put AssemblyAI’s Core Transcription AI model with word-by-word timestamps and translation into production to better meet the demands of its increasing user base.
Kapwing’s CTO, Joshua Grossberg, met with us to discuss the company’s thoughtful approach to building with AI and why they decided to partner with AssemblyAI.
What are the most important considerations when building with AI?
Our users are the most important consideration when building a new feature. With some tools or technologies, we ask: is it going to be too high-touch for our customers to use?
We try to separate things like, is it a gimmick, is it a stunt? Is it too complex for our user? But then somewhere in between those things is the sweet spot.
When choosing to integrate a new AI model or partner, the way we see it is that we have our core competencies as a company, but then we have to integrate with the outside world and bring other people's core competencies into ours.
How do you commit to a specific AI feature?
We look at trends, but we also have personas that we go by.
Sometimes we're chasing growth, where we say this feature attracts a high-growth persona. A person may have a lot of followers on TikTok or Instagram, so it's going to lead to high growth. But oftentimes that person in and of themselves is not a high-revenue persona.
Our high-revenue persona is more like a small business person who's making an Instagram ad, or maybe short-form YouTube tutorials and things like that.
And for that person, they tend to really want types of text-based embellishments. When we see something that is attractive to that person, like really strong word-by-word timings, or just really good transcription, then that makes sense for us.
If you have an hour of content, the difference between 99% accuracy and 97% accuracy is a lot of review time for that person. So you could cut their workflow down from half an hour to 20 minutes, or even 15 minutes. It's huge, right?
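To make the arithmetic behind that point concrete, here is a rough editorial sketch (not part of the interview). It assumes a speaking rate of 150 words per minute and treats "accuracy" as the fraction of words transcribed correctly; both are illustrative assumptions, not figures from Kapwing or AssemblyAI.

```python
def expected_errors(duration_minutes: int, words_per_minute: int, accuracy: float) -> int:
    """Estimate how many words a reviewer would need to fix.

    Assumes errors are counted per word, so the error count is
    total_words * (1 - accuracy).
    """
    total_words = duration_minutes * words_per_minute
    return round(total_words * (1 - accuracy))


# Hypothetical inputs: one hour of speech at an assumed 150 words per minute.
errors_at_97 = expected_errors(60, 150, 0.97)  # 270 words to fix
errors_at_99 = expected_errors(60, 150, 0.99)  # 90 words to fix
print(errors_at_97, errors_at_99)
```

Under these assumptions, a two-point gain in accuracy cuts the number of corrections to a third, which is where the review time savings come from.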
What’s an example of one of Kapwing’s AI features?
A big thing that people want, and that correlates very highly with our paying customers, is transcriptions and translations.
People watch videos on mute a lot now. If someone sends me a video and I'm supposed to watch it on the train without subtitles, I'm not going to watch it. And if the subtitles are engaging, that makes it better.
That's been a major driver of our revenue and a major driver for some of our best customers.
So, one of the things that we started to do to make our transcription editing more powerful was give people precise word timings. That allows them to do things like trimming the video by editing the transcript. And when you're actually trimming the video, the subtitles are tethered to the video itself as opposed to a specific point in time.
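As an editorial illustration of transcript-based trimming (a minimal sketch, not Kapwing's implementation), the example below deletes a run of words from a transcript, derives the matching cut range in the video, and shifts the later words so they stay attached to the footage. The `Word` type, its millisecond fields, and the `trim_words` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Word:
    """One transcribed word with its timing (hypothetical structure)."""
    text: str
    start_ms: int
    end_ms: int


def trim_words(words: List[Word], first: int, last: int) -> Tuple[Tuple[int, int], List[Word]]:
    """Delete words[first..last] (inclusive).

    Returns the (start_ms, end_ms) range to cut from the video and the
    remaining words, with later words shifted earlier by the cut length.
    """
    if not (0 <= first <= last < len(words)):
        raise IndexError("word range out of bounds")
    cut_start = words[first].start_ms
    cut_end = words[last].end_ms
    removed = cut_end - cut_start
    shifted = [
        replace(w, start_ms=w.start_ms - removed, end_ms=w.end_ms - removed)
        for w in words[last + 1:]
    ]
    return (cut_start, cut_end), words[:first] + shifted


transcript = [Word("Hello", 0, 400), Word("um", 450, 700), Word("world", 800, 1200)]
cut_range, trimmed = trim_words(transcript, 1, 1)  # remove the filler word "um"
```

Because each subtitle word carries its own timing, removing a span of video only requires shifting the words after the cut rather than re-aligning the whole transcript.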
And this also allows us to do things like word-by-word animations.
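One way word timings can drive a word-by-word animation is to look up which word is being spoken at the current playback time and highlight it. The following is a minimal editorial sketch under assumed data shapes, not a description of Kapwing's renderer: words are `(text, start_ms, end_ms)` tuples sorted by start time, and `active_word` is a hypothetical helper.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

TimedWord = Tuple[str, int, int]  # (text, start_ms, end_ms), assumed sorted by start_ms


def active_word(words: List[TimedWord], t_ms: int) -> Optional[int]:
    """Return the index of the word being spoken at t_ms, or None during a pause."""
    starts = [start for _, start, _ in words]
    # Index of the last word that starts at or before t_ms.
    i = bisect_right(starts, t_ms) - 1
    if i >= 0 and t_ms < words[i][2]:
        return i
    return None


caption = [("Hello", 0, 400), ("world", 500, 900)]
highlighted = active_word(caption, 600)  # index 1, i.e. "world"
```

A renderer could call a lookup like this on every frame and style the returned word differently from the rest of the caption. The tighter the word timings, the closer the highlight tracks the audio.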
We switched over to AssemblyAI because our previous API didn't have accurate enough word timing or foreign language translations. And foreign languages are actually important for us because we get a lot of users from around the world.
AssemblyAI was very helpful about working with us to run experiments comparing both the Word Error Rate and the overall timing accuracy. That, combined with a better price point, was a big reason why we switched.
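For readers unfamiliar with the metric, Word Error Rate is commonly defined as the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference text, divided by the number of words in the reference. The sketch below is a generic editorial illustration of that definition, not the evaluation that Kapwing and AssemblyAI ran. The lowercasing and whitespace tokenization are simplifying assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER as the word-level edit distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

A lower value is better, and a transcript that matches the reference exactly scores 0.0.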
What AI tech are you most excited about in the future?
The Generative AI stuff is cool. For us, we're seeing it happening on images, right? But then what's the next step to really augmenting video? I'm not sure it's ready for a lot of paid usage today, but in the future, it could be really compelling.
What AI-powered feature is next for Kapwing?
We’re exploring ways to leverage AI to speed up video creation: for example, automatically generating highlights or teasers, automatically editing raw footage, and generating voice-overs to simplify the filming process. The goal is to help more businesses and creators tell stories through video quickly and at scale.