About

How do you compare to Google Speech API, IBM's Watson, etc.?

It's hard to give an accurate comparison because our accuracy depends on the quality of your audio and on how well you customize the API with a Corpus. A Corpus is a collection of phrases and words that you'd like the API to focus on or add to its vocabulary. So far, we've found that accuracy can increase by up to 20% with a good Corpus. You can find more information here.

What languages do you currently support?

We only support US English today. The API will work with UK and AU English, but with degraded accuracy.
Need another language? Let us know! Knowing which languages are important to you helps us prioritize which ones to add next.

Audio

Is there a maximum length of audio the API can transcribe?

If you are using the /transcript endpoint, there is no limit. If you are streaming audio for a real-time transcript, you can stream a maximum of 15 seconds of speech.
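For recordings longer than the streaming limit, one option is to cut the audio into windows of at most 15 seconds and stream each window separately. The sketch below only computes the window boundaries. The function name and the fixed-window approach are illustrative assumptions, not part of the API, and cutting at fixed offsets may split a word across two windows.

```python
def chunk_boundaries(total_ms, max_chunk_ms=15_000):
    """Return (start_ms, end_ms) windows, each no longer than max_chunk_ms.

    Hypothetical helper: the 15-second default mirrors the streaming limit
    stated above; everything else is an assumption for illustration.
    """
    if total_ms < 0 or max_chunk_ms <= 0:
        raise ValueError("total_ms must be >= 0 and max_chunk_ms must be > 0")

    windows = []
    start = 0
    while start < total_ms:
        # The last window is shortened so it never runs past the audio.
        end = min(start + max_chunk_ms, total_ms)
        windows.append((start, end))
        start = end
    return windows


if __name__ == "__main__":
    # A 40-second recording becomes two full windows and one 10-second tail.
    for start_ms, end_ms in chunk_boundaries(40_000):
        print(start_ms, end_ms)
```

Working in integer milliseconds avoids floating-point drift when the boundaries are later used to slice audio.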

How long does it take to transcribe audio?

For most audio files, generating a transcript takes approximately 25% to 50% of the audio duration. For example, a 1-hour file will take between 15 and 30 minutes to transcribe.
For real-time audio, we provide transcripts of up to 15 seconds of speech in under 300 milliseconds.
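As a rough planning aid, the 25% to 50% rule of thumb for file transcription can be turned into a small calculation. This is only a sketch: the function name is hypothetical, and the two ratios are simply the approximate figures quoted above, not guarantees.

```python
def estimate_transcription_minutes(audio_minutes, low_ratio=0.25, high_ratio=0.50):
    """Return a (fastest, slowest) estimate of processing time in minutes.

    The default ratios are the approximate 25%-50% range quoted in this FAQ;
    actual processing time may fall outside it.
    """
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return audio_minutes * low_ratio, audio_minutes * high_ratio


if __name__ == "__main__":
    fastest, slowest = estimate_transcription_minutes(60)
    print(f"A 60-minute file: roughly {fastest:.0f} to {slowest:.0f} minutes")
```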

What is the optimal audio format for your API?

The API can handle audio in any format, sample rate, etc. Just send us a URL and we can generate a transcript for it.

Do you provide voice output or Text-to-Speech (TTS)?

No, we currently do not provide TTS. However, there are many other APIs that can work alongside ours. For example, both Lyrebird and Amazon Polly may be good options for your project.

How much does background noise affect the accuracy?

Our API works best with clean, single-speaker audio. However, we understand that background noise, echo, or far-field recording can sometimes be introduced into the signal. If you have challenging audio, there are a few options to try. Creating a good Corpus, or multiple Corpora, can make up for poor audio quality in many cases. Our tests show an accuracy improvement of up to 20% relative when transcribing audio with a good Corpus.

Do you provide partial transcripts as you process audio?

No, we currently do not provide partial transcripts.

API

Can I see the API docs?

Of course! Our API docs are accessible here.

How long will it take me to create a Corpus?

It takes approximately 1 to 5 minutes, depending on how many phrases and words you add to your Corpus. For example, it takes about 3 minutes to create a Corpus from 9,000 lines of text.

How can I test the accuracy right now?

We do not have an external testing area, but we're happy to set you up with an evaluation of our API. You can sign up on the wait list, and we will reach out to you shortly.

Do you provide timestamps?

Yes, we do! We provide timestamps per "segment", which is about the length of a 5 second phrase. For example, if you were searching for a word you would get the timestamp for the segment it's located in. There's more information in the API docs here, under the "Response JSON" section.
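To illustrate what segment-level timestamps mean for search, the sketch below looks up a word and returns the time range of every segment that contains it. The segment shape used here (`start`, `end`, and `text` keys, with times in seconds) is a hypothetical stand-in, not the actual response format; check the "Response JSON" section of the API docs for the real field names.

```python
def segments_containing(segments, word):
    """Return (start, end) for each segment whose text contains `word`.

    `segments` is assumed to be a list of dicts with hypothetical keys
    "start", "end" (seconds) and "text"; the real response may differ.
    """
    needle = word.lower()
    hits = []
    for segment in segments:
        # Compare whole words, ignoring case and surrounding punctuation.
        tokens = [t.strip(".,!?;:\"'").lower() for t in segment["text"].split()]
        if needle in tokens:
            hits.append((segment["start"], segment["end"]))
    return hits


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    example = [
        {"start": 0.0, "end": 4.8, "text": "Thanks for calling support."},
        {"start": 4.8, "end": 9.6, "text": "I need help with my invoice."},
    ]
    print(segments_containing(example, "invoice"))
```

Because timestamps are per segment rather than per word, the result locates the word to within the segment's span, not to its exact position.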

How often is recognition accuracy improved?

We are actively working on improving our API, including the recognition accuracy. The underlying speech recognition models we build are updated frequently, and we ship larger updates approximately every four weeks.

Pricing

What is your pricing? How does it compare to the other options?

We have information on our Pricing page, but contact us directly if you are interested in learning more about our pricing.

Privacy

Do you sign NDAs or other confidentiality agreements?

No, we currently do not sign NDAs or other types of confidentiality agreements. For more information, please see Section 6 (Confidentiality) of our Terms of Service, which covers most of what an NDA or confidentiality agreement would.