May 17, 2024

Newsletter #36: Latest Speech-to-text Model Benchmarks

Latest Speech-to-text Model Benchmarks & AssemblyAI JavaScript SDK 4.4.3 Released

Smitha Kolan

Developer Educator

Word Error Rate (WER)

SDK

Latest AssemblyAI Benchmarks

Check out our new Benchmarks page, which showcases the performance of our Speech-to-Text API and reports our latest metrics.
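For readers new to the metric, Word Error Rate is commonly defined as the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The sketch below illustrates that textbook definition only. It is not the methodology behind the Benchmarks page, and the function name `word_error_rate` and the lowercase-and-split normalization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    # Assumption: lowercase and split on whitespace. Real benchmarks usually
    # apply heavier text normalization (punctuation, numerals, and so on).
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A lower value is better. Because insertions count as errors, the result can exceed 1.0 when the hypothesis is much longer than the reference.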

Try our API now with these features:

Speaker Diarization: Automatically identify different speakers in audio files.
PII Redaction: Automatically redact sensitive personal information.
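As a rough sketch of how these two features might be requested together, the snippet below builds (but does not send) a transcription request using only the Python standard library. The endpoint and the `speaker_labels`, `redact_pii`, and `redact_pii_policies` fields follow our understanding of the public REST API; treat them as assumptions and confirm against the documentation. The helper name `build_transcript_request`, the placeholder key, and the example URL are hypothetical.

```python
import json
import urllib.request

# Assumed transcription endpoint; verify against the current API documentation.
API_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(api_key, audio_url, pii_policies):
    """Build a POST request enabling speaker diarization and PII redaction."""
    payload = {
        "audio_url": audio_url,
        "speaker_labels": True,   # Speaker Diarization
        "redact_pii": True,       # PII Redaction
        "redact_pii_policies": list(pii_policies),
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical values; replace with a real API key and audio URL.
    request = build_transcript_request(
        "YOUR_API_KEY",
        "https://example.com/meeting.mp3",
        ["person_name", "phone_number"],
    )
    print(request.get_method(), request.full_url)
    # To submit the request: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the example safe to run without credentials and makes the payload easy to inspect.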

Visit our documentation and start integrating our advanced speech recognition into your products today.

AssemblyAI JavaScript SDK 4.4.3 Released

Visit the AssemblyAI docs for step-by-step instructions and more detail on our AI models and API. For the SDK's types, functions, and classes, see the SDK API reference.

Fresh From Our Blog

Automatically determine video sections with AI using Python: Learn how to automatically determine video sections, generate section titles with LLMs, and format the information for YouTube chapters.

Redact PII in Audio with Make and AssemblyAI: Create a Make scenario using the AssemblyAI app that watches a Google Drive folder for new audio files, and then creates both a transcript and an audio file in which PII is redacted.

How to use audio data in LlamaIndex with Python: Discover how to incorporate audio files into LlamaIndex and build an LLM-powered query engine in this step-by-step tutorial.

Our Trending YouTube Tutorials

How to use @postman to test LLMs with audio data (Transcribe and Understand): Learn how to transcribe audio and video files using AssemblyAI, and how to use LeMUR, AssemblyAI's framework for applying Large Language Models to spoken data, all without writing any code.

Build A Talking AI with LLAMA 3 (Python tutorial): This tutorial shows you how to build a talking AI using real-time transcription with AssemblyAI, using LLAMA 3 as the language model with Ollama, and ElevenLabs for text-to-speech.

How to Build a Better User Experience with Customizable Real-Time Speech-to-Text: In this video, learn how to customize your applications that use streaming speech-to-text with just one additional line of code.
