Ryan O'Connor

Senior Developer Educator

How to evaluate Speech Recognition models

Speech Recognition models are key to extracting useful information from audio data. Learn how to properly evaluate speech recognition models in this easy-to-follow guide.

Golden Gemini: A new approach in Speech AI

A new method both improves performance and lowers computational requirements through a clever observation in speech theory.

Announcing the AssemblyAI integration for LiveKit

LiveKit allows you to build real-time audio and video applications. Now you can build with AssemblyAI's Streaming Speech-to-Text in LiveKit.

How to build a LiveKit app with real-time Speech-to-Text

LiveKit allows you to build real-time audio and video applications. Learn how to add real-time Speech-to-Text to your LiveKit application in this tutorial.

Universal in Action: Transforming Conversational Data Across Industries

Universal-2 is solving problems in Conversational Intelligence by optimizing Speech-to-Text for real-world use cases.

How to transcribe Zoom participant recordings (multichannel)

Zoom allows you to record each participant's audio track separately. Learn how to combine this with AssemblyAI's multichannel transcription for accurate meeting transcripts.

How we built our AI Lakehouse

Learn how we built our AI data Lakehouse to allow for rapid research iteration while maintaining cohesive, secure, and deduplicated datasets.

How to use Google's Speech-to-Text API to transcribe audio in Python

Learn how to set up a Google Cloud project to transcribe both local and remote audio files using Google's Speech-to-Text API and Python.

How to build a free Whisper API with GPU backend

Learn how to make a free, GPU-powered Whisper API for transcribing audio files.

How to perform Speaker Diarization in Python

Learn how to use Python to perform speaker diarization on audio and video files to identify "who said what when."

Speaker diarization vs speaker recognition - what's the difference?

Learn the differences between speaker diarization and speaker recognition, as well as between speaker verification and speaker identification, in audio analysis.

Florence-2: How it works and how to use it

Microsoft's Florence-2 is a foundational image model that can perform almost every common task in computer vision. Learn how Florence-2 works and how to use it in this guide.

Speaker diarization improvements: new languages, increased accuracy

Announcing several improvements to our Speaker Diarization service, yielding a more accurate model that's available in more languages.

Content moderation on audio files with Python

Modern AI models make it easy to automatically detect the presence of sensitive topics in speech data. Learn how to perform configurable content moderation with Python in this tutorial.

Filter profanity from audio files using Python

Learn how to filter profanity out of audio and video files with fewer than 10 lines of code in this tutorial.

Automatically redact PII from audio and video with Python

In this tutorial, we’ll learn how to automatically redact Personally Identifiable Information (PII) from audio and video files in 5 minutes using Python and AssemblyAI.

Transcribe a phone call in real-time using Python with AssemblyAI and Twilio

Learn how to transcribe a phone call in real-time using Python, AssemblyAI, ngrok, and Twilio.

Lower latency, lower cost, more possibilities

We’re excited to introduce major improvements to our API’s inference latency, with the majority of audio files now completing in well under 45 seconds regardless of audio duration.

Extract phone call insights with LLMs in Python

Learn how to automatically extract insights from customer calls with Large Language Models (LLMs) and Python.

Automatically determine video sections with AI using Python

In this tutorial, we will learn how to automatically determine video sections, how to generate section titles with LLMs, and how to format the information for YouTube chapters.

Real-time transcription in Python

Learn how to perform real-time transcription on audio streams using Python in this tutorial.

How DALL-E 2 Actually Works

How does OpenAI's groundbreaking DALL-E 2 model actually work? Check out this detailed guide to learn the ins and outs of DALL-E 2.

Retrieval Augmented Generation on audio data with LangChain and Chroma

Retrieval Augmented Generation (RAG) allows you to add relevant documents as context when querying LLMs. Learn how to perform RAG on audio data using LangChain and Chroma in this tutorial.

How to get Zoom Transcripts with the Zoom API

In this tutorial, we'll learn how to get Zoom transcripts with the Zoom API and Python.

Convert Speech to Text in Python in 5 Minutes

Learn how to perform Automatic Speech Recognition in 5 minutes using Python and the AssemblyAI Speech-to-Text API with this simple tutorial.