Deep Learning

Deep dives into AI, research, coding, and other topics.

Review - Text-Free Prosody-Aware Generative Spoken Language Modeling

In this week's Deep Learning Paper Review, we look at the following paper: Text-Free Prosody-Aware Generative Spoken Language Modeling.

Speaker Diarization - Speaker Labels for Mono Channel Files

Speaker diarization (also spelled diarisation), as offered by the AssemblyAI Speech-to-Text API, is the process of automatically splitting an audio or video input into segments according to the identity of the speaker. It helps you answer the question "who spoke when?".
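
A diarization result can be pictured as a list of time-stamped segments, each tagged with a speaker label. The minimal sketch below assumes such a list already exists. The `Segment` type, the `who_spoke_when` helper and the sample values are hypothetical illustrations, not the AssemblyAI response format. It merges consecutive segments from the same speaker into turns, which is one simple way to present "who spoke when".

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    speaker: str  # label assigned by a (hypothetical) diarization step


def who_spoke_when(segments):
    """Merge consecutive segments from the same speaker into speaker turns."""
    turns = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            # Same speaker keeps talking: extend the current turn.
            last = turns[-1]
            turns[-1] = Segment(last.start, max(last.end, seg.end), seg.speaker)
        else:
            # A different speaker starts a new turn.
            turns.append(Segment(seg.start, seg.end, seg.speaker))
    return turns


if __name__ == "__main__":
    # Made-up example input, for illustration only.
    segments = [
        Segment(0.0, 2.5, "A"),
        Segment(2.5, 4.0, "A"),
        Segment(4.2, 7.0, "B"),
        Segment(7.1, 9.0, "A"),
    ]
    for turn in who_spoke_when(segments):
        print(f"{turn.start:5.1f}s - {turn.end:5.1f}s  Speaker {turn.speaker}")
```

Merging by speaker rather than by time gap keeps the sketch simple. A real pipeline might also split a turn when the pause between two segments exceeds some threshold.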

Fine-Tuning Transformers for NLP

Since the "Attention Is All You Need" paper, Transformers have completely redefined the field of Natural Language Processing. In this post, we show you how to quickly fine-tune Transformers for numerous downstream tasks, on which they often perform very well out of the box!

Comparing End-To-End Speech Recognition Architectures in 2021

As part of our core research and development efforts to keep pushing the state of the art in speech recognition accuracy, this post explores the end-to-end speech recognition architectures that are gaining popularity in both academia and industry.

Building an End-to-End Speech Recognition Model in PyTorch

A complete guide to building an end-to-end speech recognition model in PyTorch. Train your own CTC-based Deep Speech model with this tutorial.
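
A CTC model outputs, for every audio frame, a probability distribution over the output characters plus a special "blank" symbol. The minimal sketch below shows greedy (best-path) CTC decoding: take the most likely symbol per frame, collapse consecutive repeats, then drop blanks. It is an illustrative assumption, not necessarily the decoder used in the tutorial. The function name, the `labels` list and the one-hot example frames are hypothetical, and the blank is assumed to sit at index 0.

```python
def ctc_greedy_decode(frame_probs, labels, blank=0):
    """Best-path CTC decoding.

    frame_probs: list of per-frame probability lists, one entry per label.
    labels: symbol for each index (labels[blank] is the blank symbol).
    """
    # Most likely label index for each frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded = []
    prev = None
    for idx in best_path:
        # Keep a symbol only if it differs from the previous frame's symbol
        # (collapse repeats) and is not the blank.
        if idx != prev and idx != blank:
            decoded.append(labels[idx])
        prev = idx
    return "".join(decoded)


if __name__ == "__main__":
    labels = ["_", "c", "a", "t"]  # index 0 is the blank symbol

    def one_hot(i):
        return [1.0 if j == i else 0.0 for j in range(len(labels))]

    # Frame-wise best path "c c _ a a t t _" collapses to "cat".
    frames = [one_hot(i) for i in [1, 1, 0, 2, 2, 3, 3, 0]]
    print(ctc_greedy_decode(frames, labels))
```

The blank is what lets CTC emit a genuine double letter: the path "a _ a" decodes to "aa", while "a a" collapses to a single "a". Beam search decoders, optionally combined with a language model, usually improve on this greedy approach.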