Generative AI is the bedrock on which a new wave of companies, many attracting tens of millions of dollars in early-stage investment, is being built. This unprecedented development has many wondering: what actually is Generative AI, and how does it work?
This series, Everything You Need to Know About Generative AI, will cover the important ideas required to understand this new wave of Generative AI, from foundational concepts to modern advancements. The articles are mostly independent and suitable for all audiences, so feel free to pick and choose the topics that interest you.
Below you can find each article in the series along with a brief description. We’ll be releasing them over the next few weeks, so be sure to sign up for our newsletter to stay up to date.
Introduction to Generative AI
The first article in our series will provide an introduction to Generative AI as a whole. This article can be read in isolation or as a foundation to explore more recent developments later in the series.
Generative AI - Images
Much progress has been made recently in the image domain of Generative AI. Models like Stable Diffusion allow users to generate novel images in seconds from just a text description, building on ideas that are only a few years old.
In the second article in our series, we will explore the recent advancements that have led to this watershed progress, and look at how to try out these models yourself.
Generative AI - Language
Generative AI models in the language domain have been the center of attention for several months. Models like ChatGPT allow users to interact dynamically, holding a conversation with a model that not only seems human, but seems like a well-educated human able to provide information on nearly any topic.
In the third article in our series, we’ll explore how these models work and what has led to their meteoric rise.
Generative AI - Audio
Generative AI in the audio domain has seen some exciting recent developments, especially in Text-to-Speech (TTS) and Text-to-Music (TTM) models. Models like VALL-E and Voicebox can now generate voices from only a few seconds of input audio, while Google’s MusicLM can be prompted to generate a short music clip that follows the contour of a whistled melody. For example, the audio sample below was made with MusicLM using the prompt "A rhythmic east coast boom-bap hip hop beat, uplifting and inspirational".
The final article in this series takes a technical look at some of the recent models in this space, including MusicLM and VALL-E, and explains the key ideas behind Neural Audio Codecs and Residual Vector Quantization, techniques which have become essential in nearly all text-to-audio models.