
Diffusion Models and Score Matching
- The analysis in [3] reveals some interesting connections to score-based models. Indeed, it shows that diffusion models and score matching are two sides of the same coin, much like the Heisenberg matrix and Schrödinger wave formulations of quantum mechanics. For details, see [3]; here we proceed with the score matching view because it is easier to work with.
- A new connection between diffusion models and denoising score matching leads to a simplified, weighted variational bound objective for diffusion models [3].
Score Based Models
- [4] Rather than modeling the pdf directly, represent the score: a vector field pointing in the direction where the likelihood of the data increases most rapidly.
- [4] First, the function is learned via denoising score matching. Intuitively, this means a neural network (the score network) is trained to denoise images blurred with Gaussian noise. The key insight from [2] was to use multiple noise scales to capture both coarse and fine features.
- [4] Then samples are generated with Langevin dynamics: start with white noise and iteratively denoise it using the score network. This can fail or take a long time to converge in high dimensions.
- [4] The same authors as [2] propose a set of techniques to scale score-based models to high-resolution images. They use a simplified mixture model to analytically compute an effective set of noise scales, and propose an efficient architecture that amortizes score estimation across a large (possibly infinite) number of noise scales with a single neural network.
Score Function
- [4] or [2]: for a continuously differentiable pdf \( p(\textbf{x}) \), \( \nabla_{\textbf{x}} \log p(\textbf{x}) \) is its score function.
- Once known, the score function can be used to draw samples, e.g., via Langevin dynamics (more on this below).
- For reference, the forward diffusion process from [3], which we return to later: \( q(\textbf{x}_{1:T}|\textbf{x}_0) = \prod_{t=1}^{T} q(\textbf{x}_t|\textbf{x}_{t-1}) = \prod_{t=1}^{T} \mathcal{N}(\textbf{x}_t; \sqrt{1-\beta_t}\textbf{x}_{t-1}, \beta_t\textbf{I}) \); a code sketch follows below.
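As a concrete illustration, here is a minimal sketch (PyTorch; the function name and beta schedule are illustrative assumptions, not taken from [3]) of sampling the forward chain one Gaussian step at a time:

```python
import torch

def forward_diffusion(x0, betas):
    """Sample the forward chain x_1, ..., x_T from x_0, one step at a time:
    x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise."""
    xs = [x0]
    for beta in betas:
        noise = torch.randn_like(xs[-1])
        xs.append((1.0 - beta) ** 0.5 * xs[-1] + beta ** 0.5 * noise)
    return xs  # xs[t] is a sample from q(x_t | x_0)

# e.g., a linear schedule over T = 1000 steps (an illustrative choice)
betas = torch.linspace(1e-4, 0.02, 1000)
```

Each step is exactly one draw from \( \mathcal{N}(\textbf{x}_t; \sqrt{1-\beta_t}\textbf{x}_{t-1}, \beta_t\textbf{I}) \) above.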
The Goal
- The goal is quite simple: given dataset vectors x_1, ..., x_n drawn i.i.d. from p(x), we want to work "backwards" to determine the distribution p(x) from which they were generated. Once we have it, we can synthesize new datapoints by sampling [2].
- One approach is likelihood-based models, which seek to directly learn the pdf via (approximate) maximum likelihood [2]. But there is a tradeoff between tractability and flexibility [1]. We can use variational methods to fit a predefined model family; these are easy to sample from and fit, but assuming the form of the distribution restricts performance and likely cannot capture the nuances of complex datasets.
- We can instead use a more flexible model: a distribution can be any nonnegative integrable function. The problem is that the function has to integrate to 1 to be a valid pdf, and computing this normalization constant is infeasible in most cases [1]. Also, training or drawing samples requires expensive Monte Carlo methods [1].
- We could scrap maximum likelihood methods entirely and use implicit generative models like GANs [2], but the problems of training GANs are well documented.
The Solution
- The solution comes in the form of diffusion models
- They borrow an idea from non-equilibrium statistical physics: use a Markov chain to gradually convert one distribution into another. Map from a well-known distribution, the Normal, and transform it into the data distribution.
- In practice this looks like destroying an image with noise and then learning to recover it, but it is really a mapping between distributions.
- This allows for a highly flexible model. But what about tractability? Following [2], instead of learning the pdf directly, we learn the score function: the gradient of the log pdf.
- So the important points: model with a diffusion process, and model the score function.
Why Score Function?
- Let's say we model the pdf in an exponential, energy-based form [2], \( p_\theta(\textbf{x}) = e^{-f_\theta(\textbf{x})} / Z_\theta \) with \( Z_\theta = \int e^{-f_\theta(\textbf{x})} \, d\textbf{x} \), and train by maximizing the log likelihood. To do this we require a normalized probability density function, i.e., we must compute \( Z_\theta \), which is typically intractable. To circumvent this, practitioners typically restrict the form of \( f_\theta \) to keep \( Z_\theta \) tractable, or approximate \( Z_\theta \), which is expensive.
- Instead of directly modeling the pdf, we model the score function. Such models are called score-based models.
- Why the score does not depend on \( Z_\theta \): \( \nabla_{\textbf{x}} \log p_\theta(\textbf{x}) = -\nabla_{\textbf{x}} f_\theta(\textbf{x}) - \nabla_{\textbf{x}} \log Z_\theta = -\nabla_{\textbf{x}} f_\theta(\textbf{x}) \), since \( Z_\theta \) is constant in \( \textbf{x} \). The intractable normalizer simply drops out.
- So we use the score function to avoid the intractable normalization, which opens the door to far more expressive models: we don't need to approximate \( Z_\theta \) or restrict the model structure to ensure it is normalized. The short sketch below makes this concrete.
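A toy check (PyTorch; purely illustrative) that computing a score via autograd never touches a normalizing constant: for the unnormalized Gaussian energy \( f(\textbf{x}) = \|\textbf{x}\|^2/2 \), the score \( -\nabla_{\textbf{x}} f(\textbf{x}) = -\textbf{x} \) is exactly the score of \( \mathcal{N}(\textbf{0}, \textbf{I}) \).

```python
import torch

def energy(x):
    # unnormalized log-density: log p(x) = -f(x) - log Z, with Z never computed
    return 0.5 * (x ** 2).sum()

x = torch.randn(3, requires_grad=True)
(score,) = torch.autograd.grad(-energy(x), x)
print(torch.allclose(score, -x.detach()))  # True: the score of N(0, I), no Z required
```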
But how do we train?
- That's all well and good, but how do we learn this function?
- Minimize the Fisher divergence between the model and the true score function.
What is Fisher Divergence?
- It measures the expected squared distance between two score functions, averaged over the data distribution.
- It makes sense to use it here because we are training the model to approximate the true score function.
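Explicitly, for the data density \( p(\textbf{x}) \) and score model \( \textbf{s}_\theta \), the Fisher divergence is

\[ \frac{1}{2} \mathbb{E}_{p(\textbf{x})} \left[ \left\| \nabla_{\textbf{x}} \log p(\textbf{x}) - \textbf{s}_\theta(\textbf{x}) \right\|_2^2 \right] \]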
How do we do this?
- Directly, it's not possible: we would need access to the true score function, which is exactly what we are trying to approximate. Instead, we use score matching.
- The score matching objective can be estimated on a dataset and then minimized with gradient-based methods.
- The only requirement on the score model is that it be vector-valued with the same input and output dimensionality; see the sketch below for an example.
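A minimal sketch (PyTorch; all names illustrative, a single fixed noise scale for simplicity) of a score network and a denoising score matching loss:

```python
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """Vector-valued score model: output dimensionality matches the input."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),  # same dimension out as in
        )

    def forward(self, x):
        return self.net(x)

def dsm_loss(score_net, x, sigma=0.1):
    """Denoising score matching: perturb x with Gaussian noise and regress
    the known score of the perturbed distribution, -(x_tilde - x) / sigma**2."""
    noise = torch.randn_like(x)
    x_tilde = x + sigma * noise
    target = -noise / sigma  # = -(x_tilde - x) / sigma**2
    return ((score_net(x_tilde) - target) ** 2).sum(dim=-1).mean()
```

Minimizing this with any gradient-based optimizer trains the network to approximate the score of the noise-perturbed data; [2] and [4] use many noise scales rather than one.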
What is Score Matching?
- Score matching objectives rewrite the Fisher divergence, via integration by parts, into a form that can be estimated from data samples alone, with no access to the true score [2]. Popular variants include denoising score matching, which perturbs the data with noise and regresses the known score of the perturbed distribution (as sketched above), and sliced score matching, which uses random projections to keep the objective cheap in high dimensions.
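For reference, the classical (Hyvärinen) score matching objective, equal to the Fisher divergence up to a constant, is

\[ \mathbb{E}_{p(\textbf{x})} \left[ \operatorname{tr}\left( \nabla_{\textbf{x}} \textbf{s}_\theta(\textbf{x}) \right) + \frac{1}{2} \left\| \textbf{s}_\theta(\textbf{x}) \right\|_2^2 \right] \]

The trace of the Jacobian is expensive for high-dimensional data, which is part of what motivates the denoising and sliced variants.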
How do We Use the Score Model?
- Now that we have learned a score-based model, how do we use it to sample?
- We use an MCMC method called Langevin dynamics. It looks a lot like noisy gradient ascent: take a point, add the step size times the gradient of the log likelihood (gradient ascent), and then add some noise for stochasticity. The update is written out below.
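Concretely, starting from \( \textbf{x}_0 \) drawn from an arbitrary prior, Langevin dynamics iterates

\[ \textbf{x}_{i+1} = \textbf{x}_i + \epsilon \, \textbf{s}_\theta(\textbf{x}_i) + \sqrt{2\epsilon} \, \textbf{z}_i, \qquad \textbf{z}_i \sim \mathcal{N}(\textbf{0}, \textbf{I}), \]

where \( \epsilon \) is a small step size. As \( \epsilon \to 0 \) and the number of steps grows, \( \textbf{x}_i \) converges to a sample from \( p(\textbf{x}) \) under mild regularity conditions [2].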
Practical Guide to Score Matching
- Estimated score functions are inaccurate in low-density regions, where few data points are available for computing the score matching objective (the Fisher divergence is weighted by the data density). But the initial random sample we start from is very likely to lie in a low-density region when the data reside in a high-dimensional space.
- Therefore, use multiple noise perturbations and anneal: annealed Langevin dynamics, similar in principle to simulated annealing.
- Choose the noise scales as a geometric progression. Use a U-Net for the score network. Apply an exponential moving average to the weights when the model is used at test time. A sketch of the annealed sampler follows below.
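A minimal sketch of annealed Langevin dynamics (PyTorch; names and hyperparameters are illustrative, and a noise-conditional score network score_net(x, sigma) is assumed, as in [4]):

```python
import math
import torch

def annealed_langevin(score_net, shape, sigmas, steps=100, eps=2e-5):
    """Run Langevin updates at each noise scale, largest to smallest,
    so early iterations move through low-density regions reliably."""
    x = torch.rand(shape)  # arbitrary initialization
    for sigma in sigmas.tolist():
        alpha = eps * (sigma / sigmas[-1].item()) ** 2  # step size per noise level
        for _ in range(steps):
            z = torch.randn_like(x)
            x = x + alpha * score_net(x, sigma) + math.sqrt(2 * alpha) * z
    return x

# noise scales as a geometric progression, largest first
sigmas = torch.exp(torch.linspace(math.log(10.0), math.log(0.01), 10))
```

The step size shrinks with the noise level (a heuristic from [4]) so that the signal-to-noise ratio of the updates stays roughly constant across scales.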
Where do diffusion models come into play?
- We have been describing exactly a diffusion model. Note that this doesn't reference a specific architecture, just a meta-approach. Unlike, say, a CNN, diffusion models are more akin to GANs in that they describe a general paradigm rather than a particular network.
- So, to summarize: we learn the score model by minimizing the Fisher divergence (via score matching), and then we sample from it using Langevin dynamics.