Research on Diffusion models

Description

Research on the differences between various diffusion models
- DDPM
- Score-based model
- Consistency model

One Reddit discussion on this:

<aside> 💡 There's an implementation of score-based models, from the paper that showed score-based models and diffusion models are the same, here: https://github.com/yang-song/score_sde_pytorch

IMO their implementation is more or less the same as a diffusion model, except that score-based models would use a numerical ODE/SDE solver to generate samples instead of the DDPM-based sampling method. It might also train on continuous time, so rather than choosing t ~ randint(0, 1000) it would be t ~ rand_uniform(0, 1.)

The speed and quality of score-based/diffusion models depend on which sampler you use. If you're using Euler's method to solve the ODE, for example, that might be slower than some of the newer methods developed for diffusion models, like Tero Karras' ODE solvers. AFAIK there isn't consensus on what the best sampler is, though.

I don't think it affects training convergence much, though, since it's more or less the same objective.

</aside>
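
To make the comment above concrete, here is a minimal NumPy sketch, not the `score_sde_pytorch` code. It contrasts discrete and continuous timestep sampling, then generates samples by integrating the probability flow ODE with plain Euler steps. Everything here is an assumption for illustration: the toy 1D data distribution N(MU, S^2), the linear beta schedule of a VP-type SDE, the function names, and the closed-form `score` that stands in for a trained network.

```python
import numpy as np

# Assumed toy setup: 1D data ~ N(MU, S^2) and a linear beta schedule (VP-type SDE).
BETA_MIN, BETA_MAX = 0.1, 20.0
MU, S = 2.0, 0.5


def sample_discrete_t(rng, batch, num_steps=1000):
    """DDPM-style training timesteps: t ~ randint(0, num_steps)."""
    return rng.integers(0, num_steps, size=batch)


def sample_continuous_t(rng, batch, eps=1e-5):
    """Continuous-time training timesteps: t ~ Uniform(eps, 1)."""
    return rng.uniform(eps, 1.0, size=batch)


def beta(t):
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)


def marginal(t):
    """Mean and variance of p_t for the toy Gaussian data under the VP SDE."""
    big_b = BETA_MIN * t + 0.5 * t**2 * (BETA_MAX - BETA_MIN)  # integral of beta
    a = np.exp(-0.5 * big_b)
    return MU * a, (S * a) ** 2 + 1.0 - a**2


def score(x, t):
    """Analytic score of p_t; a trained score network would go here instead."""
    mean, var = marginal(t)
    return -(x - mean) / var


def euler_ode_sample(rng, n, n_steps=500, t_min=1e-3):
    """Integrate the probability flow ODE from t=1 down to t_min with Euler steps."""
    x = rng.standard_normal(n)  # approximate prior p_1 ~ N(0, 1)
    ts = np.linspace(1.0, t_min, n_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        dt = t_next - t_cur  # negative: we move backwards in time
        drift = -0.5 * beta(t_cur) * x - 0.5 * beta(t_cur) * score(x, t_cur)
        x = x + drift * dt
    return x


rng = np.random.default_rng(0)
samples = euler_ode_sample(rng, 20000)
print(samples.mean(), samples.std())  # should land near MU and S
```

Swapping `euler_ode_sample` for a higher-order solver changes only the sampling loop; the score function and the training objective stay the same, which is the point the comment makes.
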
Diffusion history
- Diffusion Probabilistic Model (DPM) and score-based model (SBM)
  - Developed independently of each other
  - DPM: trained with the ELBO, sampled with the decoder
  - Score-based model: trained with score matching, sampled with Langevin dynamics
    - NCSN (2019)
      - The model proposed in the score-based generative models paper uses a Noise Conditional Score Network (NCSN) for score matching and annealed Langevin dynamics for sample generation
- Jonathan Ho et al. DDPM (2020)
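  - Studied the connection between DPMs and score-based models
  - Showed that the DPM ELBO is equivalent to a weighted combination of score matching objectives over multiple noise levels
  - Achieved good results by using the U-Net from score-based models as the decoder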
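- Yang Song et al. Score-Based Generative Modeling through Stochastic Differential Equations (2021)
  - Unified DPMs and SBMs into a single framework from the SDE perspective
    - Showed that both are discretizations of SDEs whose reverse process is determined by the score function
    - In other words, extending NCSN and DDPM to continuous time lets both be described within the SDE framework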
```
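Looking at the forward SDEs of NCSN and DDPM (the result of applying continuous timesteps), NCSN can be called a Variance Exploding SDE (VE-SDE), because its variance keeps growing as the timestep increases, and DDPM a Variance Preserving SDE (VP-SDE), because its variance is maintained.
Viewed from the SDE perspective, sampling from NCSN and DDPM amounts to solving the reverse SDE, so traditional methods such as Euler-Maruyama and Runge-Kutta become applicable.
The paper proposes a Predictor-Corrector sampler. Predictor: generates data by solving the reverse SDE. Corrector: applies annealed Langevin dynamics to the data produced by the Predictor.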
```
      Source:
      
      https://dlaiml.tistory.com/entry/Score-based-Generative-Models과-Diffusion-Probabilistic-Models과의-관계
      
      [Deeper Learning: Tistory]
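
The VE/VP distinction in the quote above can be checked numerically. The sketch below uses the closed-form perturbation kernels p(x_t | x_0) of the two SDEs as given in the Song et al. (2021) paper. The schedule constants (`beta_min`, `beta_max`, `sigma_min`, `sigma_max`), the function names, and the unit-variance data are illustrative assumptions, not values taken from these notes.

```python
import numpy as np


def vp_kernel(t, beta_min=0.1, beta_max=20.0):
    """VP SDE (continuous-time DDPM): x_t = mean_scale * x_0 + std * eps."""
    big_b = beta_min * t + 0.5 * t**2 * (beta_max - beta_min)  # integral of beta
    mean_scale = np.exp(-0.5 * big_b)
    std = np.sqrt(1.0 - np.exp(-big_b))
    return mean_scale, std


def ve_kernel(t, sigma_min=0.01, sigma_max=50.0):
    """VE SDE (continuous-time NCSN): x_t = x_0 + std * eps."""
    sigma_t = sigma_min * (sigma_max / sigma_min) ** t
    std = np.sqrt(sigma_t**2 - sigma_min**2)
    return 1.0, std


def marginal_variance(kernel, t, data_var=1.0):
    """Variance of x_t when the data x_0 has variance data_var."""
    mean_scale, std = kernel(t)
    return mean_scale**2 * data_var + std**2


for t in np.linspace(0.0, 1.0, 5):
    vp_var = marginal_variance(vp_kernel, t)
    ve_var = marginal_variance(ve_kernel, t)
    print(f"t={t:.2f}  VP variance={vp_var:.3f}  VE variance={ve_var:.3f}")
```

For unit-variance data the VP variance stays at 1 for every t, while the VE variance grows toward roughly `sigma_max**2`, which is where the two names come from.
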
  - Proposed a unified sampler, the Predictor-Corrector sampler
    - Combines the DPM sampling method with annealed Langevin dynamics
    - Need to study the "how to solve the reverse SDE" section of the link below in more depth
      - https://yang-song.net/blog/2021/score/
- More recently, DPMs and SBMs have been shown to belong to the same model family, and most new papers propose better samplers
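
As a rough illustration of the Predictor-Corrector idea from the notes above, here is a toy NumPy sketch on a VE SDE. The predictor takes one Euler-Maruyama step of the reverse SDE, and the corrector takes one Langevin step at the new noise level. Assumptions to keep in mind: the 1D Gaussian data N(MU, S^2), the closed-form `score` standing in for a trained network, all names and constants, and the fixed `corrector_scale` step-size rule, which is a simplification of the signal-to-noise-ratio rule used in the paper.

```python
import numpy as np

# Assumed toy setup: 1D data ~ N(MU, S^2) under a VE SDE with a geometric sigma schedule.
SIGMA_MIN, SIGMA_MAX = 0.01, 10.0
MU, S = 2.0, 0.5


def sigma(t):
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def diffusion(t):
    """g(t) of the VE SDE, chosen so that g(t)^2 = d sigma^2(t) / dt."""
    return sigma(t) * np.sqrt(2.0 * np.log(SIGMA_MAX / SIGMA_MIN))


def marginal_var(t):
    """Variance of p_t for the toy Gaussian data."""
    return S**2 + sigma(t) ** 2 - SIGMA_MIN**2


def score(x, t):
    """Analytic score of p_t; a trained score network would go here instead."""
    return -(x - MU) / marginal_var(t)


def pc_sample(rng, n, n_steps=500, t_min=1e-3, corrector_scale=0.05):
    x = SIGMA_MAX * rng.standard_normal(n)  # approximate prior N(0, sigma_max^2)
    ts = np.linspace(1.0, t_min, n_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        step = t_cur - t_next  # positive step size, moving backwards in time
        g = diffusion(t_cur)
        # Predictor: one Euler-Maruyama step of the reverse SDE.
        noise = rng.standard_normal(n)
        x = x + g**2 * score(x, t_cur) * step + g * np.sqrt(step) * noise
        # Corrector: one Langevin step targeting p_{t_next}.
        # Simplified step size (assumption), proportional to the marginal variance.
        eps = corrector_scale * marginal_var(t_next)
        noise = rng.standard_normal(n)
        x = x + eps * score(x, t_next) + np.sqrt(2.0 * eps) * noise
    return x


rng = np.random.default_rng(0)
samples = pc_sample(rng, 20000)
print(samples.mean(), samples.std())  # should land near MU and S
```

Dropping the corrector lines leaves a plain reverse-SDE solver, and dropping the predictor lines leaves annealed Langevin dynamics, so the two earlier sampling styles show up as special cases of the same loop.
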