Target Conf.
- Interspeech 2024 (deadline: March 2, 2024)
- ACM MM 2024 (deadline: April 12, 2024)
TODOs
- [x] Reproducing Grad-TTS using LJSpeech
- [x] Check official implementation of Grad-TTS
- [x] Download LJSpeech dataset
- [x] Download pretrained models of Grad-TTS and HiFi-GAN
- [x] Train and check result
- [ ] Reproducing Grad-TTS using LibriTTS (Multispeaker)
- [ ] Download LibriTTS dataset
- [ ] Train Grad-TTS using BEAT dataset
- [x] Data comparison: LJSpeech and BEAT dataset
- [x] BEAT dataset preprocessing (split sentences)
- [x] Handle different sampling rates (Grad-TTS 22.05 kHz, BEAT 16 kHz)
- [x] Check output sound quality
- [ ] Finetuning multispeaker setting
- [ ] Implement face module
- [ ] Use pretrained TTS model
- Since the BEAT dataset is multispeaker with only a short recording time per speaker (1–4 hours), making good use of a pretrained model seems important
- [ ] Train together
- [ ] Train from scratch using BEAT only
- [ ] Train TTS using other datasets and fine-tune together using BEAT
- [ ] Research on the diffusion methods
- [ ] DDPM vs. Score-based Diffusion Model vs. Consistency Model
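The sampling-rate mismatch noted above (Grad-TTS/LJSpeech at 22.05 kHz vs. BEAT at 16 kHz) can be bridged by resampling the BEAT audio before feature extraction. A minimal sketch using scipy's polyphase resampler (scipy is an assumption here; the actual preprocessing code used is not shown in these notes):

```python
import numpy as np
from scipy.signal import resample_poly

def resample_16k_to_22k(wav: np.ndarray) -> np.ndarray:
    """Resample a 16 kHz waveform to 22.05 kHz.

    22050 / 16000 reduces to 441 / 320, so polyphase
    resampling with up=441, down=320 matches the target
    rate exactly (no drift from a fractional ratio).
    """
    return resample_poly(wav, up=441, down=320)

# Example: 1 second of 16 kHz audio -> 22050 samples
wav_16k = np.random.randn(16000).astype(np.float32)
wav_22k = resample_16k_to_22k(wav_16k)
```

An alternative is `librosa.resample(wav, orig_sr=16000, target_sr=22050)`; either way, resampling should happen before mel-spectrogram extraction so the mel parameters match the Grad-TTS/HiFi-GAN configuration.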
Update Notes
- (1. 26.) Data Comparison
- (2. 1.) Grad-TTS reproducing
- (2. 6.) BEAT dataset preprocessing (split sentences)
- (2. 7.) Fine-tuning Grad-TTS using BEAT dataset (only single speaker, 9_miranda)
- (working on) Implement face module
- !!! How does the face module handle the mismatch between the ground-truth durations and the duration predictor's output? Did Grad-TTS use teacher forcing?
- (working on) Research on the diffusion models
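One anchor for the DDPM vs. score-based comparison: in DDPM notation, the noise-prediction network and the score function differ only by a rescaling, which is why DDPM-style models and score-based models (like the one Grad-TTS builds on) are largely two views of the same process. A sketch of the standard identity (from the score-SDE literature, not from these notes):

```latex
% With the forward corruption x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\, \epsilon,
% the score of the marginal p_t satisfies
\nabla_{x_t} \log p_t(x_t \mid x_0) = -\frac{\epsilon}{\sqrt{1-\bar\alpha_t}},
% so a trained noise predictor \epsilon_\theta gives a score estimate
s_\theta(x_t, t) = -\frac{\epsilon_\theta(x_t, t)}{\sqrt{1-\bar\alpha_t}}.
```

Consistency models sit on top of either view: they distill the multi-step sampler into a one- or few-step map, so the comparison is mainly about sampling cost rather than the training objective.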
Ref