2. Improving Language Understanding by Generative Pre-Training
3. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
4. RoBERTa: A Robustly Optimized BERT Pretraining Approach
5. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension