1st Review
LLM
VLM
- [x] 2403.05468 Will GPT-4 Run DOOM? (arxiv.org), [[Will GPT-4 Run DOOM]]
- [ ] [[SpatialVLM- Endowing Vision-Language Models with Spatial Reasoning Capabilities]] ⭐
- [ ] [[An Image is Worth Half Tokens After Layer 2- Plug-and-Play Inference Acceleration for Large Vision-Language Models]]
- [ ] PaLI-3 Vision Language Models: Smaller, Faster, Stronger, Oct 2023
- [ ] [[LLaMA-Adapter V2 Parameter-Efficient Visual Instruction Model]]
- [ ] [[PaLM2-VAdapter Progressively Aligned Language Model Makes a Strong Vision-language Adapter]]
- [ ] Matcha: "Chat with the Environment: Interactive Multimodal Perception using Large Language Models", IROS, 2023
- [ ] [[LLaVA-Med Large Language and Vision Assistant for BioMedicine]]
- [ ] Luodian/Otter: 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability. (github.com), May 2023
- Performance seems better.
- Resolution has no effect.
- [[Otter, A Multi-Modal Model with In-Context Instruction Tuning]]
- [ ] 2204.14198 Flamingo: a Visual Language Model for Few-Shot Learning (arxiv.org), Nov 2022
VLA