백병인: In-Context Learning and Constitutional AI

In-Context Learning

(Reference)

How does in-context learning work? A framework for understanding the differences from traditional supervised learning

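One way to set the context of a dialogue at the input: instead of instruction learning (in-weight), the concept the model should infer is conveyed through a repeated pattern in the demonstrations.
- According to the MetaICL paper, well-designed in-context learning performs better than the instruction approach.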
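A minimal sketch of what "demonstrations instead of instructions" looks like at the input: labeled examples are concatenated in one fixed pattern, and the query reuses that pattern with the label slot left empty. The `Input:`/`Label:` template and the name `build_icl_prompt` are hypothetical choices for illustration, and no model is called.

```python
from typing import Sequence, Tuple


def build_icl_prompt(demos: Sequence[Tuple[str, str]], query: str,
                     template: str = "Input: {x}\nLabel: {y}\n") -> str:
    """Concatenate (input, label) demonstrations and a final unlabeled query.

    No weights are updated: the model has to infer the task ("concept")
    from the repeated Input/Label pattern alone.
    """
    parts = [template.format(x=x, y=y) for x, y in demos]
    # The query follows the same pattern, but its label slot is left empty.
    parts.append(template.format(x=query, y="").rstrip())
    return "\n".join(parts)


prompt = build_icl_prompt([("great movie", "positive"), ("boring plot", "negative")],
                          "loved it")
```

The prompt ends with `Label:`, so a language model continuing the text is nudged to fill in a label consistent with the pattern it has just seen.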
In-Context Learning vs. In-Weight Learning

Because the concept has already been learned during pretraining, in-context learning has the advantage of not requiring additional fine-tuning to make use of that concept.
Concept Learning?

We can think of a concept as a latent variable that contains various document-level statistics. → The claim is that abstraction of concepts has become possible.
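One way to make "concept as a latent variable" concrete is the formulation of Xie et al. (An Explanation of In-context Learning as Implicit Bayesian Inference), which the reference article above draws on: a pretraining document is modeled as generated by first drawing a concept $\theta$ and then drawing tokens conditioned on it. This is that paper's modeling assumption, not a description of how any particular LLM is built.

$$
p(o_1, \dots, o_T) = \int_{\theta \in \Theta} p(o_1, \dots, o_T \mid \theta)\, p(\theta)\, d\theta
$$

Here $o_1, \dots, o_T$ are the tokens of a document and $\theta \in \Theta$ is the latent concept.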

Bayesian inference view of in-context learning

The claim: using the diverse concepts learned during pretraining, an LLM can make use of a concept as if it were performing Bayesian inference.
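The Bayesian view can be written as p(output | prompt) = ∫ p(output | concept, prompt) p(concept | prompt) d(concept). The toy below replaces the integral with a sum over three hand-written "concepts" (labeling rules for integers), computes the posterior over concepts from the demonstrations, and then marginalizes to predict the query's label. The concepts, the labels "A"/"B", and the noise level `eps` are all hypothetical; this is a sketch of the inference pattern, not of what happens inside an actual LLM. As a simplification, the posterior is conditioned on the demonstrations only.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy concepts: each one labels an integer as "A" or "B".
CONCEPTS: Dict[str, Callable[[int], str]] = {
    "parity":    lambda x: "A" if x % 2 == 0 else "B",
    "sign":      lambda x: "A" if x > 0 else "B",
    "magnitude": lambda x: "A" if abs(x) >= 10 else "B",
}
LABELS = ("A", "B")


def likelihood(concept: str, x: int, y: str, eps: float) -> float:
    # p(y | x, concept): the concept's rule is followed with probability 1 - eps.
    return 1.0 - eps if CONCEPTS[concept](x) == y else eps


def posterior(demos: List[Tuple[int, str]],
              prior: Optional[Dict[str, float]] = None,
              eps: float = 0.1) -> Dict[str, float]:
    """p(concept | demos) ∝ p(concept) * prod_i p(y_i | x_i, concept)."""
    if prior is None:
        prior = {c: 1.0 / len(CONCEPTS) for c in CONCEPTS}  # uniform prior
    log_post = {c: math.log(prior[c])
                   + sum(math.log(likelihood(c, x, y, eps)) for x, y in demos)
                for c in CONCEPTS}
    m = max(log_post.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(v - m) for c, v in log_post.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}


def predict(demos: List[Tuple[int, str]], query: int,
            eps: float = 0.1) -> Dict[str, float]:
    """p(y | prompt) = sum_c p(y | query, c) * p(c | demos)."""
    post = posterior(demos, eps=eps)
    return {y: sum(w * likelihood(c, query, y, eps) for c, w in post.items())
            for y in LABELS}


demos = [(2, "A"), (4, "A"), (7, "B"), (-6, "A")]
post = posterior(demos)   # "parity" is the only concept consistent with every demo
probs = predict(demos, 3)
```

With these demonstrations, only "parity" agrees with every example, so most of the posterior mass moves onto it and the prediction for `3` follows the parity rule, even though no parameter was trained.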

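Perhaps this calls into question the need for slots that proponents of the "semi-finished product" approach to TODS (task-oriented dialogue systems) had in mind?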

Could this be the essence of the remarkable language ability that ChatGPT shows?

An interesting experiment (Min et al., Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?)

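Even when the labels in the demonstrations are assigned at random, inference performance does not drop much as long as the concept is right. What matters is whether the model can learn the rule from the repeated pattern.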
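A minimal sketch of how the two conditions in this kind of experiment can be set up: gold demonstrations versus demonstrations whose labels are redrawn uniformly from the label set, keeping the inputs, the format, and the label space fixed. The helper names `make_demos` and `to_prompt`, the `x -> y` format, and the sentiment examples are hypothetical, and no model is queried here.

```python
import random
from typing import List, Optional, Sequence, Tuple


def make_demos(examples: Sequence[Tuple[str, str]],
               label_set: Sequence[str],
               random_labels: bool = False,
               seed: Optional[int] = None) -> List[Tuple[str, str]]:
    """Return (input, label) demonstrations, optionally with random labels.

    With random_labels=True each gold label is replaced by one drawn
    uniformly from label_set; inputs, their order, and the label space are
    unchanged, so only the input-label correspondence is broken.
    """
    if not random_labels:
        return list(examples)
    rng = random.Random(seed)
    labels = list(label_set)
    return [(x, rng.choice(labels)) for x, _ in examples]


def to_prompt(demos: Sequence[Tuple[str, str]], query: str) -> str:
    # Same "x -> y" format for every demonstration; the query's label is left blank.
    lines = [f"{x} -> {y}" for x, y in demos]
    lines.append(f"{query} ->")
    return "\n".join(lines)


examples = [("a gripping, well-acted film", "positive"),
            ("dull and far too long", "negative"),
            ("I would watch it again", "positive")]
label_set = ["positive", "negative"]
gold_prompt = to_prompt(make_demos(examples, label_set), "a waste of time")
random_prompt = to_prompt(make_demos(examples, label_set, random_labels=True, seed=0),
                          "a waste of time")
```

Comparing a model's accuracy on `gold_prompt`-style versus `random_prompt`-style inputs isolates how much the correct input-label mapping matters, since every other property of the prompt is held constant.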

Data Distributional Properties Drive Emergent In-Context Learning in Transformers

https://arxiv.org/pdf/2205.05055.pdf

DeepMind's study of in-context learning (worth reading for a deeper understanding of in-context learning)

what aspects of the training regime lead to this emergent behavior? Here, we show that this behavior is driven by the distributions of the training data itself.

we found that in-context learning traded off against more conventional weight-based learning, and models were unable to achieve both simultaneously.

we found that naturalistic data distributions were only able to elicit in-context learning in transformers, and not in recurrent models.
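The paper attributes in-context learning to distributional properties such as burstiness (items appearing in clusters rather than uniformly over time) and a large number of rarely occurring classes, with a skewed Zipfian class distribution letting in-context and in-weight learning coexist. The sketch below generates class-id sequences with those two properties; the half-context burst, `context_len=8`, and `alpha=1.0` are illustrative assumptions, not the paper's exact training setup.

```python
import random
from typing import List, Optional, Tuple


def zipf_weights(num_classes: int, alpha: float = 1.0) -> List[float]:
    # Class of rank k (1-indexed) gets weight proportional to 1 / k**alpha,
    # so a few classes are common and most classes are rare.
    w = [1.0 / (k ** alpha) for k in range(1, num_classes + 1)]
    total = sum(w)
    return [x / total for x in w]


def sample_sequence(num_classes: int,
                    context_len: int = 8,
                    bursty: bool = True,
                    alpha: float = 1.0,
                    rng: Optional[random.Random] = None) -> Tuple[List[int], int]:
    """Return (context class ids, query class id) for one training sequence."""
    rng = rng or random.Random()
    classes = list(range(num_classes))
    weights = zipf_weights(num_classes, alpha)
    query_cls = rng.choices(classes, weights=weights, k=1)[0]
    if bursty:
        # Burstiness: the query's class is repeated inside the context, so the
        # answer can be read off the context instead of recalled from weights.
        half = context_len // 2
        context = [query_cls] * half + rng.choices(classes, weights=weights,
                                                   k=context_len - half)
        rng.shuffle(context)
    else:
        # i.i.d. draws: the query class appears in the context only by chance.
        context = rng.choices(classes, weights=weights, k=context_len)
    return context, query_cls


bursty_ctx, bursty_q = sample_sequence(1000, rng=random.Random(0))
iid_ctx, iid_q = sample_sequence(1000, bursty=False, rng=random.Random(1))
```

Training on `bursty=True` sequences rewards looking back at the context, while the Zipfian weights keep a few frequent classes that can still be memorized in the weights.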

In-context Learning and Induction Heads