ARC Challenge (0-shot)

<aside> 💡

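Usage notes for each benchmark in the list
- Is the benchmark downloaded and used directly, or run through specific code?
- If it is downloaded, where is it downloaded from and how is it run? </aside>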

ARC Challenge benchmark overview

ARC (Challenge) benchmark (Common Sense Reasoning)

A new dataset of 7,787 genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the former contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. We are also including a corpus of over 14 million science sentences relevant to the task, and an implementation of three neural baseline models for this dataset. We pose ARC as a challenge to the community.

Paper: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

How to use ARC Challenge

Download

https://huggingface.co/datasets/allenai/ai2_arc

from datasets import load_dataset

# Load the "ARC-Challenge" configuration of the allenai/ai2_arc dataset from the Hugging Face Hub
ds = load_dataset("allenai/ai2_arc", "ARC-Challenge")

Dataset example

## ARC-Challenge
{
    "answerKey": "B",
    "choices": {
        "label": ["A", "B", "C", "D"],
        "text": ["Shady areas increased.", "Food sources increased.", "Oxygen levels increased.", "Available water increased."]
    },
    "id": "Mercury_SC_405487",
    "question": "One year, the oak trees in a park began producing more acorns than usual. The next year, the population of chipmunks in the park also increased. Which best explains why there were more chipmunks the next year?"
}
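
To make the record layout above concrete, here is a minimal sketch that renders one record as a 0-shot multiple-choice prompt and looks up the position of the correct choice. The helper names `format_prompt` and `gold_index` and the prompt wording are assumptions made for illustration. They are not taken from the dataset card or from any evaluation harness.

```python
# Hypothetical helpers for illustration; the prompt template is an assumption.

def format_prompt(record: dict) -> str:
    """Render one ARC record as a 0-shot multiple-choice prompt."""
    labels = record["choices"]["label"]
    texts = record["choices"]["text"]
    lines = [f"Question: {record['question']}"]
    lines += [f"{label}. {text}" for label, text in zip(labels, texts)]
    lines.append("Answer:")
    return "\n".join(lines)


def gold_index(record: dict) -> int:
    """Return the position of the correct choice within the choices lists."""
    # Look the label up instead of assuming "A" maps to 0, so records with
    # other label schemes or a different number of choices still work.
    return record["choices"]["label"].index(record["answerKey"])


# The record shown in the dataset example above.
example = {
    "answerKey": "B",
    "choices": {
        "label": ["A", "B", "C", "D"],
        "text": [
            "Shady areas increased.",
            "Food sources increased.",
            "Oxygen levels increased.",
            "Available water increased.",
        ],
    },
    "id": "Mercury_SC_405487",
    "question": (
        "One year, the oak trees in a park began producing more acorns than usual. "
        "The next year, the population of chipmunks in the park also increased. "
        "Which best explains why there were more chipmunks the next year?"
    ),
}

prompt = format_prompt(example)
answer_position = gold_index(example)
```

Because the gold position is found by searching the label list, the same helper can be applied to every record without assuming a fixed A to D ordering.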

Evaluation example

https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/arc/arc_challenge_chat.yaml
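
As a rough illustration of what a 0-shot accuracy score over ARC records means, the sketch below compares predicted labels against `answerKey`. It is not the harness's implementation. The `predict` callable stands in for a model, and `accuracy` and `first_choice_baseline` are hypothetical names introduced here.

```python
from typing import Callable, Iterable


def accuracy(records: Iterable[dict], predict: Callable[[dict], str]) -> float:
    """Fraction of records whose predicted label equals the gold answerKey."""
    records = list(records)
    if not records:
        return 0.0
    correct = sum(1 for record in records if predict(record) == record["answerKey"])
    return correct / len(records)


def first_choice_baseline(record: dict) -> str:
    """Trivial stand-in for a model (assumption): always pick the first listed choice."""
    return record["choices"]["label"][0]


# Two toy records in the same layout as the dataset; the contents are made up.
toy_records = [
    {"answerKey": "A", "choices": {"label": ["A", "B"], "text": ["x", "y"]}},
    {"answerKey": "B", "choices": {"label": ["A", "B"], "text": ["x", "y"]}},
]

baseline_score = accuracy(toy_records, first_choice_baseline)
```

Replacing `first_choice_baseline` with a function that queries a real model, with no in-context examples in the prompt, would give a 0-shot score in the same sense as the title of this page.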
