InfiniteBench/En.MC

<aside> 💡

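How to use the benchmarks in the list
- Whether the data is downloaded and used directly, or specific code is run
- If downloaded, where to download it from and how to run it </aside>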

InfiniteBench/En.MC Benchmark Overview

InfiniteBench: Extending Long Context Evaluation Beyond 100K Tokens

Welcome to InfiniteBench, a cutting-edge benchmark tailored for evaluating the capabilities of language models to process, understand, and reason over super long contexts (100k+ tokens). Long contexts are crucial for enhancing applications with LLMs and achieving high-level interaction. InfiniteBench is designed to push the boundaries of language models by testing them against a context length of 100k+, which is 10 times longer than traditional datasets.

En.MC is one of the 12 tasks in InfiniteBench:

Task Name	Context	# Examples	Avg Input Tokens	Avg Output Tokens	Description
En.MC	Fake Book	229	184.4k	5.3	Multiple choice questions derived from the fake book.

How to Use InfiniteBench/En.MC

Running specific code

https://github.com/OpenBMB/InfiniteBench?tab=readme-ov-file

git clone https://github.com/OpenBMB/InfiniteBench.git

cd InfiniteBench
bash scripts/download_dataset.sh
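
Once the download script has finished, the task files can be read directly as JSON Lines. The following is a minimal sketch, assuming the script writes the files into a `data/` directory and that the En.MC file is named `longbook_choice_eng.jsonl`; both the directory and the file name are assumptions and should be checked against the repository.

```python
import json
from pathlib import Path
from typing import Dict, Iterator


def iter_jsonl(path: Path) -> Iterator[Dict]:
    """Yield one record per non-empty line of a JSON Lines file."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Assumed location and file name of the En.MC data (verify in the repo).
EN_MC_PATH = Path("data") / "longbook_choice_eng.jsonl"

if EN_MC_PATH.exists():
    examples = list(iter_jsonl(EN_MC_PATH))
    print(f"Loaded {len(examples)} En.MC examples")
```

Reading line by line avoids parsing the whole file in one call, which helps because each record carries a very long context.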

Download

https://huggingface.co/datasets/xinrongzhang2022/InfiniteBench
Access permission is required.

from datasets import load_dataset, Features, Value, Sequence
# Declare the schema explicitly so every task file loads with the same column types.
ft = Features({"id": Value("int64"), "context": Value("string"), "input": Value("string"), "answer": Sequence(Value("string")), "options": Sequence(Value("string"))})
dataset = load_dataset("xinrongzhang2022/InfiniteBench", features=ft)
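
A record with the fields in the schema above can be turned into a multiple-choice prompt and scored roughly as follows. This is a sketch, not the official evaluation code: the prompt wording, the helper names (`format_mc_prompt`, `gold_letter`, `accuracy`), and the assumption that `answer[0]` holds the text of the correct option are all illustrative.

```python
from typing import Dict, List

LETTERS = "ABCD"


def format_mc_prompt(example: Dict) -> str:
    """Build a prompt: book context, question, then lettered options."""
    option_lines = [
        f"{LETTERS[i]}. {opt}" for i, opt in enumerate(example["options"])
    ]
    return (
        f"{example['context']}\n\n"
        f"Question: {example['input']}\n"
        + "\n".join(option_lines)
        + "\nAnswer with a single letter:"
    )


def gold_letter(example: Dict) -> str:
    """Map the gold answer text to its option letter.

    Assumes answer[0] is the text of the correct option.
    """
    return LETTERS[example["options"].index(example["answer"][0])]


def accuracy(predictions: List[str], examples: List[Dict]) -> float:
    """Fraction of predictions whose first character matches the gold letter."""
    if not examples:
        return 0.0
    correct = sum(
        pred.strip().upper()[:1] == gold_letter(ex)
        for pred, ex in zip(predictions, examples)
    )
    return correct / len(examples)
```

With an average input of about 184.4k tokens per example, the context usually has to be truncated to the model's window before the prompt is sent; that step is left out here.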