<aside> 💡
Welcome to InfiniteBench, a benchmark for evaluating how well language models process, understand, and reason over very long contexts (100k+ tokens). Long-context ability matters for LLM-based applications and for richer interaction with them. InfiniteBench tests models at context lengths of 100k+ tokens, about 10 times longer than traditional datasets.
Of the 12 tasks in total, the En.MC data:

| Task Name | Context | # Examples | Avg Input Tokens | Avg Output Tokens | Description |
|---|---|---|---|---|---|
| En.MC | Fake Book | 229 | 184.4k | 5.3 | Multiple choice questions derived from the fake book. |

git clone https://github.com/OpenBMB/InfiniteBench.git
cd InfiniteBench
bash scripts/download_dataset.sh
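
Once the download script has finished, the task files can be inspected locally. The sketch below assumes the files are JSON Lines (one JSON object per line). The helper name `read_jsonl` and the example path in the comment are hypothetical, so check what the script actually wrote to disk before relying on them.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Hypothetical usage; the real file name depends on the download script:
# records = read_jsonl("data/<en_mc_file>.jsonl")
# print(len(records), records[0].keys())
```

Reading line by line keeps peak memory close to the size of the parsed records, which matters when each context is around 184k tokens.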
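
Alternatively, load the data with the Hugging Face `datasets` library:

from datasets import load_dataset, Features, Value, Sequence
# Declare the column types explicitly; `answer` and `options` are lists of strings.
ft = Features({"id": Value("int64"), "context": Value("string"), "input": Value("string"), "answer": Sequence(Value("string")), "options": Sequence(Value("string"))})
dataset = load_dataset("xinrongzhang2022/InfiniteBench", features=ft)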
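
To show how a single En.MC record with the fields above (`context`, `input`, `options`, `answer`) might be turned into a prompt and scored, here is a minimal sketch. The prompt wording, the helper names, and the toy record are all illustrative assumptions, not the benchmark's official template. It also assumes that the first entry of `answer` is the text of the correct option.

```python
import string

LETTERS = string.ascii_uppercase


def build_prompt(example):
    """Format one record as a lettered multiple-choice prompt (hypothetical template)."""
    option_lines = [
        f"{LETTERS[i]}. {opt}" for i, opt in enumerate(example["options"])
    ]
    return (
        f"{example['context']}\n\n"
        f"Question: {example['input']}\n"
        + "\n".join(option_lines)
        + "\nAnswer with a single letter."
    )


def gold_letter(example):
    """Map the gold answer text back to its option letter.

    Assumes example["answer"][0] appears verbatim in example["options"].
    """
    return LETTERS[example["options"].index(example["answer"][0])]


def is_correct(prediction, example):
    """Compare the first character of the model output with the gold letter."""
    return prediction.strip().upper()[:1] == gold_letter(example)


# Toy record with the same schema; a real context is a whole book.
toy = {
    "id": 0,
    "context": "Ann keeps a red kite in the attic.",
    "input": "What colour is Ann's kite?",
    "answer": ["Red"],
    "options": ["Blue", "Red", "Green", "Yellow"],
}

prompt = build_prompt(toy)
```

Accuracy over the task is then the fraction of records for which `is_correct` returns `True`. A real evaluation would also need to handle contexts that exceed the model's window, which this sketch does not attempt.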