<aside> 💡

InfiniteBench/En.MC Benchmark Overview

InfiniteBench: Extending Long Context Evaluation Beyond 100K Tokens

Welcome to InfiniteBench, a benchmark for evaluating how well language models can process, understand, and reason over very long contexts (100K+ tokens). Long-context ability is important for many LLM applications and for richer, more sustained interactions. InfiniteBench pushes the limits of current models by testing them on contexts of 100K+ tokens, roughly 10 times longer than traditional benchmark datasets.

Of InfiniteBench's 12 tasks, this note covers the En.MC data:

| Task Name | Context | # Examples | Avg Input Tokens | Avg Output Tokens | Description |
| --- | --- | --- | --- | --- | --- |
| En.MC | Fake Book | 229 | 184.4k | 5.3 | Multiple-choice questions derived from the fake book. |
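Each En.MC record pairs a long `context` (the fake book) with a question in `input` and candidate answers in `options`, the same fields that appear in the loading schema further down. As a rough sketch of how such a record could be turned into a lettered multiple-choice prompt, the helper below labels the options A, B, C, and so on. The function name `build_mc_prompt` and the prompt wording are hypothetical and are not InfiniteBench's official template.

```python
from string import ascii_uppercase


def build_mc_prompt(record: dict) -> str:
    """Format one En.MC-style record as a lettered multiple-choice prompt.

    Hypothetical template for illustration only; the official InfiniteBench
    evaluation scripts use their own prompt wording.
    """
    # Label each option with a capital letter: A, B, C, ...
    lines = [f"{ascii_uppercase[i]}. {opt}" for i, opt in enumerate(record["options"])]
    return (
        f"{record['context']}\n\n"
        f"Question: {record['input']}\n"
        + "\n".join(lines)
        + "\nAnswer with the letter of the correct option."
    )


example = {
    "context": "(book text)",
    "input": "Who wrote the letter?",
    "options": ["Alice", "Bob", "Carol", "Dave"],
}
print(build_mc_prompt(example))
```

Putting the question after the long context keeps it close to where the model starts generating; whether that ordering matches the official scripts is not confirmed here.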


How to Use InfiniteBench/En.MC

Using the repository code

git clone https://github.com/OpenBMB/InfiniteBench.git
cd InfiniteBench
bash scripts/download_dataset.sh
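The downloaded task files are JSON Lines (one JSON object per line). A minimal reader sketch follows. The path `data/longbook_choice_eng.jsonl` is an assumption about where and under what name the script stores the En.MC file, so check `scripts/download_dataset.sh` for the actual location before relying on it.

```python
import json
from pathlib import Path


def load_jsonl(path) -> list[dict]:
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Assumed location and file name of the En.MC data; verify against scripts/download_dataset.sh.
EN_MC_PATH = Path("data/longbook_choice_eng.jsonl")

if EN_MC_PATH.exists():
    en_mc = load_jsonl(EN_MC_PATH)
    print(len(en_mc), "examples; fields:", sorted(en_mc[0].keys()))
```

Reading line by line avoids parsing one huge JSON array, which matters when every record carries a context of roughly 184K tokens.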

Download (Hugging Face datasets library)

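from datasets import load_dataset, Features, Value, Sequence
# Fixed schema for InfiniteBench records; "answer" and "options" are lists of strings
ft = Features({"id": Value("int64"), "context": Value("string"), "input": Value("string"), "answer": Sequence(Value("string")), "options": Sequence(Value("string"))})
# Load the dataset from the Hugging Face Hub using this schema
dataset = load_dataset("xinrongzhang2022/InfiniteBench", features=ft)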
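With only 5.3 output tokens on average, En.MC answers are short, so exact-match accuracy against the gold `answer` list is a natural check. The sketch below is a hypothetical scorer, not InfiniteBench's official metric: it accepts either the option text or its letter (A, B, ...) and counts a prediction as correct if it matches any entry in `answer`. It works on plain dicts, so it applies equally to rows of the `dataset` loaded above or to records read from the downloaded files. The split name `longbook_choice_eng` in the comment is an assumption about where the En.MC rows live on the Hub.

```python
from string import ascii_uppercase


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace for a lenient exact match."""
    return " ".join(text.lower().split())


def is_correct(prediction: str, record: dict) -> bool:
    """Hypothetical En.MC check: match gold answers by option text or by letter."""
    pred = normalize(prediction).rstrip(".")
    gold = {normalize(a) for a in record["answer"]}
    # Map a bare letter such as "b" to the normalized text of that option.
    letters = {ascii_uppercase[i].lower(): normalize(o) for i, o in enumerate(record["options"])}
    # Accept the raw prediction, or the option text a bare letter refers to.
    return pred in gold or letters.get(pred) in gold


def accuracy(predictions: list[str], records: list[dict]) -> float:
    """Fraction of records judged correct; missing predictions count as wrong."""
    if not records:
        return 0.0
    hits = sum(is_correct(p, r) for p, r in zip(predictions, records))
    return hits / len(records)


# Assumed split name for En.MC on the Hub: records = dataset["longbook_choice_eng"]
records = [
    {"answer": ["Bob"], "options": ["Alice", "Bob", "Carol", "Dave"]},
    {"answer": ["Carol"], "options": ["Alice", "Bob", "Carol", "Dave"]},
]
print(accuracy(["B", "Alice"], records))  # → 0.5
```

Accepting both forms keeps the scorer tolerant of models that reply with a letter versus the full option text; a stricter setup would fix one answer format in the prompt.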