UNO-Bench Full-Modal Evaluation Benchmark Dataset
License: MIT
UNO-Bench is the first unified full-modal evaluation benchmark, released by Meituan's LongCat team in 2025. The accompanying paper is titled "UNO-Bench: A Unified Benchmark for Exploring the Compositional Law Between Uni-modal and Omni-modal in Omni Models". Its goal is to efficiently assess both uni-modal and omni-modal understanding capabilities.
This dataset contains 1,250 full-modal samples, 98% of which are cross-modally solvable, and 2,480 uni-modal samples, covering 44 task types and 5 modality combinations. The dataset also ships with a general scoring model that supports automated evaluation of 6 question types, providing a unified scoring standard for multimodal tasks. The full-modal samples were carefully constructed by humans to closely resemble real-world applications, with particular attention to Chinese-language contexts; the uni-modal samples supplement the basic cognitive and ability dimensions, making the overall evaluation more comprehensive.
Data Structures:
The data is stored in Parquet format, and each sample contains the following structured fields:
- qid (sample ID), subset_name (subset name);
- question (textual question) and answer (standard answer);
- images / audios / videos (multimodal content, file paths are stored as a dictionary, null if not present);
- task (one of the 44 task tags), ability (ability type), source (data source), score_type (scoring method).
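To make the field layout concrete, here is a minimal sketch of handling one record. The field values below are hypothetical placeholders (the real file names, tag vocabulary, and path layout may differ); in practice a record like this would come from loading the Parquet file, e.g. with `pandas.read_parquet`.

```python
# Hypothetical record illustrating the schema described above;
# all values are invented placeholders, not real dataset content.
sample = {
    "qid": "omni_000001",
    "subset_name": "omni",
    "question": "What is being discussed in the audio while the image is shown?",
    "answer": "A",
    # Multimodal fields hold file paths as a dictionary, or null (None) if absent.
    "images": {"path": "images/000001.jpg"},
    "audios": {"path": "audios/000001.wav"},
    "videos": None,
    "task": "audio-visual reasoning",
    "ability": "cross-modal understanding",
    "source": "human-constructed",
    "score_type": "multiple_choice",
}

def present_modalities(rec):
    """Return which of the three modality fields are non-null for a record."""
    return [m for m in ("images", "audios", "videos") if rec.get(m) is not None]

print(present_modalities(sample))  # → ['images', 'audios']
```

Checking modality fields against `None` first, as above, is the safe way to iterate over samples, since each of `images` / `audios` / `videos` may be null for any given row.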
