4 hours ago

Chaoyou Fu Haozhi Yuan Yuhao Dong Yi-Fan Zhang Yunhang Shen Xiaoxing Hu Xueying Li Jinsen Su Chengwu Long Xiaoyao Xie

Abstract

With the rapid advancement of video understanding, existing benchmarks are becoming increasingly saturated, exposing a critical discrepancy between inflated leaderboard scores and real-world model capabilities. To address this widening gap, we introduce Video-MME-v2, a comprehensive benchmark designed to rigorously evaluate the robustness and faithfulness of video understanding. To assess model capabilities systematically, we design a progressive tri-level hierarchy that incrementally increases the complexity of video comprehension, ranging from multi-point visual information aggregation, to temporal dynamics modeling, and ultimately to complex multimodal reasoning. In addition, in contrast to conventional per-question accuracy, we propose a group-based non-linear evaluation strategy that enforces both consistency across related queries and coherence in multi-step reasoning. It penalizes fragmented or guess-based correctness and assigns credit only to answers supported by valid reasoning. To guarantee data quality, Video-MME-v2 is constructed through a rigorously controlled human annotation pipeline involving 12 annotators and 50 independent reviewers. Backed by 3,300 human-hours and up to 5 rounds of quality assurance, Video-MME-v2 aims to serve as one of the most authoritative video benchmarks. Extensive experiments reveal a substantial gap between the current best model, Gemini-3-Pro, and human experts, and uncover a clear hierarchical bottleneck in which errors in visual information aggregation and temporal modeling propagate to limit high-level reasoning. We further find that thinking-based reasoning is highly dependent on textual cues: it improves performance when subtitles are available but sometimes degrades it in purely visual settings. By exposing these limitations, Video-MME-v2 establishes a demanding new testbed for the development of next-generation video MLLMs.
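The abstract does not state the scoring formula, so the following is only a hypothetical sketch of how a group-based non-linear score can differ from per-question accuracy. The function names, the `exponent` parameter, and both scoring rules are illustrative assumptions, not the benchmark's actual metric.

```python
from typing import Sequence


def per_question_accuracy(groups: Sequence[Sequence[bool]]) -> float:
    """Conventional baseline: every question counts independently."""
    total = sum(len(g) for g in groups)
    correct = sum(sum(g) for g in groups)
    return correct / total


def consistency_group_score(results: Sequence[bool], exponent: float = 2.0) -> float:
    """Hypothetical non-linear score for a group of related queries.

    The fraction of correct answers is raised to `exponent` (> 1), so a
    partially correct group earns less than its linear share of credit.
    """
    if not results:
        raise ValueError("a group needs at least one question")
    fraction = sum(results) / len(results)
    return fraction ** exponent


def chain_group_score(results: Sequence[bool]) -> float:
    """Hypothetical score for a multi-step reasoning chain.

    Credit stops at the first wrong step: a later correct answer that
    follows a broken step is treated as unsupported and earns nothing.
    """
    if not results:
        raise ValueError("a group needs at least one question")
    streak = 0
    for ok in results:
        if not ok:
            break
        streak += 1
    return streak / len(results)


if __name__ == "__main__":
    groups = [[True, True, True, True], [True, False, True, False]]
    print("per-question accuracy:", per_question_accuracy(groups))
    print("consistency scores:", [consistency_group_score(g) for g in groups])
    print("chain scores:", [chain_group_score(g) for g in groups])
```

Under these assumed rules, a model that answers half of a group correctly receives well under half credit for that group, which is the kind of penalty on fragmented or guess-based correctness that the abstract describes.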


Video Understanding

Visual Question Answering

Multimodal


Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Chaoyou Fu Haozhi Yuan Yuhao Dong Yi-Fan Zhang Yunhang Shen Xiaoxing Hu Xueying Li Jinsen Su Chengwu Long Xiaoyao Xie (and 9 more)
