Papers

Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Build the Future of Artificial Intelligence

About

About Us Dataset Help

Products

News Papers Notebooks Datasets Wiki

Links

© HyperAI

GitHub Discord X (formerly Twitter)

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

Dongrui Liu, Yu Li, Zhonghao Yang, et al.

World Action Models: The Next Frontier in Embodied AI

Embodied Intelligence

Siyin Wang, Junhao Shi, Zhaoyang Fu, et al.

World Action Models are Zero-shot Policies

Diffusion Model

Video Generation

Seonghyeon Ye, Yunhao Ge, Kaiyuan Zheng, et al.

ResearchMath-14K: Scaling Research-Level Mathematics via Agents

Guijin Son, Seungyeop Yi, Minju Gwak, et al.

Self-Improving Language Models with Bidirectional Evolutionary Search

Guowei Xu, Zhenting Qi, Huangyuan Su, et al.

From Pixels to Words -- Towards Native One-Vision Models at Scale

Video Understanding

Haiwen Diao, Jiahao Wang, Penghao Wu, et al.

Agent Explorative Policy Optimization for Multimodal Agentic Reasoning

Minki Kang, Shizhe Diao, Ryo Hachiuma, et al.

ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation

Reinforcement Learning

Preference Modeling

Hongru Hou, Tiehua Mei, Denghui Geng, et al.

Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players

Video Generation

Diffusion Model

Fangfu Liu, Kai He, Tianchang Shen, et al.

AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations

Minjun Zhu, Zhen Lin, Yixuan Weng, et al.

AutoResearch AI: Towards AI-Powered Research Automation for Scientific Discovery

Embodied Intelligence

Guiyao Tie, Jiawen Shi, Dingjie Song, et al.

Agent Harness Engineering: A Survey

Junjie Li, Xi Xiao, Yunbei Zhang, et al.

D^2-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing

Diffusion Model

Aoxi Liu, Yupeng Chen, James Oldfield, et al.

Geometry-Aware Representation Denoising for Robust Multi-view 3D Reconstruction

Diffusion Model

3D Machine Vision

Jin Hyeon Kim, Jaeeun Lee, Claire Kim, et al.

EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation

Video Generation

Songlin Yang, Haobin Zhong, Ruilin Zhang, et al.

MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

Reinforcement Learning

Dingbang Wu, Rui Hao, Haiyang Wang, et al.

SpatialBench: Is Your Spatial Foundation Model an All-Round Player?

Haosong Peng, Hao Li, Jiaqi Chen, et al.

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Object Detection

Shihao Wang, Shilong Liu, Yuanguo Kuang, et al.

Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini

Multimodal Representation

Madhuri Shanbhogue, Zhe Li, Shanfeng Zhang, et al.

Language Models Need Sleep

Sangyun Lee, Sean McLeish, Tom Goldstein, et al.

ECHO: Terminal Agents Learn World Models for Free

Reinforcement Learning

Vaishnavi Shrivastava, Piero Kauffmann, Ahmed Awadallah, et al.

ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning

Video Understanding

Zuhao Yang, Kaichen Zhang, Sudong Wang, et al.

TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction

Computer Vision

Weijie Wang, Zimu Li, Jinchuan Shi, et al.

Foundation Protocol: A Coordination Layer for Agentic Society

Bang Liu, Yongfeng Gu, Jiayi Zhang, et al.

WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

Video Generation

Kaining Ying, Hengrui Hu, Siyu Ren, et al.

Macaron-A2UI: A Model for Generative UI in Personal Agents

Fancy Kong, Congjie Zheng, Murphy Zhuang, et al.

DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning

Reinforcement Learning

Multi-Task Learning

Guochao Jiang, Jingyi Song, Guofeng Quan, et al.

ViMU: Benchmarking Video Metaphorical Understanding

Video Understanding

Emotion Recognition

Qi Li, Xinchao Wang

SMOL: Professionally translated parallel data for 115 under-represented languages

Supervised Fine-Tuning

Isaac Caswell, Elizabeth Nielsen, Jiaming Luo, et al.

Chi-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?

Haolin Chen, Deon Metelski, Leon Qi, et al.

Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models

Miguel Moura Ramos, Duarte M. Alves, André F. T. Martins

Through the Lens of Contrast: Self-Improving Visual Reasoning in VLMs

Visual Question Answering

Zhiyu Pan, Yizheng Wu, Jiashen Hua, et al.
