HyperAI

Main

GPU

Console
Docs
Pricing

Pulse

News

Resources

Papers
Notebooks
Datasets
Wiki

Benchmarks

SOTA
LLM Models
GPU Leaderboard

Community

Events

Utility

About Terms of Service Privacy Policy
English

HyperAI
Papers

Papers

Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Build the Future of Artificial Intelligence

About

About Us Dataset Help

Products

News Papers Notebooks Datasets Wiki

Links

© HyperAI

GitHub Discord X (formerly Twitter)

ACL-Verbatim: hallucination-free question answering for research

Retrieval-Augmented Generation

Intelligent Question Answering

Gábor Recski, Szilveszter Tóth, Nadia Verdha, et al.

Beyond Static Dialogues: Benchmarking Realistic, Heterogeneous, and Evolving Long-Term Memory

Han Zhang, Zihao Tang, Xin Yu, et al.

The End of Software Engineering: How AI Agents Are Fundamentally Restructuring the Software Paradigm

Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention

Multi-Task Learning

Jing Huang, Daniel Wurgaft, Rachit Bansal, et al.

When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents

Dongsheng Zhu, Xuchen Ma, Yucheng Shen, et al.

Direct 3D-Aware Object Insertion via Decomposed Visual Proxies

Diffusion Model

Image Generation

Jingbo Gong, Yikai Wang, Yushi Lan, et al.

AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization

Embodied Intelligence

Yu Li, Menghan Xia, Gongye Liu, et al.

SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations

Taewon Yun, Hyeonseong Park, Jeonghwan Choi, et al.

MMAE: A Massive Multitask Audio Editing Benchmark

Audio and Speech Processing

Ziyang Ma, Ruiqi Yan, Ruiyang Xu, et al.

Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings

Songhao Wu, Zhongxin Chen, Yuxuan Liu, et al.

ChordEdit: One-Step Low-Energy Transport for Image Editing

Diffusion Model

Liangsi Lu, Xuhang Chen, Minzhe Guo, et al.

NitroGen: An Open Foundation Model for Generalist Gaming Agents

Loïc Magne, Anas Awadalla, Guanzhi Wang, et al.

Efficiently Reconstructing Dynamic Scenes One D4RT at a Time

Depth Estimation

3D Machine Vision

Chuhan Zhang, Guillaume Le Moing, Skanda Koppula, et al.

Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments

Parth Asawa, Christopher M. Glaze, Gabriel Orlanski, et al.

MEMORY CACHING: RNNs with Growing Memory

Ali Behrouz, Zeman Li, Yuan Deng, et al.

RobotValues: Evaluating Household Robots When Human Values Conflict

Jongwook Han, Hyeongjin Kim, Yohan Jo

VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding

Video Understanding

Visual Question Answering

Lin Fu, Zheyuan Yang, Yang Wang, et al.

AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints

Jiayu Liu, Cheng Qian, Zhenhailong Wang, et al.

TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration

Soyeong Jeong, Jinheon Baek, Minki Kang, et al.

ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?

Woojung Song, Nalim Kim, Sangjun Song, et al.

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

Code Generation

Liliana Hotsko, Yinxi Li, Yuntian Deng, et al.

Self-Distilled Policy Gradient

Reinforcement Learning

Yifeng Liu, Shiyouan Zhang, Yifan Zhang, et al.

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

Iman Mirzadeh, Keivan Alizadeh, Oncel Tuzel, et al.

MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

Huawei Lin, Peng Li, Jie Song, et al.

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Akter, Xiao, Liu, et al.

Qwen-Image-Flash: Beyond Objective Design

Image Generation

Tianhe Wu, Kun Yan, Zikai Zhou, et al.

OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs

Video Understanding

Yifei Li, Pengyiang Liu, Yuhang Zang, et al.

Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning

Reinforcement Learning

Xuekang Wang, Zhuoyuan Hao, Shuo Hou, et al.

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

Jiaming Wang, Ziteng Feng, Jiangtao Wu, et al.

Audio Interaction Model

Audio and Speech Processing

Zhifei Xie, Zihang Liu, Ze An, et al.

Cosmos 3: Omnimodal World Models for Physical AI

Aditi, Niket Agarwal, Arslan Ali, et al.

Learning, Fast and Slow: Towards LLMs That Adapt Continually

Supervised Fine-Tuning

Rishabh Tiwari, Kusha Sareen, Lakshya A Agrawal, et al.
