Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

dLLM: Simple Diffusion Language Modeling

Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization

Imagination Helps Visual Reasoning, But Not Yet in Latent Space
OmniGAIA: Towards Native Omni-Modal AI Agents
MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios
From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models
The Trinity of Consistency as a Defining Principle for General World Models
GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL
SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model
ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning
DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation
MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models
HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation
DREAM: Deep Research Evaluation with Agentic Metrics
LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces
PyVision-RL: Forging Open Agentic Vision Models via RL
From Perception to Action: An Interactive Benchmark for Vision Reasoning
Query-focused and Memory-aware Reranker for Long Context Processing
On Data Engineering for Scaling LLM Terminal Capabilities
DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning
Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device
TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics
ManCAR: Manifold-Constrained Latent Reasoning with Adaptive Test-Time Computation for Sequential Recommendation
VLANeXt: Recipes for Building Strong VLA Models
A Very Big Video Reasoning Suite
Selective Training for Large Vision Language Models via Visual Information Gain
DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning
SARAH: Spatially Aware Real-time Agentic Humans
EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots
Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
Arcee Trinity Large Technical Report