Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

ACL-Verbatim: hallucination-free question answering for research

Beyond Static Dialogues: Benchmarking Realistic, Heterogeneous, and Evolving Long-Term Memory

The End of Software Engineering: How AI Agents Are Fundamentally Restructuring the Software Paradigm
Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention
When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents
Direct 3D-Aware Object Insertion via Decomposed Visual Proxies
AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization
SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations
MMAE: A Massive Multitask Audio Editing Benchmark
Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings
ChordEdit: One-Step Low-Energy Transport for Image Editing
NitroGen: An Open Foundation Model for Generalist Gaming Agents
Efficiently Reconstructing Dynamic Scenes One D4RT at a Time
Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments
MEMORY CACHING: RNNs with Growing Memory
RobotValues: Evaluating Household Robots When Human Values Conflict
VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints
TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration
ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?
Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution
Self-Distilled Policy Gradient
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation
Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Qwen-Image-Flash: Beyond Objective Design
OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs
Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning
Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories
Audio Interaction Model
Cosmos 3: Omnimodal World Models for Physical AI
Learning, Fast and Slow: Towards LLMs That Adapt Continually