Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models

Where Vision Becomes Text: Locating the OCR Routing Bottleneck in Vision-Language Models
OCR or Not? Rethinking Document Information Extraction in the MLLMs Era with Real-World Large-Scale Datasets
dnaHNet: A Scalable and Hierarchical Foundation Model for Genomic Sequence Learning
Neural Computers
ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack
GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens
How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data
RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework
DR3-Eval: Towards Realistic and Reproducible Deep Research Evaluation
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
pi0.7: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward
Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?
Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents
OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models
SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
Seedance 2.0: Advancing Video Generation for World Complexity
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
Cross-Scale Pansharpening via ScaleFormer and the PanScale Benchmark
ParseBench: A Document Parsing Benchmark for AI Agents
Memory Intelligence Agent
PROPELLA-1: Multi-Property Document Annotation for LLM Data Curation at Scale
Internalized Reasoning for Long-Context Visual Document Understanding
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation
SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks
Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization
Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing