Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models

Explainable AI for Blind and Low-Vision Users: Exploring Trust, Modality, and Explainability in the Agent Era

PlayCoder: Making LLM-Generated GUI Code Playable
TEMPO: Scaling Test-time Training for Large Reasoning Models
AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
AgentSPEX: An Agent SPecification and EXecution Language
CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items
Fast NF4 Dequantization Kernels for Large Language Model Inference
EasyVideoR1: Easier RL for Video Understanding
MultiWorld: Scalable Multi-Agent Multi-View Video World Models
OpenGame: Open Agentic Coding for Games
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image
Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
HunyuanVideo: A Systematic Framework for Large Video Generative Models
MathNet: A Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
Active Context Compression: Autonomous Memory Management in LLM Agents
Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning
Qwen3.5-Omni Technical Report
Web Retrieval-Aware Chunking (W-RAC) for Efficient and Cost-Effective Retrieval-Augmented Generation Systems
PersonaVLM: Long-Term Personalized Multimodal LLMs
Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips
Elucidating the SNR-t Bias of Diffusion Probabilistic Models
Multimodal OCR: Parse Anything from Documents
Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Video Object and Interaction Deletion