Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Towards Pixel-Level VLM Perception via Simple Points Prediction

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Innovator-VL: A Multimodal Large Language Model for Scientific Discovery
Advancing Open-source World Models
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
Short window attention enables long-term memorization
World Craft: Agentic Framework to Create Visualizable Worlds via Text
Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models
Masked Depth Modeling for Spatial Perception
A Pragmatic VLA Foundation Model
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
Arcee Trinity Large Technical Report
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality
iFSQ: Improving FSQ for Image Generation with 1 Line of Code
Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers
Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility
The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation
daVinci-Dev: Agent-native Mid-training for Software Engineering
Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs
DeepSeek-OCR 2: Visual Causal Flow
Learning to Discover at Test Time
Eliciting Harmful Capabilities by Fine-Tuning On Safeguarded Outputs
Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory
Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification
VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers
SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents
LongCat-Flash-Thinking-2601 Technical Report
Can Language Models Discover Scaling Laws?
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning