Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Qwen3-VL Technical Report

G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following
MIRA: Multimodal Iterative Reasoning Agent for Image Editing
ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
Canvas-to-Image: Compositional Image Generation with Multimodal Controls
Video Generation Models Are Good Latent Reward Models
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
Think Visually, Reason Textually: Vision-Language Synergy in ARC
Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation
Latent Collaboration in Multi-Agent Systems
Multimodal Evaluation of Russian-language Architectures
ROOT: Robust Orthogonalized Optimizer for Neural Network Training
Superposition Yields Robust Neural Scaling
Optimal Mistake Bounds for Transductive Online Learning
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Why Diffusion Models Don’t Memorize: The Role of Implicit Dynamical Regularization in Training
1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
Evolution Strategies at the Hyperscale
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward
iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation
Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning
MedSAM3: Delving into Segment Anything with Medical Concepts
SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation
GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms
Fidelity-Aware Recommendation Explanations via Stochastic Path Integration
Extracting Interaction-Aware Monosemantic Concepts in Recommender Systems
MSRNet: A Multi-Scale Recursive Network for Camouflaged Object Detection