Command Palette
Search for a command to run...
Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

FASTER: Rethinking Real-Time Flow VLAs

3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model































FASTER: Rethinking Real-Time Flow VLAs

3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model






























SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing
Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding
Efficient Reasoning with Balanced Thinking
Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models
Complementary Reinforcement Learning
Alignment Makes Language Models Normative, Not Descriptive
MosaicMem: Hybrid Spatial Memory for Controllable Video World Models
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
Video-CoE: Reinforcing Video Event Prediction via Chain of Events
FunCineForge: A Unified Dataset Toolkit and Model for Zero-Shot Movie Dubbing in Diverse Cinematic Scenes
In-Context Watermarks for Large Language Models
WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation
Demystifing Video Reasoning
Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence
InCoder-32B: Code Foundation Model for Industrial Scenarios
MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification
HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions
Mixture-of-Depths Attention
Attention Residuals
Grounding World Simulation Models in a Real-World Metropolis
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data
AI Can Learn Scientific Taste
MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning
Can Vision-Language Models Solve the Shell Game?
OmniForcing: Unleashing Real-time Joint Audio-Visual Generation
daVinci-Env: Open SWE Environment Synthesis at Scale
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation
LMEB: Long-horizon Memory Embedding Benchmark
DreamVideo-Omni: Omni-Motion Controlled Multi-Subject Video Customization with Latent Identity Reinforcement Learning
SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing
Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding
Efficient Reasoning with Balanced Thinking
Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models
Complementary Reinforcement Learning
Alignment Makes Language Models Normative, Not Descriptive
MosaicMem: Hybrid Spatial Memory for Controllable Video World Models
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
Video-CoE: Reinforcing Video Event Prediction via Chain of Events
FunCineForge: A Unified Dataset Toolkit and Model for Zero-Shot Movie Dubbing in Diverse Cinematic Scenes
In-Context Watermarks for Large Language Models
WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation
Demystifing Video Reasoning
Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence
InCoder-32B: Code Foundation Model for Industrial Scenarios
MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification
HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions
Mixture-of-Depths Attention
Attention Residuals
Grounding World Simulation Models in a Real-World Metropolis
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data
AI Can Learn Scientific Taste
MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning
Can Vision-Language Models Solve the Shell Game?
OmniForcing: Unleashing Real-time Joint Audio-Visual Generation
daVinci-Env: Open SWE Environment Synthesis at Scale
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation
LMEB: Long-horizon Memory Embedding Benchmark
DreamVideo-Omni: Omni-Motion Controlled Multi-Subject Video Customization with Latent Identity Reinforcement Learning