Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Measuring short-form factuality in large language models

DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
AI-Trader: Benchmarking Autonomous Agents in Real-Time Financial Markets
Latent Implicit Visual Reasoning
LLM Personas as a Substitute for Field Experiments in Method Benchmarking
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming
TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models
DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation
T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation
TongSIM: A General Platform for Simulating Intelligent Machines
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition
RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic
A Real-World Evaluation of LLM Medication Safety Reviews in NHS Primary Care
Multi-LLM Thematic Analysis with Dual Reliability Metrics: Combining Cohen's Kappa and Semantic Similarity for Qualitative Research Validation
Active Intelligence in Video Avatars via Closed-loop World Modeling
FaithLens: Detecting and Explaining Faithfulness Hallucination
SAM Audio: Segment Anything in Audio
Step-DeepResearch Technical Report
SpatialTree: How Spatial Abilities Branch Out in MLLMs
SemanticGen: Video Generation in Semantic Space
Automated stereotactic radiosurgery planning using a human-in-the-loop reasoning large language model agent
LongVideoAgent: Multi-Agent Reasoning with Long Videos
GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators
WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion
LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry
Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction
QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding