The NeurIPS 2025 Best Paper Awards Have Been Announced! A Collaborative Paper by the Qwen Team, Tsinghua University, Stanford University, and Others Is Among the Winners.

The NeurIPS 2025 Best Paper Award and Best Paper Runner-up Award were given to 7 groundbreaking papers: 4 Best Papers (one of which comes from the Datasets and Benchmarks track) and 3 Runner-up Papers.
These seven papers highlight the latest advancements in diffusion model theory, self-supervised reinforcement learning, attention mechanisms in large language models, reasoning capabilities of language models, online learning theory, neural scaling laws, and benchmarking methods for language model diversity.
4 Best Papers
1. Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
Title: Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
* Research Team: University of Washington, Carnegie Mellon University, Allen Institute for Artificial Intelligence, Lila Sciences, Stanford University
* Abstract: Large language models (LMs) often struggle to generate diverse, human-like creative content, raising concerns about the long-term homogenization of human thought due to repeated exposure to similar outputs. However, current scalable methods for assessing the diversity of language model outputs remain limited, especially outside of narrow tasks such as random number or name generation, or beyond repeated sampling of a single model.
To address this gap, we introduce Infinity-Chat, a large-scale dataset of 26,000 diverse, real-world, open-ended user queries that admit multiple plausible answers rather than a single "correct" one. We also present the first comprehensive taxonomy for characterizing the full space of open-ended prompts posed to language models, comprising six top-level categories (e.g., creative content generation, brainstorming and ideation) that are further divided into 17 subcategories.
Using Infinity-Chat, we conduct a large-scale study of mode collapse in LMs and reveal a pronounced Artificial Hivemind effect in open-ended generation. It manifests as (1) intra-model repetition, where a single model repeatedly produces similar responses, and (2) inter-model homogeneity, where different models produce strikingly similar outputs. Infinity-Chat also includes 31,250 human annotations covering absolute ratings and pairwise preferences, with 25 independent annotations per example, enabling the study of both collective and individual human preferences on open-ended queries. Our results show that, despite maintaining high overall quality, state-of-the-art LMs, reward models, and LM judges align poorly with human ratings on generations that elicit divergent, individualized preferences across annotators. Overall, Infinity-Chat provides the first large-scale resource for systematically studying real-world open-ended queries to LMs, revealing key insights to guide future research and mitigate the long-term AI-safety risks posed by the Artificial Hivemind. (A minimal sketch of one way such homogeneity could be quantified appears after the link below.)
* Paper link: https://go.hyper.ai/DZga5
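To make intra-model repetition and inter-model homogeneity concrete, here is a minimal, hypothetical sketch (not the paper's protocol; the embedding step and variable names are assumptions) of how one could score homogeneity from response embeddings:

```python
# Hypothetical sketch: quantify homogeneity via mean pairwise cosine similarity of
# response embeddings (the embedding model itself is assumed to be supplied elsewhere).
import numpy as np

def mean_pairwise_cosine(a, b=None):
    """Mean cosine similarity between rows of `a` (within-set) or between `a` and `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    if b is None:                    # intra-model: compare one model's samples to each other
        s = a @ a.T
        n = len(a)
        return float((s.sum() - np.trace(s)) / (n * (n - 1)))  # exclude self-similarity
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float((a @ b.T).mean())   # inter-model: compare two models' samples

# emb_a, emb_b: (n_samples, dim) embeddings of responses to the same open-ended prompt
# intra_model_repetition = mean_pairwise_cosine(emb_a)
# inter_model_homogeneity = mean_pairwise_cosine(emb_a, emb_b)
```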
2. Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Title: Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
* Research Team: Alibaba Qwen Team, University of Edinburgh, Stanford University, Massachusetts Institute of Technology, Tsinghua University
* Abstract: Gating mechanisms have been widely used, from early models such as LSTMs and Highway Networks to recent state-space models, linear attention, and softmax attention. However, existing literature rarely examines the specific effects of gating. This paper systematically investigates gating-enhanced variants of softmax attention through a series of comprehensive experiments, comparing 30 variants of 15B-parameter Mixture-of-Experts (MoE) models and 1.7B-parameter dense models, all trained on 3.5 trillion tokens. The main finding is that a simple modification, applying a head-specific sigmoid gate after Scaled Dot-Product Attention (SDPA), consistently improves performance. It also enhances training stability, increases the model's tolerance to larger learning rates, and improves scaling behavior. By comparing different gating positions and computational variants, the authors attribute this effectiveness to two key factors: (1) introducing non-linearity into the low-rank mapping of softmax attention, and (2) applying query-dependent sparse gating scores to modulate the SDPA output. Notably, this sparse gating mitigates massive activations and attention sinks, and improves long-context extrapolation. The code and models are released to facilitate future research, and the most effective variant, SDPA output gating, has been adopted in the Qwen3-Next models. (A minimal sketch of this gating appears after the links below.)
* Paper address: https://go.hyper.ai/iBANK
* GitHub address: https://github.com/qiuzh20/gated_attention
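For intuition, here is a minimal PyTorch sketch of head-specific sigmoid gating applied to the SDPA output, as described in the abstract. The gate projection, shapes, and module layout are illustrative assumptions rather than the released implementation:

```python
# Minimal sketch (not the authors' exact code) of sigmoid output gating after SDPA.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.gate = nn.Linear(d_model, d_model)   # per-head, per-channel gate scores from the input
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, seq, d_head) for scaled dot-product attention
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2) for z in (q, k, v))
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(b, t, d)
        # query-dependent sigmoid gate modulates the SDPA output before the output projection
        g = torch.sigmoid(self.gate(x))
        return self.out(g * attn)
```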
3. 1000-Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
Title: 1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
* Research Team: Princeton University, Warsaw University of Technology
* Abstract: While self-supervised learning has driven breakthroughs at scale in language and vision, comparable progress has been rare in reinforcement learning (RL). This paper investigates building blocks for self-supervised RL that substantially improve scalability, with network depth as a key factor. Most recent RL work relies on shallow architectures (roughly 2-5 layers), but the authors show that scaling depth up to 1024 layers markedly improves performance. The experiments are conducted in an unsupervised, goal-conditioned setting with no demonstrations or rewards, so the agent must explore from scratch and learn to maximize the probability of reaching commanded goals. Evaluations on simulated locomotion and manipulation tasks show that the method yields multi-fold performance gains over self-supervised contrastive RL algorithms and outperforms other goal-conditioned baselines. Increasing depth not only raises success rates but also qualitatively changes the learned behaviors. (A sketch of the kind of very deep residual network involved appears after the link below.)
* Paper address: https://go.hyper.ai/HR0Hx
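The depth claim is easiest to picture with a residual MLP. The sketch below (widths, block counts, and the goal-conditioned-critic comment are assumptions, not the authors' code) shows the kind of very deep residual encoder that skip connections and normalization make trainable:

```python
# Sketch of a very deep residual MLP encoder; residual connections keep gradients
# flowing even when the effective depth reaches hundreds or thousands of layers.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, width: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.LayerNorm(width), nn.Linear(width, width), nn.GELU(),
            nn.LayerNorm(width), nn.Linear(width, width),
        )

    def forward(self, x):
        return x + self.block(x)   # skip connection

def make_encoder(in_dim: int, width: int = 256, depth_blocks: int = 256, out_dim: int = 64):
    # depth_blocks * layers-per-block gives the effective depth (hundreds to ~1000 layers)
    layers = [nn.Linear(in_dim, width)]
    layers += [ResidualBlock(width) for _ in range(depth_blocks)]
    layers += [nn.LayerNorm(width), nn.Linear(width, out_dim)]
    return nn.Sequential(*layers)

# In contrastive goal-conditioned RL, one such encoder embeds (state, action) pairs and another
# embeds goals; the critic is their inner product, trained so reached goals score above negatives.
```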
4. Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training
Title: Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training
* Research Team: Université Paris Sciences et Lettres (Université PSL) and Bocconi University
* Abstract: Diffusion models have achieved remarkable success across a wide range of generative tasks. A key open question is understanding how they avoid memorizing their training data and instead generalize. This study examines the role of the training dynamics in the transition from generalization to memorization. Through extensive experiments and theoretical analysis, we identify two distinct timescales: an early time at which models begin generating high-quality samples, and a later time beyond which memorization emerges. A key finding is that the memorization time grows linearly with the training-set size, while the generalization time remains constant. This creates a widening window of training times in which models generalize effectively, even though strong memorization would set in if training continued well past it. Overfitting disappears only in the limit of infinite training time once the training-set size exceeds a model-dependent threshold. These findings reveal an implicit dynamical regularization in the training dynamics that avoids memorization even in highly overparameterized settings. The conclusions are validated by numerical experiments on real and synthetic datasets using a standard U-Net architecture, and supported by a theoretical analysis of a tractable random-feature model in the high-dimensional limit. (A hypothetical sketch of how the two timescales could be monitored appears after the link below.)
* Paper address: https://go.hyper.ai/UloDv
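A hypothetical way to see the two timescales in practice (not the paper's exact protocol; the distance metric and checkpointing scheme are assumptions) is to track, at each checkpoint, a sample-quality proxy together with the distance from generated samples to their nearest training examples:

```python
# Hypothetical monitoring sketch: memorization shows up as the nearest-training-example
# distance of generated samples collapsing toward zero late in training, while the
# sample-quality proxy already improves at a much earlier, dataset-size-independent time.
import numpy as np

def nearest_train_distance(samples: np.ndarray, train_set: np.ndarray) -> float:
    """Mean L2 distance from each generated sample to its closest training example."""
    # samples: (n, d), train_set: (m, d); brute force for illustration only
    d2 = ((samples[:, None, :] - train_set[None, :, :]) ** 2).sum(-1)
    return float(np.sqrt(d2.min(axis=1)).mean())

# At each checkpoint: log a quality metric plus nearest_train_distance(generated, train_data).
# Quality improves around a roughly constant training time; the distance collapse (memorization)
# is delayed roughly in proportion to the training-set size, opening the generalization window.
```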
3 Runner-up Papers
1. Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Title: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
* Team: LeapLab at Tsinghua University, Shanghai Jiao Tong University
* Abstract: In recent years, Reinforcement Learning with Verifiable Rewards (RLVR) has achieved remarkable results in improving the reasoning performance of large language models (LLMs), especially on mathematical and programming tasks. It is widely believed that, much as traditional reinforcement learning helps agents explore and learn new strategies, RLVR enables LLMs to continuously self-improve and thereby acquire novel reasoning capabilities beyond those of the base model. This study systematically probes the reasoning-capability boundary of RLVR-trained LLMs across model families, RL algorithms, and mathematical, programming, and visual-reasoning benchmarks, and provides an in-depth analysis of the current state of RLVR.
We use pass@k at large values of k as the evaluation metric (the standard unbiased estimator for pass@k is sketched after the link below). Our analysis reveals that while RLVR improves sampling efficiency toward correct paths, current training methods, surprisingly, do not elicit fundamentally new reasoning patterns. RLVR-trained models outperform their base models at small k (e.g., k = 1), but the base models achieve higher pass@k scores at large k. Moreover, the reasoning-capability boundary of an LLM typically shrinks as RLVR training progresses. Further coverage and perplexity analyses show that the reasoning paths generated by RLVR models are already contained within the base model's sampling distribution, suggesting their reasoning abilities are derived from, and bounded by, the base model. Treating the base model as the upper limit, our quantitative analysis shows that six popular RLVR algorithms perform similarly and remain far from fully exploiting the base model's potential.
In contrast, we find that distillation can introduce genuinely new reasoning patterns from the teacher model and truly extend the model's reasoning capabilities. In summary, our results show that current RLVR methods have not yet realized the potential of reinforcement learning to elicit truly novel reasoning in LLMs, highlighting the need for improved RL paradigms, such as continual scaling and multi-turn agent-environment interaction, to unlock that potential.
* Paper address: https://go.hyper.ai/fwkSX
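Since the study's evaluation hinges on pass@k at large k, here is the commonly used unbiased pass@k estimator (given n samples per problem, of which c are correct), written as a small Python helper; the example numbers at the end are purely illustrative:

```python
# Unbiased pass@k estimator: the probability that at least one of k randomly drawn
# samples (out of n total, c of them correct) solves the problem.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k as 1 - C(n - c, k) / C(n, k), computed in a numerically stable way."""
    if n - c < k:
        return 1.0  # not enough incorrect samples for a size-k draw to be all wrong
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Illustrative example: 128 samples with only 5 correct gives a low pass@1 but a high pass@64,
# which is how a base model can overtake an RLVR-trained model once k becomes large.
print(pass_at_k(128, 5, 1), pass_at_k(128, 5, 64))
```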
2. Optimal Mistake Bounds for Transductive Online Learning
Title: Optimal Mistake Bounds for Transductive Online Learning
* Team: Kent State University, Purdue University, Google Research, MIT
* Abstract: We resolve a 30-year-old open question on the role of unlabeled data in online learning by precisely quantifying the gap between transductive and standard online learning. We show that for every concept class of Littlestone dimension d, the transductive mistake bound is at least on the order of √d, an exponential improvement over the previous lower bounds given by Ben-David, Kushilevitz, and Mansour (1995, 1997) and Hanneke, Moran, and Shafer (2023). We also show that this bound is tight: for every d there exists a concept class of Littlestone dimension d whose transductive mistake bound is O(√d), which also improves on the best previously known upper bound of Ben-David et al. (1997). Together, these results establish a quadratic gap between transductive and standard online learning, highlighting the benefit of advance access to the sequence of unlabeled instances. This contrasts sharply with the PAC setting, where transductive and standard learning exhibit similar sample complexities. (A compact restatement of the gap appears after the link below.)
* Paper address: https://go.hyper.ai/00rHz
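In symbols, with d the Littlestone dimension of the class, the quadratic gap described above can be summarized as follows (a hedged restatement, not a quotation from the paper; it relies on the standard fact that the optimal standard online mistake bound equals the Littlestone dimension):

```latex
% Standard online learning: the optimal mistake bound equals the Littlestone dimension d.
% Transductive online learning: per the quadratic gap above, the optimal bound is of order sqrt(d).
\[
  M_{\mathrm{online}}(\mathcal{H}) = d
  \qquad \text{vs.} \qquad
  M_{\mathrm{transductive}}(\mathcal{H}) = \Theta\!\bigl(\sqrt{d}\,\bigr)
\]
```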
3. Superposition Yields Robust Neural Scaling
Title: Superposition Yields Robust Neural Scaling
* Team: Massachusetts Institute of Technology
* Abstract: The success of today's large language models (LLMs) rests on the observation that larger models perform better. However, the origin of this neural scaling law, in which loss decreases as a power law in model size, remains unclear. We propose that representation superposition, i.e., an LLM representing more features than it has dimensions, is a key driver of loss scaling. Building on Anthropic's toy model, we use weight decay to control the degree of superposition and systematically study how loss scales with model size. With weak superposition, the loss follows a power law only if the underlying feature frequencies are themselves power-law distributed. With strong superposition, by contrast, the loss is generically inversely proportional to the model dimension across a wide range of frequency distributions, owing to the geometric overlap between representation vectors. We show that open-source LLMs operate in the strong-superposition regime and exhibit loss inversely proportional to model dimension, and that the Chinchilla scaling laws are consistent with this behavior. Our results identify representation superposition as a core driver of neural scaling laws, offering insight into when scaling can be improved and when it will break down. (A minimal numerical sketch of the 1/d overlap argument appears after the link below.)
* Paper address: https://go.hyper.ai/AyLWt
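The 1/d behavior under strong superposition can be illustrated numerically. The sketch below (feature counts and dimensions are arbitrary choices, and random unit vectors stand in for learned representations) shows that the average squared interference between features stored in d dimensions falls off as 1/d:

```python
# Illustrative sketch: with m >> d features stored as random unit vectors in d dimensions,
# the mean squared overlap ("interference") each feature suffers from all others is ~(m - 1)/d,
# so the interference-driven part of the loss shrinks inversely with model dimension.
import numpy as np

rng = np.random.default_rng(0)
m = 2048                                    # number of features (m >> d: strong superposition)
for d in (64, 128, 256, 512):               # representation dimension
    v = rng.normal(size=(m, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)       # features as random unit vectors
    gram = v @ v.T
    np.fill_diagonal(gram, 0.0)
    interference = float((gram ** 2).sum(axis=1).mean())
    print(f"d={d:4d}  measured={interference:.2f}  predicted=(m-1)/d={(m - 1) / d:.2f}")
```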
To learn more about cutting-edge AI papers, visit: https://hyper.ai/papers