
Gated Attention

Date: 3 days ago

Organization: MIT, Stanford University, Tsinghua University, University of Edinburgh

Paper URL: 1b7whO4SfY

Gated Attention was proposed in May 2025 by Alibaba's Tongyi Qianwen (Qwen) team in collaboration with research groups from the University of Edinburgh, Stanford University, and other universities. The findings were published in the paper "Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free", which won the Best Paper Award at NeurIPS 2025.

The research team systematically investigated a series of gating-enhanced softmax attention variants through large-scale experiments, covering more than 30 variants of 15B MoE models and 1.7B dense models trained on 3.5T tokens. The study found that applying a head-specific sigmoid gate after Scaled Dot-Product Attention (SDPA) consistently improves model performance. By evaluating these gating variants, the work highlights the impact of gating mechanisms on the performance and behavior of standard attention layers, showing that such gating introduces non-linearity, induces sparsity, and eliminates attention sinks. These findings deepen the community's understanding of gated attention mechanisms.
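To make the gating placement concrete, below is a minimal PyTorch-style sketch of one such variant: a head-specific sigmoid gate, computed from the layer input, applied elementwise to the SDPA output before the output projection. The module layout, the `gate_proj` layer, and its exact placement are illustrative assumptions based on the description above, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedAttention(nn.Module):
    """Multi-head attention with a head-specific sigmoid gate on the SDPA output.

    A sketch of the gating variant described above; names and placement are
    assumptions, not the paper's reference code.
    """

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Gate projection (hypothetical): one gate value per head and channel,
        # computed from the layer input.
        self.gate_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape

        def split_heads(proj: nn.Linear) -> torch.Tensor:
            # (b, t, d) -> (b, n_heads, t, d_head)
            return proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split_heads(self.q_proj), split_heads(self.k_proj), split_heads(self.v_proj)

        # Standard causal scaled dot-product attention.
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)

        # Head-specific sigmoid gate applied elementwise to the SDPA output.
        gate = torch.sigmoid(
            self.gate_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        )
        attn = attn * gate

        # Merge heads and project back to the model dimension.
        attn = attn.transpose(1, 2).contiguous().view(b, t, d)
        return self.out_proj(attn)


if __name__ == "__main__":
    x = torch.randn(2, 16, 256)
    y = GatedAttention(d_model=256, n_heads=8)(x)
    print(y.shape)  # torch.Size([2, 16, 256])
```

Because the sigmoid can push individual output channels toward zero, gating at this point is what introduces the non-linearity and sparsity into the attention output discussed above.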
