Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models
Yinjie Wang, Ling Yang, Bowen Li, Ye Tian, Ke Shen, Mengdi Wang

Abstract
We propose TraceRL, a trajectory-aware reinforcement learning framework for diffusion language models (DLMs) that incorporates preferred inference trajectories into post-training and is applicable across different architectures. Equipped with a diffusion-based value model that enhances training stability, TraceRL yields improved reasoning performance on complex math and coding tasks. It can also be applied to adapt block-specific models to larger blocks, which improves sampling flexibility. Employing TraceRL, we derive a series of state-of-the-art diffusion language models, namely TraDo. Although smaller than 7B-scale AR models, TraDo-4B-Instruct still consistently outperforms them across complex math reasoning tasks. TraDo-8B-Instruct achieves relative accuracy improvements of 6.1% over Qwen2.5-7B-Instruct and 51.3% over Llama3.1-8B-Instruct on mathematical reasoning benchmarks. Through curriculum learning, we also derive the first long-CoT DLM, which outperforms Qwen2.5-7B-Instruct on MATH500 with an 18.1% relative accuracy gain. To facilitate reproducible research and practical applications, we release a comprehensive open-source framework for building, training, and deploying diffusion LLMs across diverse architectures. The framework integrates accelerated KV-cache techniques and inference engines for both inference and reinforcement learning, and includes implementations of various supervised fine-tuning and RL methods for mathematics, coding, and general tasks. Code and Models: https://github.com/Gen-Verse/dLLM-RL
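The abstract describes trajectory-aware policy optimization with a diffusion-based value model only at a high level; the exact TraceRL objective is specified in the paper and the linked repository. As a rough intuition, the sketch below shows one plausible shape of such an update in PyTorch: sample a denoising trajectory (an order in which masked positions are unmasked), score the final sequence with a task reward, and reinforce the per-step token choices against a learned value baseline. Everything here (`ToyDLM`, `reward_fn`, the block-unmasking schedule, all sizes) is a hypothetical stand-in for illustration, not the paper's actual implementation.

```python
# Illustrative sketch only: TraceRL's real objective is not given in the
# abstract. This toy shows one way a trajectory-aware policy-gradient update
# with a learned value baseline could look for a masked-diffusion LM.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, seq_len, hidden = 32, 8, 64

class ToyDLM(nn.Module):
    """Tiny stand-in for a diffusion LM: per-position token logits."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)
        self.head = nn.Linear(hidden, vocab)
    def forward(self, tokens):
        return self.head(self.emb(tokens))  # (B, L, vocab)

policy = ToyDLM()
# Stand-in for the "diffusion-based value model": scores a (partially
# denoised) sequence to serve as a variance-reducing baseline.
value = nn.Linear(seq_len * hidden, 1)
opt = torch.optim.Adam(list(policy.parameters()) + list(value.parameters()), lr=1e-3)

def reward_fn(tokens):
    # Hypothetical terminal reward, e.g. a verifier score on a math answer.
    return (tokens == 1).float().mean(dim=-1)

B, T = 4, 3  # batch size, denoising steps in the trajectory
tokens = torch.zeros(B, seq_len, dtype=torch.long)  # fully "masked" start (id 0)
step_logps = []
for t in range(T):
    dist = torch.distributions.Categorical(logits=policy(tokens))
    proposal = dist.sample()                       # (B, L) candidate tokens
    # Unmask a random block of positions this step; the order of unmasking
    # is the "inference trajectory" being credited by the update.
    pos = torch.randint(0, seq_len, (B, seq_len // T))
    step_logps.append(dist.log_prob(proposal).gather(1, pos).sum(dim=-1))
    tokens = tokens.scatter(1, pos, proposal.gather(1, pos))

reward = reward_fn(tokens)                          # (B,)
feats = policy.emb(tokens).reshape(B, -1).detach()  # value sees final state only
baseline = value(feats).squeeze(-1)
advantage = (reward - baseline).detach()            # no grad into value via PG term

opt.zero_grad()
pg_loss = -(torch.stack(step_logps).sum(dim=0) * advantage).mean()
value_loss = (baseline - reward).pow(2).mean()
(pg_loss + value_loss).backward()
opt.step()
```

Crediting every step of the denoising trace, rather than only the final sample, is what would make such an update "trajectory-aware"; the learned baseline plays the stabilizing role the abstract attributes to the diffusion-based value model.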