Command Palette
Search for a command to run...
Generative Recursive Reasoning
Generative Recursive Reasoning
Junyeob Baek Mingyu Jo Minsu Kim Mengye Ren Yoshua Bengio Sungjin Ahn
Abstract
Recursive Reasoning Models (RRMs) offer a promising alternative to autoregressive sequence extension by performing iterative latent-state refinement with shared transition functions. Yet existing RRMs are largely deterministic, following a single latent trajectory and converging to a single prediction. We introduce Generative Recursive reAsoning Models (GRAM), a framework that turns recursive latent reasoning into probabilistic multi-trajectory computation. GRAM models reasoning as a stochastic latent trajectory, enabling multiple hypotheses, alternative solution strategies, and inference-time scaling through both recursive depth and parallel trajectory sampling. This yields a latent-variable generative model supporting conditional reasoning via p_theta(y | x) and, with fixed or absent inputs, unconditional generation via p_theta(x). Trained with amortized variational inference, GRAM improves over deterministic recurrent and recursive baselines on structured reasoning and multi-solution constraint satisfaction tasks, while demonstrating an unconditional generation capability.
One-sentence Summary
Generative Recursive reAsoning Models (GRAM), trained with amortized variational inference, transform deterministic recursive reasoning into probabilistic multi-trajectory computation via stochastic latent trajectories to enable multiple hypotheses and inference-time scaling through recursive depth and parallel trajectory sampling, improving over deterministic recurrent and recursive baselines on structured reasoning and multi-solution constraint satisfaction tasks while supporting conditional reasoning and unconditional generation.
Key Contributions
- The paper introduces Generative Recursive Reasoning Models (GRAM), a framework that formulates recursive reasoning as a latent-variable generative process using stochastic latent trajectories. This approach allows solutions to be obtained by marginalizing over multiple sampling paths instead of converging to a single deterministic prediction.
- This work establishes width-based inference-time scaling, enabling the model to scale computation through both recursive depth and the number of parallel sampled trajectories. This mechanism facilitates the maintenance of multiple hypotheses and the exploration of alternative solution strategies during inference.
- Empirical evaluations on benchmarks such as Sudoku-Extreme, ARC-AGI, and N-Queens demonstrate improvements over deterministic recurrent and recursive baselines. Results confirm the framework yields advantages in structured reasoning, multi-solution constraint satisfaction, and unconditional generation capabilities.
Introduction
Future neural reasoning systems require extended computation mechanisms beyond standard autoregressive sequence generation. Recursive Reasoning Models address this by iteratively refining a persistent latent state to decouple reasoning depth from parameter scale. However, existing implementations remain fundamentally deterministic, causing reasoning paths to converge on a single prediction and failing to explore alternative hypotheses. The authors introduce Generative Recursive Reasoning Models (GRAM) to transform recursive latent reasoning into probabilistic multi-trajectory computation. This framework treats reasoning as a stochastic latent trajectory, enabling the maintenance of multiple solutions and inference scaling through parallel trajectory sampling alongside recursive depth.
Dataset
-
Dataset Composition and Sources
- The authors utilize a suite of discrete reasoning and generation tasks including Sudoku, ARC-AGI, N-Queens, Graph Coloring, and MNIST.
- Synthetic data for N-Queens and Graph Coloring is algorithmically generated to control solution counts and difficulty.
- Vision tasks draw from the MNIST dataset and the ARC-AGI benchmark.
-
Subset Details and Processing
- N-Queens
- Source: All valid complete solutions for N=8 and N=10 boards serve as the base.
- Filtering: Puzzle instances form by removing k queens (5 to 7 for N=8, 7 to 9 for N=10).
- Split: An 85:15 train-test split operates on unique input configurations to prevent memorization.
- Encoding: Boards flatten into 1D sequences with a vocabulary of padding, empty cells, and queens.
- Graph Coloring
- Source: Graphs sample from an Erdos-Renyi random model with 8 or 10 nodes and 3 colors.
- Filtering: Only 3-colorable graphs retain, with canonical forms kept to remove color permutation redundancy.
- Size: The dataset includes 7,002 training and 255 test instances for N=8, plus 13,465 training and 192 test instances for N=10.
- Encoding: Inputs encode the flattened upper triangular adjacency matrix while outputs map node positions to color IDs.
- Other Tasks
- Sudoku: 9×9 grids flatten row-by-row into 81 tokens with a vocabulary size of 11.
- ARC-AGI: Variable grids pad to a fixed 30×30 canvas with EOS markers and task-specific embeddings.
- MNIST: Images quantize and process via CNN-based patchification into 14×14 flattened sequences with a vocabulary size of 3.
- N-Queens
Method
Generative Recursive Reasoning Models (GRAM) introduce a probabilistic framework for recursive reasoning, distinguishing itself from deterministic approaches by modeling a distribution over latent reasoning trajectories. Rather than following a single fixed path, GRAM allows the latent state to evolve stochastically, enabling the exploration of multiple solution paths for a given input. This conceptual difference is illustrated in the figure below, where deterministic models converge to a single trajectory, whereas GRAM samples diverse paths to reach potential solutions.
The architecture of GRAM is organized into an encoder, a hierarchical recursive core, and a decoder, as shown in the architecture schematic below. The encoder first computes an embedding ex from the input x, which is reused throughout the computation. The core maintains a latent state z=(h,l) composed of a high-level component h and a low-level component l. The high-level state accumulates abstract reasoning information across transitions, while the low-level state undergoes rapid refinement within each transition. Specifically, the low-level component is updated K times via a function fL while the high-level component remains fixed. Subsequently, the high-level component is updated via fH and augmented with a learnable stochastic guidance signal ϵt. This hierarchical structure allows for multi-scale reasoning, separating fine-grained computation from high-level trajectory steering.
Training GRAM involves optimizing a variational evidence lower bound (ELBO) to approximate the conditional likelihood pθ(y∣x). The model is treated as a latent-variable probabilistic model where the full latent trajectory τ consists of a sequence of latent variables. To handle the intractable marginalization over trajectories, a variational posterior qϕ(τ∣x,y) is introduced alongside the prior pθ(τ∣x). The optimization objective includes a reconstruction term and a Kullback-Leibler (KL) divergence term that regularizes the posterior to be close to the prior. During training, deep supervision is applied over multiple supervision steps, where the terminal state of one step serves as the initial state of the next. Gradients are propagated through the final transition of each step to ensure memory efficiency.
The probabilistic nature of the model enables it to capture multimodal distributions over outputs. For tasks with multiple valid solutions, such as graph coloring, GRAM can sample distinct valid configurations by traversing different latent trajectories, as demonstrated in the figure below.
Furthermore, the same recursive mechanism can be adapted for unconditional generative modeling pθ(x) by replacing the input conditioning with an empty embedding. In this setting, the model generates data iteratively through the latent space, with the quality of the generated samples improving as the recursive depth increases. This capability allows GRAM to function as a generative model for tasks such as image synthesis, where it progressively refines noise into coherent structures, as shown in the generation process below.
Experiment
The evaluation assesses GRAM across structured reasoning benchmarks, multi-solution puzzles, and unconditional generation tasks to validate its probabilistic recursive architecture. Findings indicate that stochastic transitions allow the model to explore diverse reasoning paths and scale effectively through parallel sampling, allowing it to outperform deterministic baselines that suffer from mode collapse on complex problems. Furthermore, recursive refinement enables stricter constraint satisfaction than generative sampling alone, with ablation studies confirming that stochastic guidance is essential for navigating multi-solution spaces and improving overall performance.
The authors evaluate GRAM on unconditional Sudoku generation, comparing it against D3PM baselines of varying sizes. The results demonstrate that GRAM achieves superior validity rates while utilizing significantly fewer parameters and inference steps compared to the diffusion-based baselines. GRAM achieves the highest validity rate among all tested methods. GRAM requires substantially fewer computational steps compared to the D3PM variants. GRAM operates with a smaller parameter count than the competing baselines.
The the the table outlines the training configuration and computational costs for the GRAM model across various structured reasoning and generation benchmarks. It indicates that ARC-AGI is the most resource-intensive task, requiring substantially more training epochs and time compared to other benchmarks like Sudoku or N-Queens. ARC-AGI requires the most extensive training resources, taking significantly longer than other tasks. N-Queens tasks are the most computationally efficient, requiring the fewest epochs and shortest training duration. Graph Coloring training time increases notably as the number of nodes grows, despite maintaining the same number of training epochs.
The the the table presents an ablation study evaluating the impact of stochastic guidance and stochasticity on the GRAM model's performance on Sudoku and N-Queens tasks. The full GRAM model achieves the highest accuracy on both benchmarks, demonstrating the effectiveness of combining stochastic transitions with learned guidance. Removing stochastic guidance significantly degrades performance, particularly on the multi-solution N-Queens task, while removing stochasticity entirely causes the model to fail completely. Full GRAM outperforms all ablated variants and baselines on both benchmarks. Stochasticity alone maintains high Sudoku performance but collapses on N-Queens, highlighting the need for structured guidance in multi-solution spaces. Deterministic guidance without stochasticity results in total failure, indicating that stochastic transitions are essential for navigating the solution space.
The the the table presents an evaluation of unconditional image generation on binarized MNIST, comparing the proposed GRAM model against VAE, D3PM, and TRM baselines. Results demonstrate that GRAM significantly outperforms the deterministic TRM baseline, which exhibits mode collapse, while achieving performance comparable to the diffusion-based D3PM model. Furthermore, the findings indicate that increasing the number of recursion steps during inference leads to consistent improvements in generation quality. The deterministic TRM baseline suffers from mode collapse, yielding significantly poorer generation quality than GRAM. GRAM produces images with quality comparable to the D3PM diffusion model. Generation quality improves monotonically as the number of recursion steps increases.
The authors evaluate GRAM on multi-solution puzzles like N-Queens and Graph Coloring to test its ability to capture diverse valid solutions. Results indicate that GRAM outperforms deterministic recursive baselines in coverage and surpasses generative models in accuracy and constraint satisfaction. This suggests that combining recursive refinement with stochastic sampling allows the model to navigate complex solution spaces more effectively than using either approach alone. GRAM achieves superior accuracy and solution coverage on N-Queens compared to both recursive and autoregressive baselines. The method significantly reduces constraint violations in Graph Coloring tasks while maintaining high diversity in generated solutions. Deterministic recursive models struggle with multi-solution coverage, whereas GRAM leverages stochastic transitions to explore diverse reasoning paths.
The authors evaluate GRAM on structured reasoning benchmarks such as Sudoku, N-Queens, and ARC-AGI alongside unconditional image generation, comparing performance against diffusion and deterministic baselines. Results indicate that GRAM achieves superior validity and accuracy with fewer parameters and inference steps while avoiding the mode collapse observed in deterministic recursive models. Ablation studies further confirm that combining stochastic transitions with learned guidance is essential for navigating complex solution spaces and ensuring robust constraint satisfaction.