Representation Autoencoders
Representation Autoencoders (RAEs) were proposed in October 2025 by a team led by Assistant Professor Xie Saining at New York University. The work is described in the paper "Diffusion Transformers with Representation Autoencoders".
Representation Autoencoders (RAEs) replace the traditional variational autoencoders (VAEs) used in latent diffusion. An RAE pairs a pre-trained representation encoder (such as DINO, SigLIP, or MAE) with a trained decoder. The resulting models give high-quality reconstructions and semantically rich latent spaces while remaining compatible with scalable transformer architectures. Compared with VAE-based models, diffusion models trained in RAE latent spaces converge faster and produce higher-quality samples.
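The core idea, a fixed encoder whose latents a separately trained decoder learns to invert, can be illustrated with a toy sketch. Everything here is a hypothetical stand-in rather than the paper's implementation. `FrozenEncoder` is a fixed random linear map standing in for DINO, SigLIP, or MAE, `Decoder` is a linear decoder trained with normalized least-mean-squares steps on the reconstruction error, and `noisy_latent` uses a rectified-flow-style interpolation as an assumed example of how a diffusion model would be trained directly on encoder latents.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sq_err(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class FrozenEncoder:
    """Hypothetical stand-in for a pre-trained representation encoder
    (e.g. DINO, SigLIP, MAE): a fixed random linear map that is never updated."""

    def __init__(self, d_in, d_lat, rng):
        self.w = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_lat)]

    def __call__(self, x):
        return matvec(self.w, x)


class Decoder:
    """Trainable linear decoder that maps latents back to input space."""

    def __init__(self, d_lat, d_out):
        self.w = [[0.0] * d_lat for _ in range(d_out)]

    def __call__(self, z):
        return matvec(self.w, z)

    def step(self, z, x, mu=1.0):
        """One normalized-LMS gradient step on ||D(z) - x||^2.
        Only the decoder weights change; the encoder stays frozen."""
        x_hat = self(z)
        norm = sum(v * v for v in z) or 1.0
        for i, row in enumerate(self.w):
            err = x_hat[i] - x[i]
            for j in range(len(row)):
                row[j] -= mu * err * z[j] / norm


def train_rae(d_in=4, d_lat=6, steps=5000, seed=0):
    """Train the decoder to reconstruct toy inputs from frozen-encoder latents."""
    rng = random.Random(seed)
    enc, dec = FrozenEncoder(d_in, d_lat, rng), Decoder(d_lat, d_in)
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]  # toy "image"
        dec.step(enc(x), x)  # gradients flow into the decoder only
    return enc, dec, rng


def noisy_latent(z, t, rng):
    """Assumed rectified-flow-style forward process: interpolate a clean
    encoder latent toward Gaussian noise at time t in [0, 1]."""
    return [(1.0 - t) * v + t * rng.gauss(0.0, 1.0) for v in z]
```

For example, after `enc, dec, rng = train_rae()`, the value `sq_err(dec(enc(x)), x)` for a new input `x` should be far smaller than `sq_err([0.0] * 4, x)`, the error of the untrained decoder. A diffusion model would then be trained on `noisy_latent(enc(x), t, rng)` instead of on VAE latents. In this toy the latent dimension (6) exceeds the input dimension (4) so that exact reconstruction is possible. The encoder is kept frozen so that its pre-trained semantics are preserved.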