Latent Diffusion Model SVG
Self-supervised representations for Visual Generation (SVG) was jointly proposed by Tsinghua University and Kuaishou's Kling team in October 2025. The research was published in the paper "Latent Diffusion Model without Variational Autoencoder".
SVG is a novel latent diffusion model that does not require a variational autoencoder (VAE) and instead uses self-supervised representations for visual generation. It constructs a semantically discriminative feature space from frozen DINO features, while a lightweight residual branch captures the fine-grained details needed for high-quality reconstruction. The diffusion model is trained directly in this semantically structured latent space, which makes learning more efficient. As a result, SVG accelerates diffusion training, supports sampling with fewer steps, and improves generation quality.
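The latent construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the frozen DINO encoder is replaced by a fixed random projection, the residual branch by a small linear map, and the two are assumed to be joined by channel-wise concatenation. The noising step is a generic linear interpolation between latent and noise, chosen only for illustration. All names and sizes (`encode`, `add_noise`, `SEM_DIM`, `RES_DIM`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
NUM_PATCHES, PATCH_DIM = 16, 48
SEM_DIM, RES_DIM = 32, 8

# Stand-in for the frozen DINO encoder: a fixed projection that is never updated.
W_frozen = rng.standard_normal((PATCH_DIM, SEM_DIM))
# Stand-in for the lightweight residual branch (trainable in a real model).
W_residual = rng.standard_normal((PATCH_DIM, RES_DIM)) * 0.1


def encode(patches):
    """Build the latent: semantic features plus fine-detail residual features."""
    semantic = patches @ W_frozen      # semantically structured part
    residual = patches @ W_residual    # fine-grained detail part
    # Assumption: the two parts are concatenated along the channel axis.
    return np.concatenate([semantic, residual], axis=-1)


def add_noise(latent, t, noise):
    """Generic linear interpolation toward noise, with t in [0, 1]."""
    return (1.0 - t) * latent + t * noise


patches = rng.standard_normal((NUM_PATCHES, PATCH_DIM))
latent = encode(patches)
noise = rng.standard_normal(latent.shape)
noisy_latent = add_noise(latent, 0.5, noise)
print(latent.shape, noisy_latent.shape)
```

In this sketch, a diffusion model would be trained on `noisy_latent` to recover the clean latent (or the noise), with no VAE encoder or KL term involved; only the residual branch and the diffusion model would carry trainable parameters.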