NUWA-Infinity: Autoregressive over Autoregressive Generation for
Infinite Visual Synthesis
Chenfei Wu, Jian Liang, Xiaowei Hu, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan

Abstract
In this paper, we present NUWA-Infinity, a generative model for infinite visual synthesis, which is defined as the task of generating arbitrarily-sized high-resolution images or long-duration videos. An autoregressive over autoregressive generation mechanism is proposed to deal with this variable-size generation task, where a global patch-level autoregressive model considers the dependencies between patches, and a local token-level autoregressive model considers dependencies between visual tokens within each patch. A Nearby Context Pool (NCP) is introduced to cache related patches that have already been generated as the context for the patch currently being generated, which can significantly reduce computation costs without sacrificing patch-level dependency modeling. An Arbitrary Direction Controller (ADC) is used to decide suitable generation orders for different visual synthesis tasks and to learn order-aware positional embeddings. Compared to DALL-E, Imagen and Parti, NUWA-Infinity can generate high-resolution images of arbitrary size and additionally supports long-duration video generation. Compared to NUWA, which also covers images and videos, NUWA-Infinity has superior visual synthesis capabilities in terms of resolution and variable-size generation. The GitHub link is https://github.com/microsoft/NUWA. The homepage link is https://nuwa-infinity.microsoft.com.
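The two nested loops described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: `toy_next_token`, `nearby`, `generate`, the raster generation order, the `extent` parameter, and the eviction rule are all hypothetical choices made for illustration, and the "model" is a deterministic stand-in rather than a trained network. The sketch only shows the control flow: an outer loop over patches, an inner loop over tokens within a patch, and a bounded cache of nearby patches used as context.

```python
from typing import Dict, List, Tuple

Patch = List[int]        # a patch is a flat list of visual token ids
Coord = Tuple[int, int]  # (row, col) position of a patch in the patch grid


def toy_next_token(context: List[Patch], prefix: List[int], vocab: int = 16) -> int:
    """Hypothetical stand-in for the local token-level model (deterministic)."""
    seed = sum(sum(p) for p in context) + sum(prefix) + len(prefix)
    return seed % vocab


def nearby(a: Coord, b: Coord, extent: int) -> bool:
    """True if patch b lies within `extent` patches of patch a in both axes."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1])) <= extent


def generate(rows: int, cols: int, tokens_per_patch: int = 4, extent: int = 1):
    """Generate a rows x cols grid of patches; return (canvas, peak pool size)."""
    pool: Dict[Coord, Patch] = {}    # assumed form of a nearby-context cache
    canvas: Dict[Coord, Patch] = {}  # every generated patch, for inspection
    max_pool = 0
    # Global, patch-level autoregression. Raster order is an assumption here;
    # the abstract says generation order is chosen per task.
    for r in range(rows):
        for c in range(cols):
            cur = (r, c)
            # Evict cached patches too far above to matter for any later patch.
            for key in [k for k in pool if k[0] < r - extent]:
                del pool[key]
            # Select the cached patches near the current one as its context.
            context = [pool[k] for k in sorted(pool) if nearby(cur, k, extent)]
            # Local, token-level autoregression within the current patch.
            patch: Patch = []
            for _ in range(tokens_per_patch):
                patch.append(toy_next_token(context, patch))
            canvas[cur] = patch
            pool[cur] = patch
            max_pool = max(max_pool, len(pool))
    return canvas, max_pool


canvas, max_pool = generate(rows=3, cols=4)
print(len(canvas), "patches generated; peak cache size:", max_pool)
```

Under these assumptions the cache never holds more than `(extent + 1) * cols` patches, regardless of how many rows are generated, which is the sense in which a bounded context pool keeps the cost per patch independent of the total output size.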
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| image-outpainting-on-lhqc | NUWA-Infinity w/o text | Block-FID (Right Extend): 6.43; Block-FID (Down Extend): 11.47; Block-FID (Left Extend): 6.71; Block-FID (Up Extend): 8.03 |
| image-outpainting-on-lhqc | NUWA-Infinity | Block-FID (Right Extend): 6.45; Block-FID (Down Extend): 9.84; Block-FID (Left Extend): 6.72; Block-FID (Up Extend): 7.43 |
| text-to-image-generation-on-lhqc | NUWA-Infinity | Block-FID: 9.71 |