InstantID: Zero-shot Identity-Preserving Generation in Seconds
Qixun Wang Xu Bai Haofan Wang Zekui Qin Anthony Chen Huaxia Li Xu Tang Yao Hu

Abstract
There has been significant progress in personalized image synthesis with methods such as Textual Inversion, DreamBooth, and LoRA. Yet, their real-world applicability is hindered by high storage demands, lengthy fine-tuning processes, and the need for multiple reference images. Conversely, existing ID embedding-based methods, while requiring only a single forward inference, face challenges: they either necessitate extensive fine-tuning across numerous model parameters, lack compatibility with community pre-trained models, or fail to maintain high face fidelity. Addressing these limitations, we introduce InstantID, a powerful diffusion model-based solution. Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image, while ensuring high fidelity. To achieve this, we design a novel IdentityNet by imposing strong semantic and weak spatial conditions, integrating facial and landmark images with textual prompts to steer the image generation. InstantID demonstrates exceptional performance and efficiency, proving highly beneficial in real-world applications where identity preservation is paramount. Moreover, our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL, serving as an adaptable plugin. Our codes and pre-trained checkpoints will be available at https://github.com/InstantID/InstantID.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| diffusion-personalization-tuning-free-on | InstantID | Cosine Similarity: 0.713; FID: 18.598; LPIPS: 0.437 |
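
The table does not say how the cosine similarity score was computed. A common reading, assumed here and not confirmed by the source, is that it compares a face-recognition embedding of the reference image with one of the generated image. The sketch below shows only the metric itself; the function name and the toy vectors are hypothetical, and a real evaluation would take its embeddings from a face-recognition model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Toy 4-dimensional vectors standing in for face embeddings (hypothetical values,
# not taken from the paper or the benchmark).
reference_embedding = [0.2, 0.1, 0.7, 0.4]
generated_embedding = [0.25, 0.05, 0.6, 0.5]

score = cosine_similarity(reference_embedding, generated_embedding)
print(f"identity similarity: {score:.3f}")
```

Under this reading, a score closer to 1 means the generated face lies nearer to the reference identity in embedding space. FID and LPIPS need pretrained networks and are not sketched here.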