StableGarment: Garment-Centric Generation via Stable Diffusion

1Beijing University of Posts and Telecommunications, 2Xiaohongshu Inc., 3Carnegie Mellon University

StableGarment can 1) use text prompts or control signals to generate a realistic model wearing a given garment, 2) swap in stylized base models to generate stylized characters wearing the garment, and 3) perform conventional virtual try-on tasks.

Abstract

In this paper, we introduce StableGarment, a unified framework for garment-centric (GC) generation tasks, including GC text-to-image, controllable GC text-to-image, stylized GC text-to-image, and robust virtual try-on.

The main challenge lies in retaining the intricate textures of the garment while preserving the flexibility of pre-trained Stable Diffusion. Our solution is a garment encoder, a trainable copy of the denoising UNet equipped with additive self-attention (ASA) layers. These ASA layers are specifically devised to transfer detailed garment textures, while also facilitating the integration of stylized base models for the creation of stylized images. Furthermore, a dedicated try-on ControlNet enables StableGarment to execute virtual try-on tasks with precision. We also build a novel data engine that produces high-quality synthesized data to preserve the model's ability to follow prompts.
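To make the mechanism concrete, below is a minimal PyTorch sketch of one plausible form of an ASA layer. The class name, the zero-initialized gate, and all tensor shapes are illustrative assumptions for exposition, not the exact implementation from the paper.

import torch
import torch.nn as nn

class AdditiveSelfAttention(nn.Module):
    # Hypothetical sketch of an additive self-attention (ASA) layer: the
    # base model's self-attention output is left intact, and an attention
    # term that queries the garment encoder's features is added on top.
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.base_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.garment_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Zero-initialized gate, so training starts from the base model's behavior.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x, garment_feats):
        base_out, _ = self.base_attn(x, x, x)  # ordinary self-attention
        add_out, _ = self.garment_attn(x, garment_feats, garment_feats)
        return base_out + self.gate * add_out  # additive texture injection

# x: UNet hidden states; garment_feats: matching features from the garment encoder.
asa = AdditiveSelfAttention(dim=320)
out = asa(torch.randn(1, 4096, 320), torch.randn(1, 4096, 320))

Because the garment term is purely additive, a zero gate recovers the base model exactly, which is what lets the same trained encoder be bolted onto different (e.g., stylized) backbones.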

Extensive experiments demonstrate that our approach delivers state-of-the-art (SOTA) results among existing virtual try-on methods and exhibits high flexibility, with broad potential applications in various garment-centric image generation tasks.

Approach

Overview of StableGarment, consisting of a data engine, a garment encoder, and a try-on ControlNet. The data engine preserves the model's capacity to follow prompts, while the garment encoder with additive self-attention (ASA) layers captures garment details. Meanwhile, the try-on ControlNet is designed for virtual try-on tasks.

Comparison on text-to-image generation

Comparison with other finetuning-free subject-driven generation methods.


Comparison on virtual try-on task

Qualitative comparison with baselines on the VITON-HD dataset.


More Applications

1. Garment-centric generation given target individuals

Our model, when combined with the IP-Adapter, enables the generation of target individuals wearing target garments.
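As a hedged sketch of how such a combination could look, the snippet below uses the public diffusers IP-Adapter API on a plain SD 1.5 pipeline; attaching StableGarment's garment encoder is project-specific and omitted, and the model IDs, scale, and image path are placeholder assumptions.

import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# IP-Adapter injects the target individual's appearance as an image prompt;
# the garment encoder would hook into the same UNet (not shown here).
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.7)

person = load_image("person.jpg")  # hypothetical reference photo
image = pipe(
    "a person wearing the garment, full body, photorealistic",
    ip_adapter_image=person,
    num_inference_steps=30,
).images[0]
image.save("result.png")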


2. Stylized garment-centric generation

By replacing the standard Stable Diffusion 1.5 model with other diverse base models, we can generate creative and stylized outputs while preserving the intricate details of the garments.
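Since the ASA layers only add garment attention on top of the backbone's own self-attention, swapping backbones is, in principle, a one-line change; a sketch under that assumption follows, with the repo id being a placeholder for any SD-1.5-compatible stylized checkpoint.

from diffusers import StableDiffusionPipeline

# Replace the photorealistic SD 1.5 backbone with a stylized checkpoint
# ("some-user/stylized-sd15" is a placeholder, not a real repo id).
pipe = StableDiffusionPipeline.from_pretrained("some-user/stylized-sd15")
# ...then attach the frozen garment encoder to the new backbone as before.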


3. E-commerce model generation

Leveraging the capabilities of ControlNet, our model can generate e-commerce models guided by specific conditions, such as OpenPose skeletons and DensePose maps.
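For reference, here is a minimal sketch of pose-conditioned generation with the standard diffusers ControlNet API and a public OpenPose ControlNet; StableGarment's garment conditioning is omitted, and the pose-map path is a placeholder.

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Public OpenPose ControlNet for SD 1.5.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

pose = load_image("pose_map.png")  # hypothetical OpenPose skeleton image
image = pipe("an e-commerce model wearing the garment, studio lighting",
             image=pose, num_inference_steps=30).images[0]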


Ablation Study

We conduct comprehensive ablation studies to validate the effectiveness of each component of our model.


BibTeX

@misc{wang2024stablegarment,
      title={StableGarment: Garment-Centric Generation via Stable Diffusion}, 
      author={Rui Wang and Hailong Guo and Jiaming Liu and Huaxia Li and Haibo Zhao and Xu Tang and Yao Hu and Hao Tang and Peipei Li},
      year={2024},
      eprint={2403.10783},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}