Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Models

1University of California, San Diego

Abstract

We present a textual inversion method for learning prompt embeddings for specific attributes or concepts. The strength of a concept can be controlled by manipulating the weight of its learned prompt embedding.

Diffusion models have recently surpassed GANs in image synthesis and editing, offering superior image quality and diversity. However, achieving precise control over attributes in generated images remains a challenge. Concept Sliders introduced a method for fine-grained image control and editing by learning concepts (attributes/objects). However, this approach adds parameters and increases inference time due to the loading and unloading of the Low-Rank Adapters (LoRAs) used to learn concepts. These adapters are model-specific and require retraining for different architectures, such as Stable Diffusion (SD) v1.5 and SD-XL. In this paper, we propose a straightforward textual inversion method that learns concepts through text embeddings, which generalize across models that share the same text encoder, including different versions of the SD model. We refer to our method as Prompt Sliders. Beyond learning new concepts, we also show that Prompt Sliders can be used to erase undesirable concepts such as artistic styles or mature content. Our method is 30% faster than using LoRAs because it eliminates the need to load and unload adapters, and it introduces no additional parameters aside from the target concept's text embedding. Each concept embedding requires only 3 KB of storage, compared to the 8922 KB or more required for each LoRA adapter, making our approach more computationally efficient.
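To make the mechanism concrete, the following is a minimal inference-time sketch using the Hugging Face diffusers library. The placeholder token "<age>", the file "age_slider.pt", and the specific choice of scaling the learned embedding by the strength $\alpha$ are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch: loading a learned Prompt Slider embedding at inference time.
# "<age>" and "age_slider.pt" are hypothetical names; alpha sets the strength.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Register the placeholder token and make room for its embedding.
pipe.tokenizer.add_tokens("<age>")
pipe.text_encoder.resize_token_embeddings(len(pipe.tokenizer))
token_id = pipe.tokenizer.convert_tokens_to_ids("<age>")

learned = torch.load("age_slider.pt")  # one 768-dim vector (~3 KB) for SD v1.5
alpha = 1.5  # slider strength: larger values enhance the concept

with torch.no_grad():
    emb = pipe.text_encoder.get_input_embeddings().weight
    emb[token_id] = alpha * learned.to(emb.dtype).to(emb.device)

image = pipe("A portrait of a woman with a warm smile <age>").images[0]
image.save("smile_age.png")
```

Because only a single token embedding is swapped in, no adapter weights need to be loaded or unloaded when switching between concepts.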

Prompt Sliders for fine-grained control of attributes with textual inversion.

Each row in the figure shows a concept (labeled above the images) together with its control strength $\alpha$; the target concept is enhanced as $\alpha$ increases. The prompts for the images, from top to bottom, are: "A portrait of a woman with a warm smile", "A woman with voluminous hair cascading over her shoulders, posing for a fashion shoot", "A fantasy character", "A person caught off guard by unexpected news".

Image synthesis by Prompt Sliders (parts 1 and 2).

Prompt Slider framework

Left: Training of textual Prompt Sliders. Right: Training of visual Prompt Sliders.

Prompt Slider framework.
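For the textual slider, training is essentially textual inversion: only the new token's embedding row is optimized with the standard denoising objective, while the VAE, U-Net, and the rest of the text encoder stay frozen. The sketch below illustrates this under stated assumptions: a `dataloader` of images exhibiting the target concept, the hypothetical token "<age>", and a plain MSE loss; the paper's exact objective and hyperparameters may differ.

```python
# Sketch of training a textual Prompt Slider in the style of textual inversion.
# Only the new token's embedding row is optimized; everything else is frozen.
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "runwayml/stable-diffusion-v1-5"
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

tokenizer.add_tokens("<age>")
text_encoder.resize_token_embeddings(len(tokenizer))
token_id = tokenizer.convert_tokens_to_ids("<age>")

# Freeze all weights, then re-enable gradients only for the embedding table.
for p in (*vae.parameters(), *unet.parameters(), *text_encoder.parameters()):
    p.requires_grad_(False)
embeds = text_encoder.get_input_embeddings()
embeds.weight.requires_grad_(True)
optimizer = torch.optim.AdamW([embeds.weight], lr=5e-4, weight_decay=0.0)

for images in dataloader:  # placeholder: batches showing the target concept
    latents = vae.encode(images).latent_dist.sample() * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps, (latents.shape[0],))
    noisy = scheduler.add_noise(latents, noise, t)

    ids = tokenizer(
        "a photo of a <age> person", padding="max_length", truncation=True,
        max_length=tokenizer.model_max_length, return_tensors="pt",
    ).input_ids
    cond = text_encoder(ids)[0]

    loss = F.mse_loss(unet(noisy, t, encoder_hidden_states=cond).sample, noise)
    loss.backward()

    # Zero the gradient of every row except the new concept token's.
    mask = torch.zeros_like(embeds.weight.grad)
    mask[token_id] = 1.0
    embeds.weight.grad.mul_(mask)
    optimizer.step()
    optimizer.zero_grad()
```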

Comparison with Concept Sliders

Unlike Concept Sliders, which require careful hyperparameter tuning to learn the target concept, Prompt Sliders are simple to optimize and work well for diverse prompts. The corresponding prompts for the images in the figure, from left to right, are: "A whimsical dragon, full of texture and detail", "A bustling city street, with people walking", "A cat lounging comfortably on a sofa", "A child opening a gift", "A child playing with a favorite toy".

Comparison.

Erasing concepts with Prompt Sliders.

Erasing concepts from pretrained diffusion models using Prompt Sliders. The prompts used to generate the images are shown above each image.

Erasing.
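One simple way to realize erasure under the same mechanism is to apply a negative strength: assuming a slider token for the unwanted concept has been trained and registered as in the inference sketch above, writing its embedding scaled by a negative $\alpha$ steers generations away from that concept. The token and variable names below are hypothetical.

```python
# Sketch: suppressing a concept with a negative slider strength.
# Assumes "<van_gogh>" was registered and `learned_style` loaded, following
# the inference sketch above; both names are placeholders.
alpha = -2.0  # negative strength pushes generations away from the concept
style_id = pipe.tokenizer.convert_tokens_to_ids("<van_gogh>")

with torch.no_grad():
    emb = pipe.text_encoder.get_input_embeddings().weight
    emb[style_id] = alpha * learned_style.to(emb.dtype).to(emb.device)

# Appending the token now acts as an eraser for the style.
image = pipe("A starry night over a quiet village <van_gogh>").images[0]
```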

Transferring trained Prompt Sliders from SD-XL to the SD v1.5 model.

The corresponding prompts for the images in the figure, from left to right in the top row and then from left to right in the bottom row, are: "A photo of a person", "A man with a thick beard, giving a charming smile", "A bride getting ready for her wedding day", "A cozy living room".

Transfer SD15.
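Transfer is possible because SD-XL and SD v1.5 share the CLIP ViT-L text encoder, so a slider trained against that encoder's 768-dimensional token-embedding space can be registered in either pipeline. A sketch, with placeholder file and token names:

```python
# Sketch: reusing a slider embedding trained with SD-XL in SD v1.5.
# Both models share the CLIP ViT-L text encoder (768-dim token embeddings).
# "age_slider_sdxl.pt" and "<age>" are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe15 = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe15.tokenizer.add_tokens("<age>")
pipe15.text_encoder.resize_token_embeddings(len(pipe15.tokenizer))
tid = pipe15.tokenizer.convert_tokens_to_ids("<age>")

learned = torch.load("age_slider_sdxl.pt")  # trained on SD-XL's shared encoder
with torch.no_grad():
    emb = pipe15.text_encoder.get_input_embeddings().weight
    emb[tid] = learned.to(emb.dtype).to(emb.device)

image = pipe15("A photo of a person <age>").images[0]
```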

Composition of multiple concepts.

The input prompt for all images is "A photo of a person", followed by the concept tokens. Concepts are appended to the prompt sequentially, as depicted in the figure from left to right in the top row, then continuing from left to right in the bottom row.

Composition.
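Composition then amounts to appending several registered slider tokens to the base prompt, each with its own pre-scaled embedding. A sketch, reusing the `pipe` from the inference sketch above with hypothetical token names:

```python
# Sketch: composing sliders by appending their tokens to the base prompt.
# "<age>", "<smile>", "<curly_hair>" are hypothetical tokens assumed to be
# registered with their (individually scaled) embeddings, as sketched above.
base = "A photo of a person"
tokens = ["<age>", "<smile>", "<curly_hair>"]

for k in range(1, len(tokens) + 1):
    prompt = " ".join([base] + tokens[:k])  # add one more concept each step
    image = pipe(prompt).images[0]
    image.save(f"composed_{k}.png")
```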