Foundation Cures Personalization:
Improving Personalized Models' Prompt Consistency via Hidden Foundation Knowledge
Current facial personalization models often lack precise control over facial attributes. This is because identity embeddings can negatively affect cross-attention layers and undermine the normal function of other tokens. To address this limitation, we propose FreeCure, a training-free framework that effectively resolves the issue. FreeCure can be easily integrated into diverse baselines (including those based on Stable Diffusion and FLUX), enabling robust manipulation of various facial attributes while preserving strong identity fidelity.
We analyze the challenges current methods face in achieving precise control over facial attributes. The key issue lies in how personalization models process identity information: their cross-attention layers focus too heavily on identity-related tokens while undermining those corresponding to facial attributes (e.g., hairstyle, expression; see the left part). Consequently, these attributes become difficult to manipulate. However, these cross-attention layers (or adapters) are critical for preserving identity fidelity in the generated output, which makes them resistant to modification: adjusting the cross-attention maps frequently leads to the loss of essential identity characteristics (see the right part).
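The sketch below illustrates one way such an imbalance could be probed: compute a cross-attention map from image queries and text keys and compare how much mass falls on identity-related tokens versus attribute tokens. The shapes, module layout, and token indices are assumptions for illustration, not the paper's diagnostic code.

```python
# Minimal probe (assumed shapes) of cross-attention mass on selected text tokens.
import torch

def attention_mass(q, k, token_indices, num_heads=8):
    """q: (B, N_img, C) image queries; k: (B, N_txt, C) projected text keys."""
    b, n_img, c = q.shape
    d = c // num_heads
    qh = q.reshape(b, n_img, num_heads, d).transpose(1, 2)        # (B, H, N_img, d)
    kh = k.reshape(b, k.shape[1], num_heads, d).transpose(1, 2)   # (B, H, N_txt, d)
    attn = torch.softmax(qh @ kh.transpose(-1, -2) / d ** 0.5, dim=-1)
    # Average attention each image token pays to the selected text tokens.
    return attn[..., token_indices].mean().item()

# Hypothetical usage: compare identity-token mass against attribute-token mass.
q = torch.randn(1, 4096, 320)   # latent queries from one cross-attention layer
k = torch.randn(1, 77, 320)     # text-token keys
id_mass   = attention_mass(q, k, token_indices=[5])      # e.g., identity token
attr_mass = attention_mass(q, k, token_indices=[7, 8])   # e.g., "curly hair"
print(f"identity mass {id_mass:.4f} vs attribute mass {attr_mass:.4f}")
```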
While keeping cross-attention modules intact, we propose a novel foundation-aware self-attention (FASA), which enables attributes with high prompt consistency from the foundation model to replace their ill-aligned counterparts during personalized generation. To keep the identity unharmed, this strategy also leverages semantic segmentation models to generate scaling masks for these attributes, so that the replacement happens in a highly localized and harmonious manner. Furthermore, we use a simple but effective approach called asymmetric prompt guidance (APG) to restore abstract attributes such as expression.
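A minimal sketch of a FASA-style masked blend is given below, assuming two parallel denoising branches (a foundation branch, FD, and a personalized branch, PD) expose their self-attention outputs at matching layers and timesteps, and that a face-parsing mask for the target attribute is available. Function and variable names (blend_fasa, attr_mask, alpha) are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def blend_fasa(pd_feat, fd_feat, attr_mask, alpha=1.0):
    """
    pd_feat, fd_feat: (B, N, C) self-attention outputs from the personalized
                      and foundation branches at the same layer and timestep.
    attr_mask:        (B, 1, H, W) binary mask of the target attribute
                      (e.g., hair region from a face-parsing model).
    Returns PD features with the masked region replaced by FD features.
    """
    b, n, c = pd_feat.shape
    h = w = int(n ** 0.5)                                   # assume a square token grid
    m = F.interpolate(attr_mask.float(), size=(h, w), mode="nearest")
    m = m.reshape(b, 1, n).transpose(1, 2)                  # (B, N, 1)
    return pd_feat * (1 - alpha * m) + fd_feat * (alpha * m)

# Toy usage: tokens inside the "hair" mask take the foundation branch's
# features; everything else, including identity-bearing regions, is untouched.
pd = torch.randn(1, 1024, 640)
fd = torch.randn(1, 1024, 640)
mask = torch.zeros(1, 1, 64, 64); mask[:, :, :20, :] = 1
out = blend_fasa(pd, fd, mask)
```

Because the blend is confined to the mask and leaves cross-attention untouched, the identity-preserving pathway of the personalization model is not modified.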
We validate that FreeCure's subsequent enhancement processes do not impact attributes that have already been enhanced. The figure visualizes intermediate results at different stages of FreeCure. When APG is applied in the later stages, attributes previously enhanced through FASA remain unaffected. This illustrates that FreeCure's improvement across multiple attributes is robust and consistent.
We observe that, with identical initial noise, the foundation (FD) and personalized (PD) denoising processes of all baselines generate faces with similar attribute locations. It is important, however, to validate that FASA remains robust without this condition. We therefore relax it and regenerate faces with different initial noises. The figure shows that results under the two settings are comparable, which confirms that FASA can effectively enhance the results of PD even when its FD counterpart produces faces with different spatial structures.
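For clarity, the shared-versus-independent-noise setup can be sketched as follows. The foundation_step and personalized_step callables are placeholders for the respective denoising networks; this is a control-flow sketch under those assumptions, not the released pipeline.

```python
import torch

def run_parallel_denoise(foundation_step, personalized_step, steps=30,
                         shape=(1, 4, 64, 64), shared_noise=True, seed=0):
    g = torch.Generator().manual_seed(seed)
    z_fd = torch.randn(shape, generator=g)
    z_pd = z_fd.clone() if shared_noise else torch.randn(shape, generator=g)
    for t in reversed(range(steps)):
        z_fd = foundation_step(z_fd, t)      # plain text-to-image branch
        z_pd = personalized_step(z_pd, t)    # identity-conditioned branch
        # A FASA-style blend would hook in here, reading features from the FD pass.
    return z_fd, z_pd

# Toy usage with identity "denoisers", just to show the control flow of the
# relaxed setting where the two branches start from different noises.
identity = lambda z, t: z
fd_out, pd_out = run_parallel_denoise(identity, identity, shared_noise=False)
```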
FASA effectively captures and faithfully transfers information about attributes localized in small areas (e.g., eyes, pearl earrings). Furthermore, in regions unrelated to the target attributes, FASA maintains strong alignment with the original PD attention maps, demonstrating that it preserves the core functionality of the personalization model.
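One simple way such alignment could be quantified is to compare the PD attention maps with and without FASA, separately inside and outside the attribute mask, e.g., via cosine similarity. The tensors below are dummies and the helper name is hypothetical; this is a measurement sketch, not the paper's evaluation code.

```python
import torch
import torch.nn.functional as F

def masked_cosine(attn_a, attn_b, mask):
    """attn_a, attn_b: (N,) flattened attention maps; mask: (N,) in {0, 1}."""
    inside  = F.cosine_similarity(attn_a[mask > 0], attn_b[mask > 0], dim=0)
    outside = F.cosine_similarity(attn_a[mask == 0], attn_b[mask == 0], dim=0)
    return inside.item(), outside.item()

n = 4096
attn_orig, attn_fasa = torch.rand(n), torch.rand(n)   # dummy attention maps
mask = (torch.arange(n) < 800).float()                # fake attribute region
print(masked_cosine(attn_orig, attn_fasa, mask))      # high "outside" similarity
```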