Foundation Cures Personalization:

Improving Personalized Models' Prompt Consistency via Hidden Foundation Knowledge

Anonymous Submission

Overview

Current facial personalization models often lack precise control over facial attributes. This is because identity embeddings can negatively affect cross-attention layers and undermine the normal function of other tokens. To address this limitation, we propose FreeCure, a training-free framework that resolves the issue. FreeCure can be easily integrated into diverse baselines (including those based on Stable Diffusion and FLUX), enabling robust manipulation of various facial attributes while preserving the baselines' strong identity fidelity.

Motivation and Method

We analyze the challenges current methods face in achieving precise control over facial features. The key issue lies in how personalization models process identity data: their cross-attention layers focus too much on identity-related tokens while undermining the tokens that correspond to facial attributes (e.g., hairstyle and expression; see the left part of the figure). Consequently, these attributes become difficult to manipulate. However, these cross-attention layers (or adapters) are critical for preserving identity accuracy in the generated output, so they cannot be modified freely: adjusting the cross-attention maps frequently leads to the loss of essential identity characteristics (see the right part of the figure).

While keeping the cross-attention modules intact, we propose a novel foundation-aware self-attention (FASA), which lets attributes with high prompt consistency replace those that are ill-aligned during personalized generation. To keep the identity unharmed, this strategy also leverages semantic segmentation models to generate scaling masks for these attributes, so that the replacement happens in a highly localized and harmonious manner. Furthermore, we use a simple but effective approach called asymmetric prompt guidance (APG) to restore abstract attributes such as expression.
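The page does not spell out how FASA is computed, so the following is only a minimal NumPy sketch of one plausible reading: queries from the personalization model attend to their own keys and values and, additionally, to keys and values from the foundation model, with a segmentation-derived mask scaling how much each foundation token may contribute. The function name `fasa_sketch`, the `_p`/`_f` suffixes, the `scale` parameter, and the log-weight formulation are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fasa_sketch(q_p, k_p, v_p, k_f, v_f, attr_mask, scale=1.0):
    """Hypothetical foundation-aware self-attention (illustrative only).

    q_p, k_p, v_p : (n, d) self-attention features of the personalization model.
    k_f, v_f      : (n, d) self-attention features of the foundation model.
    attr_mask     : (n,) values in [0, 1]; 1 marks foundation tokens inside the
                    target attribute region (assumed to come from a segmentation
                    model), 0 marks tokens that must not be transferred.
    scale         : assumed global strength of the foundation contribution.
    """
    d = q_p.shape[-1]
    logits_p = q_p @ k_p.T / np.sqrt(d)  # personalization -> personalization
    logits_f = q_p @ k_f.T / np.sqrt(d)  # personalization -> foundation

    # Multiplying an attention weight by w equals adding log(w) to its logit.
    # Tokens with mask 0 receive -inf and therefore contribute nothing.
    with np.errstate(divide="ignore"):
        bias = np.log(scale * attr_mask)

    logits = np.concatenate([logits_p, logits_f + bias[None, :]], axis=1)
    attn = softmax(logits, axis=-1)                # (n, 2n)
    values = np.concatenate([v_p, v_f], axis=0)    # (2n, d)
    return attn @ values
```

Under this formulation, an all-zero mask reduces the layer to the personalization model's ordinary self-attention, which matches the stated goal of leaving regions outside the target attribute untouched.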

Results

Different Baselines (Including SDv1.5, SDXL and FLUX)

Different Datasets (CelebA-HQ and self-curated data)

Different Facial Attributes (e.g., hairstyle, expression, accessories, eye color)

Analysis

1. Multiple-attribute Prompt Consistency

We validate that the later enhancement stages of FreeCure do not affect attributes that have already been enhanced. The figure visualizes intermediate results at different stages of FreeCure. When APG is applied in the later stages, attributes previously enhanced through FASA remain unaffected. This illustrates that FreeCure's improvement across multiple attributes is robust and consistent.

2. Different Initial Noise

We observe that, with identical initial noise, the FD and PD processes of all baselines generate faces with similar attribute locations. It is important to validate that FASA still performs robustly without this condition. We therefore relax the condition and regenerate faces with different initial noise. The figure shows that the results under the two settings are comparable, which confirms that FASA can effectively enhance the generated results of PD even when its FD counterpart produces faces with a different spatial structure.
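As a small illustration of the two settings compared here, the sketch below draws the starting noise for the FD and PD processes either from the same seed (identical initial noise) or from different seeds (the relaxed setting). The helper `initial_noise`, the seeds, and the latent shape are hypothetical choices for illustration; the page does not state how the noise is sampled.

```python
import numpy as np


def initial_noise(seed, shape=(4, 64, 64)):
    """Draw Gaussian starting noise from a seeded generator.

    The shape is an illustrative latent size, not a value taken from the page.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape).astype(np.float32)


# Identical-noise setting: FD and PD start from the same tensor.
noise_fd = initial_noise(seed=0)
noise_pd_shared = initial_noise(seed=0)

# Relaxed setting: PD starts from independently drawn noise.
noise_pd_relaxed = initial_noise(seed=1)
```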

3. Visualization of FASA Modules

FASA captures and faithfully transfers information about attributes localized in small areas (e.g., eyes, pearl earrings). Furthermore, in regions unrelated to the target attributes, FASA stays closely aligned with the original PD attention maps, demonstrating that it preserves the core functionality of the personalization models.