High-Fidelity Face Swap via Prompt-Driven Inpainting and Pixel-Level Background Preservation

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2026, 31(6), pp.67~80
  • Publisher : The Korean Society of Computer and Information
  • Research Area : Engineering > Computer Science
  • Received : March 23, 2026
  • Accepted : June 1, 2026
  • Published : June 30, 2026

Moonsung Kang 1 Jihoon Lee 1 Seungwon Jang 1 Suin Kim 1 Doheun Cha 1 Sangtae Ahn 1

1 Kyungpook National University

Accredited

ABSTRACT

We propose a pipeline that combines a mask-weighted loss with a face-aware text adapter to address two problems that existing latent diffusion models exhibit in high-resolution face swapping: unrealistic, painterly textures and unstable text guidance. To restore fine facial details and photorealistic textures, we first fine-tune the U-Net with a mask-weighted loss. Although this optimization improves visual fidelity, it often degrades semantic information or introduces unintended background distortions. To mitigate these issues, we introduce a face-aware text adapter that dynamically calibrates the intensity of the text embeddings according to the spatial proportion of the facial region, which provides robust semantic control. Furthermore, because reconstruction through the variational autoencoder inherently loses background information, we apply a pixel-level blending strategy that composites the generated face with the original background directly in pixel space. Experimental results show that the proposed model significantly outperforms baseline methods on FID, PSNR, LPIPS, and PickScore, achieving both high-quality, prompt-driven face synthesis and perfect background preservation.
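
To illustrate two of the components named in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a binary face mask of shape (H, W), float images of shape (H, W, C), and a hypothetical `face_weight` hyperparameter. The abstract does not specify the actual loss formulation, weighting, or blending details, so the function names and the weighting scheme here are illustrative assumptions.

```python
import numpy as np


def mask_weighted_mse(pred, target, mask, face_weight=5.0):
    """Mean squared error that weights face pixels more than background.

    face_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    # Per-pixel weight: face_weight inside the mask, 1.0 outside it.
    weights = 1.0 + (face_weight - 1.0) * mask
    # Expand (H, W) weights to (H, W, C) so they apply to every channel.
    weights = np.broadcast_to(weights[..., None], pred.shape)
    squared_error = (pred - target) ** 2
    # Normalize by the total weight so the result is a weighted mean.
    return float(np.sum(weights * squared_error) / np.sum(weights))


def blend_pixels(generated, original, mask):
    """Composite the generated face onto the original image in pixel space."""
    # Add a channel axis so the (H, W) mask broadcasts over the color channels.
    alpha = mask[..., None]
    return alpha * generated + (1.0 - alpha) * original


# Toy 4x4 RGB example: the "face" occupies the central 2x2 block.
original = np.zeros((4, 4, 3))
generated = np.ones((4, 4, 3))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0

blended = blend_pixels(generated, original, mask)
weighted = mask_weighted_mse(blended, original, mask)
unweighted = mask_weighted_mse(blended, original, mask, face_weight=1.0)
```

Because the mask in this sketch is binary, every background pixel of `blended` equals the corresponding pixel of `original` exactly, which is the property that blending in pixel space relies on. A soft-edged mask would instead feather the boundary between the generated face and the background.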

This paper was written with support from the National Research Foundation of Korea.