[Figure: Visual dubbing network architecture]

Personalized Visual Dubbing Through Virtual Dubber And Full Head Reenactment

Bobae Jeon, Eric Paquette, Sudhir Mudur and Tiberiu Popa. Eurographics Short Paper, pages 1-4, 2025.

Abstract

Visual dubbing aims to modify facial expressions to "lip-sync" a new audio track. While person-generic talking-head generation methods achieve expressive lip synchronization across arbitrary identities, they usually lack person-specific details and fail to generate high-quality results. Conversely, person-specific methods require extensive training. Our method combines the strengths of both approaches by incorporating a virtual dubber, a person-generic talking head, as an intermediate representation. We then employ an autoencoder-based person-specific identity-swapping network to transfer the actor's identity, enabling full-head reenactment that includes the hair, face, ears, and neck. This eliminates artifacts while ensuring temporal consistency. Our quantitative and qualitative evaluations demonstrate that our method achieves a superior balance between lip-sync accuracy and realistic facial reenactment.
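
To make the two-stage pipeline concrete, below is a minimal PyTorch-style sketch of the idea described in the abstract: a person-generic "virtual dubber" stage that lip-syncs to audio, followed by a person-specific autoencoder that swaps in the actor's identity for full-head reenactment. All module names (VirtualDubber, IdentitySwapAutoencoder), dimensions, and interfaces are illustrative assumptions, not the paper's actual networks.

```python
import torch
import torch.nn as nn


class VirtualDubber(nn.Module):
    """Person-generic stage (hypothetical): audio features plus a reference
    frame produce a lip-synced 'virtual dubber' frame."""

    def __init__(self, audio_dim=80, hidden=128):
        super().__init__()
        # Summarize the audio feature sequence into one vector.
        self.audio_enc = nn.GRU(audio_dim, hidden, batch_first=True)
        # Compress the reference frame into a compact appearance code.
        self.frame_enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, hidden, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Decode the joint code back into a 64x64 RGB frame.
        self.decoder = nn.Sequential(
            nn.Linear(2 * hidden, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, 2, 1), nn.Sigmoid(),
        )

    def forward(self, audio_feats, ref_frame):
        _, h = self.audio_enc(audio_feats)            # (1, B, hidden)
        audio_code = h[-1]                            # (B, hidden)
        appearance_code = self.frame_enc(ref_frame)   # (B, hidden)
        return self.decoder(torch.cat([audio_code, appearance_code], dim=1))


class IdentitySwapAutoencoder(nn.Module):
    """Person-specific stage (hypothetical): an autoencoder whose decoder is
    trained on the actor, so feeding it dubber frames transfers the actor's
    identity, covering the whole head rather than only the mouth region."""

    def __init__(self, latent=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.Flatten(), nn.Linear(64 * 16 * 16, latent),
        )
        self.actor_decoder = nn.Sequential(
            nn.Linear(latent, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Sigmoid(),
        )

    def forward(self, frame):
        return self.actor_decoder(self.encoder(frame))


# Dummy end-to-end pass: new audio -> dubber frame -> actor frame.
dubber = VirtualDubber()
swapper = IdentitySwapAutoencoder()
audio = torch.randn(1, 20, 80)    # 20 steps of 80-dim audio features
ref = torch.rand(1, 3, 64, 64)    # reference frame for the dubber identity
dubbed = dubber(audio, ref)       # person-generic, lip-synced frame
actor_frame = swapper(dubbed)     # actor identity, full-head reenactment
print(actor_frame.shape)          # torch.Size([1, 3, 64, 64])
```

The design point the sketch illustrates is the decoupling: the dubber network only has to solve generic lip sync, while the lightweight person-specific autoencoder only has to map dubber frames to the actor, which is what lets the method reenact the full head (hair, ears, neck) rather than compositing a mouth patch.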

Keywords

Visual dubbing, Reenactment, Style transfer

BibTeX entry

@inproceedings{Jeon:2025:Dubbing,
  author    = {Bobae Jeon and Eric Paquette and Sudhir Mudur and Tiberiu Popa},
  title     = {Personalized Visual Dubbing through Virtual Dubber and Full Head Reenactment},
  booktitle = {Eurographics Short Paper},
  pages     = {1--4},
  year      = {2025}
}

Online version

Preliminary version of the paper.

Preliminary version of the supplementary material.

Additional material

Pre-print version of the video.

