
Research And Implementation Of Audio-Driven Personalized 3D Virtual Face Synthesis Algorithm

Posted on: 2024-07-30    Degree: Master    Type: Thesis
Country: China    Candidate: X Guo    Full Text: PDF
GTID: 2568306944459504    Subject: Computer Science and Technology
Abstract/Summary:
Audio-driven 3D reconstruction of virtual faces aims to generate a digital representation of a speaking face and to synchronize its lip movements, facial expressions, and related motions with speech. The technology is used in areas such as digital avatar creation, character animation for films and games, and lip-synchronized video generation. Although audio-driven virtual face generation has been widely explored, it still faces many difficulties in the personalized synthesis and emotional representation of faces. On the one hand, professional audio-video-aligned 3D emotion datasets are scarce, and traditional methods that rely on facial motion capture systems to build them are costly, time-consuming, and hard to scale. On the other hand, well-designed end-to-end generation models are lacking: previous virtual face generation methods either ignore expression generation entirely, or synthesize expressions that do not correspond to the given audio emotions and are barely perceptible, producing the uncanny valley effect.

To address these challenges, this thesis proposes a new method for constructing 3D facial emotion datasets, making personalized and diversified virtual face datasets feasible; based on the constructed dataset, it further proposes an end-to-end generation model that enhances the realism of virtual faces. The main work and research contents of this thesis are as follows:

(1) This thesis proposes a new method for constructing a 3D facial emotion dataset, algorithmically removing the head motion that is independent of the speech signal, and builds a virtual face dataset covering 12 speakers, 8 emotions, and more than 400,000 frames in total. Based on this construction method, the thesis also proposes an automated pipeline for personalized dataset construction, providing a basis for building one's own virtual face.

(2) Building on this dataset and the discrete emotion model, this thesis fully considers the independence of speech features across different emotions and their mapping to the 3D virtual face, so that the generated face model can faithfully reflect the emotional tendency of the utterance. Guided by the anatomical structure of human facial expressions, the thesis proposes a new end-to-end generation network that uses a structured loss function to refine the synthesis of different facial regions such as the eyes, lips, and nose. This improves the emotional expressiveness of the virtual face and compensates for the weak emotion synthesis ability of existing industry models.

(3) This thesis builds a virtual face application demonstration and makes a preliminary exploration of, and plan for, the future development and application of virtual faces.
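The structured loss mentioned in contribution (2) can be sketched as a region-weighted reconstruction loss over mesh vertices. The sketch below is a minimal illustration, not the thesis's actual formulation: the vertex-index groups, region weights, and mesh size are all hypothetical placeholders, while a real implementation would derive the region partition from the face mesh topology.

```python
import numpy as np

# Hypothetical vertex-index groups for a 200-vertex face mesh; real
# region partitions would come from the mesh topology used in training.
REGIONS = {
    "lips": np.arange(0, 40),
    "eyes": np.arange(40, 70),
    "nose": np.arange(70, 90),
    "rest": np.arange(90, 200),
}

# Assumed per-region weights: speech-critical regions weighted higher
# so errors around the mouth are penalized more than elsewhere.
WEIGHTS = {"lips": 5.0, "eyes": 2.0, "nose": 1.5, "rest": 1.0}

def structured_loss(pred, target):
    """Region-weighted L2 loss over 3D mesh vertices.

    pred, target: (N, 3) arrays of vertex positions for one frame.
    Each region's mean squared vertex error is scaled by its weight.
    """
    total = 0.0
    for name, idx in REGIONS.items():
        diff = pred[idx] - target[idx]
        total += WEIGHTS[name] * np.mean(np.sum(diff ** 2, axis=1))
    return total / len(REGIONS)

# Identical meshes give zero loss; a perturbation confined to the lips
# costs more than the same perturbation confined to the "rest" region.
verts = np.random.default_rng(0).normal(size=(200, 3))
print(structured_loss(verts, verts))  # → 0.0
```

Averaging per region (rather than over all vertices at once) keeps small but expressive regions such as the eyes from being drowned out by the larger, mostly rigid remainder of the mesh.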
Keywords/Search Tags:Virtual faces, Personalization, Audio-driven, 3D reconstruction