With the rise of the metaverse, virtual digital humans have become the core interactive carrier in immersive social media, attracting ordinary users to new forms of interaction. Because the human face has powerful social attributes but a complex construction mechanism, it is particularly important to accurately convey genuine facial emotion through virtual character models in virtual scenes. At present, some systems rely on specialized facial-capture equipment and are time-consuming and laborious to operate, which makes them difficult to popularize among ordinary users. This thesis studies image-based expression interaction technology: facial feature information is recognized to drive virtual characters to generate expressions synchronously, creating a virtual interaction system for ordinary users. The specific work is as follows:

First, to address the unstable capture caused by large landmark-positioning deviations in traditional image recognition algorithms, a facial landmark recognition algorithm based on a two-stage neural network is designed. In the first stage, the image is fed into an improved multi-task cascaded convolutional neural network, whose first subnet performs feature fusion and whose second subnet introduces new convolutions to detect faces of different sizes. In the second stage, the detected faces are aligned and fed into a high-resolution network, which accurately predicts the locations of facial landmarks from the output heatmaps. The results show that the algorithm accurately locates facial landmarks under different facial expressions, that the average recognition error is significantly reduced, and that the algorithm is robust.

Second, existing feature extraction methods do not consider fine facial-action details or rigid posture factors, so the generated virtual character's facial expressions are simple and dull. To address this, a 3D virtual character expression generation algorithm is proposed. First, a facial mask is formed by dividing the face region according to the recognized landmarks, and the semantic relations of facial Action Units (AUs) are analyzed to build an AU knowledge graph. Second, the local detailed features of the mask are extracted and the semantic relations of the graph are mined; a flexible AU recognition network is designed to improve the recognition accuracy of AU activation degrees and to overcome the difficulty of capturing details in complex expressions. At the same time, coordinated head-eye motion is introduced to achieve rigid pose estimation, and the rigid parameters are solved accurately. Finally, blendshape technology is used to calculate facial rigid-flexible coefficients that drive the model to generate expressions. The results show that this algorithm enables the virtual model to generate rich, natural expressions even when the user performs complex facial movements.

Finally, with the above algorithms embedded, a facial synchronous expression demonstration system based on Unity3D is designed. Function and performance tests show that the system can stably and accurately capture facial action features and generate smooth virtual character expressions, meeting real-time requirements and enabling ordinary users to enjoy a virtual interactive experience.
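The final blendshape driving step described above follows the standard linear blendshape model, in which each expression coefficient weights an offset from the neutral face mesh. The sketch below is a minimal illustration of that model, not the thesis's actual implementation: the function name, array shapes, and the toy two-vertex mesh are all assumptions for demonstration.

```python
import numpy as np

def blend_expression(base_mesh, blendshapes, weights):
    """Linear blendshape model: vertices = base + sum_i w_i * (shape_i - base).

    base_mesh:   (V, 3) neutral-face vertex positions
    blendshapes: (K, V, 3) expression target meshes, one per blendshape
    weights:     (K,) coefficients in [0, 1], e.g. derived from
                 recognized AU activation degrees (illustrative mapping)
    """
    weights = np.clip(np.asarray(weights, dtype=float), 0.0, 1.0)
    deltas = blendshapes - base_mesh           # (K, V, 3) offsets from neutral
    # Weighted sum of offsets added to the neutral mesh.
    return base_mesh + np.tensordot(weights, deltas, axes=1)

# Toy example: a 2-vertex "face" with one blendshape that raises vertex 0.
base = np.zeros((2, 3))
raise_v0 = np.array([[[0.0, 1.0, 0.0],
                      [0.0, 0.0, 0.0]]])      # shape (1, 2, 3)
out = blend_expression(base, raise_v0, [0.5])  # vertex 0 raised halfway
```

In an engine such as Unity3D, the same coefficients would typically be applied per frame through the engine's own blendshape weights rather than by recomputing vertices in script.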