
Human-Machine Emotional Dialogue System Based On Virtual Talking Head

Posted on: 2022-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: J T Chen
Full Text: PDF
GTID: 2518306494486964
Subject: Computer technology
Abstract/Summary:
With the rapid development of computer and manufacturing technology, the applications of intelligent service robots keep expanding into fields such as medical treatment, education, and entertainment. Users are no longer satisfied with the traditional command-based human-computer interaction mode built on keyboard and mouse, and are gradually shifting their attention to anthropomorphic interaction such as dialogue. Researchers have found that intelligent service robots lack emotional expression, which makes it difficult for them to achieve emotional resonance with users, so users gradually lose interest in talking with the robot. Therefore, if intelligent service robots are to maintain sales growth and truly bring convenience to users, the first problem to solve is to design a reasonable way for robots and users to interact emotionally.

At present, most intelligent service robots interact with users through information displayed on a screen. For this kind of interaction window, researchers have proposed two types of methods to achieve anthropomorphic emotional interaction between robots and users. The first uses an established three-dimensional virtual talking-head model to generate anthropomorphic emotional dialogue animation; the second generates a dialogue video from a picture of a target character. Although both methods achieve a certain accuracy in generating anthropomorphic mouth movements, the expressions they generate do not yet satisfy users; for example, the generated expressions are not pronounced in the eyebrow and eye areas. In addition, the resolution of the video generated by the second method is extremely low, which makes it difficult to meet users' needs. In response to these problems, this dissertation combines the two methods in a limited way: an end-to-end method generates driving data with emotional information from input speech data and emotion tags, and this data drives the three-dimensional virtual talking head to deform into an anthropomorphic emotional dialogue animation. The present study consists of the following three parts:

1. A long short-term memory network (LSTM) based method for generating facial landmark sequence data with emotional information. In this method, speech data and an emotion tag are used as input, and the LSTM model captures the relationship between the input features and facial poses, outputting a facial landmark sequence with emotional expression. This method achieves a root mean square error (RMSE) of 0.12 on the multi-modal emotional dialogue database used in this dissertation, corresponding to an error of only 0.12 cm at real face scale. This result is close to the current state-of-the-art technology based on generative adversarial networks, whose RMSE is 0.11. A minimal sketch of this kind of model appears below.
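The following Python/PyTorch sketch illustrates a speech-and-emotion-conditioned LSTM of the kind described in item 1. It is an illustration under assumed settings: the feature dimensions, the emotion-tag set, the landmark count, and all names (e.g. `EmotionLandmarkLSTM`, `AUDIO_DIM`) are assumptions of this sketch, not details taken from the dissertation.

```python
# Illustrative sketch only: a speech + emotion-tag conditioned LSTM that
# regresses a facial-landmark sequence. Dimensions and names are assumptions.
import torch
import torch.nn as nn

AUDIO_DIM = 39        # e.g. MFCC features per frame (assumed)
NUM_EMOTIONS = 6      # number of emotion tags (assumed)
NUM_LANDMARKS = 68    # 2D facial landmarks per frame (assumed)

class EmotionLandmarkLSTM(nn.Module):
    def __init__(self, hidden_dim: int = 256, emb_dim: int = 16):
        super().__init__()
        # Embed the discrete emotion tag so it can be concatenated
        # with the per-frame acoustic features.
        self.emotion_emb = nn.Embedding(NUM_EMOTIONS, emb_dim)
        self.lstm = nn.LSTM(AUDIO_DIM + emb_dim, hidden_dim,
                            num_layers=2, batch_first=True)
        # Regress (x, y) coordinates of each landmark at each frame.
        self.head = nn.Linear(hidden_dim, NUM_LANDMARKS * 2)

    def forward(self, audio_feats: torch.Tensor, emotion: torch.Tensor):
        # audio_feats: (batch, frames, AUDIO_DIM); emotion: (batch,)
        emb = self.emotion_emb(emotion)                       # (batch, emb_dim)
        emb = emb.unsqueeze(1).expand(-1, audio_feats.size(1), -1)
        out, _ = self.lstm(torch.cat([audio_feats, emb], dim=-1))
        return self.head(out)                                 # (batch, frames, 136)

# Training would minimise MSE against ground-truth landmark tracks; the
# RMSE figure reported above would be computed on such landmark coordinates.
```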
2. A generative adversarial network (GAN) based method for generating facial landmark sequence data with emotional information. To address the problem that facial expressions generated by prior art do not perform well around the eyebrows, our method splits the face into expression-independent areas, eyebrow areas, and mouth areas according to expression actions, and designs a separate encoding-decoding structure for each of the three areas so that each facial region is modeled independently. The outputs of the three structure blocks are merged and fed into a shared fully connected structure, which models the overall dependence among the regions in a higher dimension (see the first sketch below). Compared with a baseline model based on a recurrent neural network, the method proposed in this dissertation improves the generation effect by a relative 17%, and it also improves by a relative 9% over the state-of-the-art technology based on generative adversarial networks.

3. A human-machine emotional dialogue system. The system is built on the landmark-sequence generation methods of this dissertation. A multi-modal information recognition module identifies the user's dialogue content and emotional state, and an intent analysis module selects the appropriate reply audio and expression tag according to the dialogue content. A multi-modal interaction module then generates an emotional facial landmark sequence from the reply audio and expression tag. Finally, the generated landmark sequence drives the 3D virtual talking head, within the framework of the free-form deformation algorithm, to form an anthropomorphic dialogue animation (see the second and third sketches below).
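First sketch: a region-split generator in the spirit of item 2. The three encoder-decoder branches and the shared fully connected fusion follow the description above, but the layer widths, the per-region landmark counts, and the conditioning vector are assumptions of this sketch; the discriminator and training losses are omitted entirely.

```python
# Illustrative region-split generator (item 2). Branch sizes, fusion width,
# and the per-region landmark counts are assumptions.
import torch
import torch.nn as nn

def branch(in_dim: int, bottleneck: int, out_dim: int) -> nn.Sequential:
    # A small encoder-decoder block used for one facial region.
    return nn.Sequential(
        nn.Linear(in_dim, bottleneck), nn.ReLU(),
        nn.Linear(bottleneck, out_dim), nn.ReLU(),
    )

class RegionSplitGenerator(nn.Module):
    """Mouth, eyebrow, and expression-independent regions are modelled
    independently, then fused by shared fully connected layers that capture
    cross-region dependence in a higher dimension."""
    def __init__(self, cond_dim: int = 272,   # assumed audio + emotion features
                 mouth: int = 20, brows: int = 10, rest: int = 38):
        super().__init__()
        self.mouth = branch(cond_dim, 64, mouth * 2)
        self.brows = branch(cond_dim, 64, brows * 2)
        self.rest = branch(cond_dim, 64, rest * 2)
        total = (mouth + brows + rest) * 2       # 68 landmarks x (x, y)
        self.fuse = nn.Sequential(               # shared fully connected fusion
            nn.Linear(total, 512), nn.ReLU(),
            nn.Linear(512, total),
        )

    def forward(self, cond: torch.Tensor) -> torch.Tensor:
        # cond: (batch, cond_dim) per-frame conditioning vector
        parts = torch.cat([self.mouth(cond), self.brows(cond),
                           self.rest(cond)], dim=-1)
        return self.fuse(parts)                  # (batch, 136) landmark values
```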
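Second sketch: the turn-level data flow of the dialogue system in item 3. Every module name and interface here is hypothetical; only the four-stage flow (recognition, intent analysis, landmark generation, deformation) follows the abstract.

```python
# Illustrative end-to-end loop of the dialogue system (item 3). Module names
# and interfaces are hypothetical; only the data flow follows the abstract.
def dialogue_turn(user_audio, user_video, recognizer, intent_module,
                  landmark_generator, talking_head):
    # 1. Multi-modal recognition: dialogue content + emotional state.
    text, user_emotion = recognizer(user_audio, user_video)
    # 2. Intent analysis selects the reply audio and an expression tag.
    reply_audio, expression_tag = intent_module(text, user_emotion)
    # 3. Generate the emotional landmark sequence from audio + tag.
    landmarks = landmark_generator(reply_audio, expression_tag)
    # 4. Free-form deformation drives the 3D head, frame by frame.
    frames = [talking_head.deform(lm) for lm in landmarks]
    return reply_audio, frames
```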
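Third sketch: the abstract says the head is driven under a free-form deformation framework. Below is the classical trivariate Bernstein free-form deformation (in the style of Sederberg and Parry); how the dissertation maps landmark displacements onto the control lattice is not stated here, so the lattice setup is an assumption.

```python
# Illustrative trivariate Bernstein free-form deformation (FFD); the lattice
# resolution and the landmark-to-lattice mapping are assumptions.
import numpy as np
from math import comb

def bernstein(n: int, i: int, t: np.ndarray) -> np.ndarray:
    # Bernstein basis polynomial B_{i,n}(t), evaluated elementwise.
    return comb(n, i) * (t ** i) * ((1.0 - t) ** (n - i))

def ffd(points: np.ndarray, control: np.ndarray) -> np.ndarray:
    """points: (N, 3) mesh vertices already mapped into the unit cube.
    control: (l+1, m+1, n+1, 3) displaced control lattice.
    Returns the deformed vertex positions."""
    l, m, n = (d - 1 for d in control.shape[:3])
    s, t, u = points[:, 0], points[:, 1], points[:, 2]
    out = np.zeros_like(points)
    for i in range(l + 1):
        bi = bernstein(l, i, s)
        for j in range(m + 1):
            bj = bernstein(m, j, t)
            for k in range(n + 1):
                w = bi * bj * bernstein(n, k, u)      # per-vertex weight, (N,)
                out += w[:, None] * control[i, j, k]  # weighted control point
    return out
```

In such a framework, the generated landmark displacements would move the lattice control points, and the mesh vertices of the talking head follow smoothly through the weighted sum above.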
Keywords/Search Tags: 3D Virtual Talking Head, Long Short-Term Memory Networks, Generative Adversarial Networks, Human-Machine Emotional Dialogue System