
Research On Multimodal Emotion Recognition And Human-computer Interaction In Virtual Environment

Posted on: 2022-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: J G Dong
Full Text: PDF
GTID: 2518306575964699
Subject: Control Science and Engineering

Abstract/Summary:
Multimodal emotion recognition is an active and challenging research field in artificial intelligence. It improves emotion recognition performance by fusing emotional information from multiple modalities. The main difficulties are learning discriminative unimodal emotional features and fully mining the complementary information between modalities through multimodal fusion. Speech and facial expressions are two of the most natural and effective ways for human beings to express emotion. This thesis studies multimodal emotion recognition based on speech and facial expression, together with natural human-computer interaction in a virtual environment. The work covers unimodal emotion feature learning, multimodal emotion fusion, and human-computer interaction in a virtual environment. The specific research contents are as follows:

1. For unimodal emotional feature learning, this thesis combines a Bidirectional Long Short-Term Memory network (BiLSTM) with a Convolutional Neural Network (CNN) for speech emotion recognition, learning context-related information and local high-level features from the speech signal (see the first sketch below). For facial expression recognition, a neural network based on small-scale convolution kernels is proposed: replacing large convolution kernels with stacked small ones deepens the network and strengthens its nonlinear expressive power, so that local high-level features of facial expressions can be learned (second sketch below). Experiments on the IEMOCAP dataset show that the recognition rates of speech emotion recognition and facial expression recognition reach 58.97% and 60.19%, respectively.

2. For multimodal emotion fusion, this thesis first fuses the speech emotion features and facial expression features with a feature-level fusion method. It then proposes a model-level fusion method based on a neural network: after the speech and expression features are fused, a neural network learns the complementary information between them (third sketch below). Experiments on the IEMOCAP dataset show that the model-level fusion method reaches a recognition rate of 70.24%, demonstrating its effectiveness.

3. Finally, this thesis applies the multimodal emotion recognition algorithm to a virtual-environment interaction system to verify its effectiveness in a real scene. Multiple comparative experiments show that the system correctly recognizes the user's emotions, and the virtual characters respond with corresponding interactive actions.
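To make the BiLSTM + CNN approach concrete, the following is a minimal PyTorch sketch of such a speech emotion model: a CNN front end extracts local high-level features from a spectrogram, and a BiLSTM captures context-related information over time. The 40-band mel-spectrogram input, layer sizes, and four-class output are illustrative assumptions, not the thesis's actual configuration.

```python
import torch
import torch.nn as nn

class SpeechEmotionNet(nn.Module):
    """Hypothetical BiLSTM + CNN speech emotion model (sizes assumed)."""

    def __init__(self, n_mels=40, hidden=128, n_classes=4):
        super().__init__()
        # CNN front end: local high-level features from the spectrogram
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # BiLSTM over the time axis: context-related information
        self.lstm = nn.LSTM(input_size=64 * (n_mels // 4), hidden_size=hidden,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                        # x: (batch, 1, n_mels, frames)
        f = self.cnn(x)                          # (batch, 64, n_mels/4, frames/4)
        f = f.permute(0, 3, 1, 2).flatten(2)     # (batch, time, features)
        out, _ = self.lstm(f)
        return self.fc(out[:, -1])               # classify from the last time step
```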
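The small-scale convolution kernel idea can likewise be sketched: two stacked 3x3 convolutions cover the same receptive field as a single 5x5 kernel while adding depth and an extra nonlinearity. The grayscale 48x48 input, channel counts, and class count below are assumptions for illustration.

```python
import torch.nn as nn

def small_kernel_block(c_in, c_out):
    # Two 3x3 convs span a 5x5 receptive field with more depth and nonlinearity
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(c_out, c_out, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
    )

class ExpressionNet(nn.Module):
    """Hypothetical small-kernel facial expression model (sizes assumed)."""

    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            small_kernel_block(1, 32),    # 48x48 -> 24x24 (grayscale faces)
            small_kernel_block(32, 64),   # 24x24 -> 12x12
            small_kernel_block(64, 128),  # 12x12 -> 6x6
        )
        self.classifier = nn.Linear(128 * 6 * 6, n_classes)

    def forward(self, x):                 # x: (batch, 1, 48, 48)
        return self.classifier(self.features(x).flatten(1))
```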
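Finally, a minimal sketch of the two fusion strategies: feature-level fusion simply concatenates the unimodal feature vectors, while model-level fusion passes the concatenated vector through a further network that learns cross-modal complementary information. The feature dimensions are assumed for illustration.

```python
import torch
import torch.nn as nn

class ModelLevelFusion(nn.Module):
    """Hypothetical fusion head over unimodal embeddings (sizes assumed)."""

    def __init__(self, d_speech=256, d_face=128, n_classes=4):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(d_speech + d_face, 256), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, n_classes),
        )

    def forward(self, f_speech, f_face):
        # Feature-level fusion: concatenate the unimodal embeddings
        fused = torch.cat([f_speech, f_face], dim=1)
        # Model-level fusion: learn complementary information from the fused vector
        return self.fusion(fused)

# Usage with dummy unimodal embeddings (batch of 8, shapes assumed)
logits = ModelLevelFusion()(torch.randn(8, 256), torch.randn(8, 128))
```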
Keywords/Search Tags: virtual environment interaction, speech emotion recognition, multimodal fusion, deep learning