
Multimodal Emotion Recognition Research Based on IEMOCAP

Posted on: 2022-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Cai
Full Text: PDF
GTID: 2518306521479964
Subject: Computer application technology
Abstract/Summary:
Emotions are present everywhere in human life. A doctor can judge the severity of pain from a patient's emotional state, and for patients with impaired speech, emotional expression is all the more important. With the growth of information technology and artificial intelligence, emotion recognition has become particularly important for feedback in human-computer interaction. Previous research on emotion recognition has analyzed recognition rates from a single direction such as speech, text, or facial expression. Such single-modality research is limited with respect to the future development of AI, because human-computer interaction cannot rely on capturing a single source of information to perform related operations. If a machine is to perceive and understand emotions as humans do, it must simulate human capabilities in this respect: capturing multi-modal emotional features, processing them, and finally expressing the result. In earlier emotion recognition studies, researchers tended to choose scripted and patterned material, which is often not comprehensive enough and leads to incomplete emotion recognition results.

This thesis therefore carries out exploratory research and experiments that proceed from the multi-modal emotion database, to single-modal emotion feature acquisition, to single-modal emotion recognition, and finally to multi-modal emotion recognition. Based on the IEMOCAP (Interactive Emotional Dyadic Motion Capture) dataset, emotion recognition experiments are carried out with speech, text, and facial, head, and gesture information treated as separate blocks. For speech, MFCC acoustic emotion features are extracted and a neural network is used for modeling and recognition. For text, word-embedding feature vectors from natural language processing are extracted and an LSTM (long short-term memory) network is used to capture context for recognition. For the visual information, the coordinate-based feature vectors provided in the dataset are used in convolutional experiments. In the course of the experiments, this thesis systematically reviews neural-network emotion recognition algorithms, surveys existing modality fusion methods, and finally adopts feature-level fusion to perform multi-modal emotion recognition, completing the final experiments and obtaining the results.
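As an illustration only, and not the code used in this thesis, the sketch below shows the kind of speech branch described above: utterance-level MFCC statistics extracted with librosa and classified by a small feed-forward network in PyTorch. The file path, emotion label set, and layer sizes are assumptions made for the example.

```python
# Minimal sketch of the speech branch: MFCC features -> small neural classifier.
# Paths, label set, and layer sizes are illustrative assumptions, not the thesis code.
import librosa
import numpy as np
import torch
import torch.nn as nn

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # assumed 4-class IEMOCAP setup

def extract_mfcc(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
    """Load one utterance and return a fixed-length MFCC statistics vector."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, frames)
    # Mean and std over time give a fixed-size, utterance-level feature vector.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # (2 * n_mfcc,)

class SpeechEmotionNet(nn.Module):
    def __init__(self, in_dim: int = 26, hidden: int = 64, n_classes: int = len(EMOTIONS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # raw logits; train with nn.CrossEntropyLoss

if __name__ == "__main__":
    feats = extract_mfcc("example_utterance.wav")  # hypothetical file path
    model = SpeechEmotionNet(in_dim=feats.shape[0])
    logits = model(torch.from_numpy(feats).float().unsqueeze(0))
    print(EMOTIONS[logits.argmax(dim=1).item()])
```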
In the experiments, speech emotion recognition based on MFCC acoustic features on the IEMOCAP dataset reaches a recognition rate of 59.53%, and CNN-based emotion recognition on spectrograms reaches 57.85%. For the text information in the dataset, word-embedding models are built with Word2vec and GloVe respectively, followed by neural-network emotion recognition; the recognition rates are 54.92% (GloVe) and 67.12% (Word2vec). For the face, head, and gesture information, all feature values between the start time and the end time are sampled, and the three parts are recognized separately, with recognition rates of 53.30% (face), 45.98% (head), and 49.50% (gesture). An initial fusion of these three parts yields a recognition rate of 51.31%, an improvement over previous studies. Finally, voice, text, face, head, and gesture are fully fused for multi-modal emotion recognition, and the recognition result reaches 71.55%, a substantial improvement over previous studies.
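To make the feature-level fusion idea concrete, the sketch below concatenates per-modality feature vectors (speech, text, face, head, gesture) and feeds them to a single classifier, in the spirit of the fusion step described above. The modality dimensions, class count, and network shape are assumptions, not the thesis implementation.

```python
# Minimal sketch of feature-level fusion: concatenate per-modality features
# (speech, text, face, head, gesture) and classify them jointly.
# All dimensions below are illustrative assumptions.
import torch
import torch.nn as nn

MODALITY_DIMS = {"speech": 26, "text": 128, "face": 32, "head": 12, "gesture": 24}

class FusionEmotionNet(nn.Module):
    def __init__(self, dims: dict, hidden: int = 128, n_classes: int = 4):
        super().__init__()
        fused_dim = sum(dims.values())
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, hidden), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, features: dict) -> torch.Tensor:
        # Feature-level fusion: simple concatenation along the feature axis.
        fused = torch.cat([features[m] for m in MODALITY_DIMS], dim=-1)
        return self.classifier(fused)

if __name__ == "__main__":
    batch = {m: torch.randn(8, d) for m, d in MODALITY_DIMS.items()}  # fake batch of 8
    model = FusionEmotionNet(MODALITY_DIMS)
    logits = model(batch)
    print(logits.shape)  # torch.Size([8, 4])
```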
Keywords/Search Tags: LSTM, CNN, PFA, Multi-modal emotion recognition