
The Research Of Multimodality Emotion Recognition

Posted on: 2021-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: T X Luo
Full Text: PDF
GTID: 2518306050966269
Subject: Circuits and Systems
Abstract/Summary:
As an important part of human expression and communication, emotion makes it both scientifically significant and practically valuable to enable computers to understand and recognize human emotional states. Emotion recognition research is mainly divided into single-modal and multi-modal emotion recognition. By recognition target, it can further be divided into discrete and continuous emotion recognition. Discrete emotion recognition treats emotion categories as the recognition target and ignores the complexity and granularity of emotion, while continuous emotion recognition overcomes these shortcomings by establishing an emotion space and mapping emotion intensity values onto specific emotion attributes, making recognition more accurate. Because single-modal emotion recognition is confined to one modality and cannot exploit the complementarity between modalities, this thesis uses multi-modal emotion features for continuous emotion recognition. The main contributions of this thesis are as follows.

1. Human emotion is expressed mainly through two media, audio and video, so these two modalities are selected for multi-modal emotion recognition. Weighing the advantages and disadvantages of the common fusion strategies in multi-modal emotion recognition, a new hybrid fusion prediction method is proposed: feature-level fusion regression prediction is first carried out on audio and video, and multi-modal decision-level fusion regression prediction is then applied to the resulting prediction sequences. The hybrid fusion method not only overcomes the limitations of single-modal prediction but also exploits the complementarity between modalities, effectively improving the accuracy of the overall emotion prediction.

2. Aiming at the poor prediction accuracy and the inability to handle high-dimensional features of common
models of audio and video emotion recognition, this thesis proposes a deep BiLSTM model for single-modal continuous emotion regression prediction, making full use of the learning ability of recurrent neural networks and of the continuity and context correlation of audio and video single-modal emotion. Experimental results show that this network model achieves high accuracy in single-modal emotion regression prediction.

3. In view of the large prediction deviation and poor stability of decision-level fusion strategies, this thesis proposes Attention-LAE-XGBoost, an extreme gradient boosting model with local absolute error based on an attention mechanism, as the decision-level fusion method for multi-modal emotion regression prediction. It makes full use of the local attention mechanism's ability to focus on local structure and of the excellent generalization performance of the XGBoost model. Experimental results show that the model is accurate and robust: the CCC indexes in the Arousal and Valence emotion dimensions reach 93.42% and 93.78%, respectively.

4. An audio- and video-based emotion recognition system is designed, integrating the key points of continuous emotion recognition research: the single-modal emotion regression prediction model, the multi-modal fusion strategy, and the Attention-LAE-XGBoost fusion regression prediction model. The models and methods proposed in this thesis are embedded in the system, which accepts both real-time video and externally supplied files as input. The validity and practicability of the proposed emotion recognition method are verified through video tests.
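The CCC (Concordance Correlation Coefficient) figures quoted above measure the agreement between a predicted continuous emotion trace and the gold annotation. As a reference, a minimal pure-Python implementation of the standard CCC formula is sketched below; the function name `ccc` is illustrative and not taken from the thesis.

```python
def ccc(x, y):
    """Concordance Correlation Coefficient between two equal-length sequences.

    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
    It equals 1 only when the sequences agree perfectly, and penalizes both
    a scale mismatch and a mean offset, unlike plain Pearson correlation.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n          # population variance of x
    vy = sum((b - my) ** 2 for b in y) / n          # population variance of y
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)
```

For example, a prediction that tracks the gold trace perfectly scores 1.0, while the same trace shifted by a constant offset scores lower, which is why CCC is preferred over Pearson correlation for continuous Arousal/Valence evaluation.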
Keywords/Search Tags: Continuous emotion recognition, Multimodal, Deep BiLSTM, Attention mechanism, XGBoost
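To make the decision-level fusion idea concrete: a minimal sketch of attention-weighted fusion of per-modality prediction sequences is given below. This is only an illustration of the general technique, not the thesis's Attention-LAE-XGBoost model (which additionally trains an XGBoost regressor on a local-absolute-error objective); the names `softmax` and `fuse_predictions` are illustrative.

```python
import math

def softmax(scores):
    """Turn arbitrary per-modality scores into weights that sum to 1."""
    m = max(scores)                               # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def fuse_predictions(preds, scores):
    """Attention-weighted decision-level fusion.

    preds  -- list of per-modality prediction sequences (same length each),
              e.g. [audio_preds, video_preds]
    scores -- one relevance score per modality (e.g. from validation error);
              higher score -> larger weight in the fused output
    """
    w = softmax(scores)
    return [sum(wi * p[t] for wi, p in zip(w, preds))
            for t in range(len(preds[0]))]
```

With equal scores this reduces to a plain average of the modalities; as one modality's score grows, the fused sequence leans toward that modality's predictions, which is the basic behavior an attention mechanism contributes to decision-level fusion.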