
Multi-modal Emotion Recognition Based On Deep Learning

Posted on: 2021-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
Full Text: PDF
GTID: 2428330611498052
Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
Emotion recognition is an important branch of affective computing. As artificial intelligence technology develops, human-computer interaction increasingly pursues more humanized and intelligent experiences, and emotion recognition has become a research hotspot. Unimodal emotion recognition often suffers from incomplete information, susceptibility to interference, and low recognition rates. Multimodal emotion recognition has therefore attracted wide attention from researchers, and much work has been done on emotion recognition from speech, video, text, and physiological signals. By fusing complementary information across modalities, multimodal emotion recognition can improve the final recognition rate. This thesis builds a multimodal emotion recognition model based on speech, video, and text.

First, the thesis studies effective feature extraction methods for speech, video, and text. For speech input, a long short-term memory network (LSTM) extracts speech features; because the output of a speech signal at each time step depends on the preceding and following frames, this network makes better use of that temporal context. For video input, a densely connected convolutional network (DenseNet) extracts image features. Rather than improving performance by deepening the network (as in ResNet) or widening it (as in Inception), DenseNet reuses features through dense bypass connections, which greatly reduces the number of network parameters and, to a certain extent, alleviates the vanishing-gradient problem. For text, an LSTM is again used, as it can effectively extract emotional semantics and word-order information.

Second, to fuse the information of the three modalities effectively, the thesis studies fusion methods for multimodal emotion recognition. Feature-level fusion can exploit the information of each modality, but direct concatenation merely splices together the emotion feature vectors output by each modality. This thesis therefore introduces an attention mechanism into feature-level fusion: the mechanism learns reasonable weights from the distribution of the dataset and applies them in the final feature fusion, making the multimodal recognition results more accurate.

Finally, unimodal, bimodal, and multimodal comparison experiments are designed, with five-class, four-class, three-class, and two-class classification experiments over the ten emotion categories of the IEMOCAP dataset. The experimental results show that bimodal emotion recognition is 6.2% more accurate than unimodal recognition, and multimodal recognition is 8.98% more accurate than bimodal recognition, verifying the effectiveness of multimodal emotion recognition.
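To make the pipeline concrete, the sketch below shows the three unimodal encoders the abstract describes: an LSTM over acoustic frames, a DenseNet over video frames, and an LSTM over word embeddings. The abstract gives no layer sizes or hyperparameters, so everything here is an assumption for illustration only: PyTorch as the framework, DenseNet-121 as the visual backbone, 74-dimensional acoustic features, 300-dimensional word embeddings, and a shared 128-dimensional output size.

import torch
import torch.nn as nn
from torchvision.models import densenet121

class SpeechEncoder(nn.Module):
    """LSTM over per-frame acoustic features; returns the last hidden state."""
    def __init__(self, feat_dim=74, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
    def forward(self, x):             # x: (batch, frames, feat_dim)
        _, (h, _) = self.lstm(x)
        return h[-1]                  # (batch, hidden)

class TextEncoder(nn.Module):
    """LSTM over pretrained word embeddings; returns the last hidden state."""
    def __init__(self, emb_dim=300, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
    def forward(self, x):             # x: (batch, words, emb_dim)
        _, (h, _) = self.lstm(x)
        return h[-1]

class VisualEncoder(nn.Module):
    """DenseNet backbone on a face/frame crop, projected to the shared size."""
    def __init__(self, hidden=128):
        super().__init__()
        backbone = densenet121(weights=None)
        backbone.classifier = nn.Identity()   # keep the 1024-d pooled features
        self.backbone = backbone
        self.proj = nn.Linear(1024, hidden)
    def forward(self, x):             # x: (batch, 3, 224, 224)
        return self.proj(self.backbone(x))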
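The fusion step replaces direct concatenation with learned per-modality weights. The abstract does not specify how those weights are computed, so the following is one plausible reading, continuing the sketch above: a shared linear scorer rates each modality's feature vector, the scores are softmax-normalized, and the weighted vectors are concatenated before a ten-way classifier (matching the ten IEMOCAP emotion categories the abstract names).

class AttentionFusion(nn.Module):
    """Score each modality, softmax the scores, fuse the weighted features."""
    def __init__(self, hidden=128, n_modalities=3, n_classes=10):
        super().__init__()
        self.score = nn.Linear(hidden, 1)                 # shared scorer
        self.classifier = nn.Linear(hidden * n_modalities, n_classes)
    def forward(self, feats):         # feats: list of (batch, hidden) tensors
        stacked = torch.stack(feats, dim=1)               # (batch, M, hidden)
        w = torch.softmax(self.score(stacked), dim=1)     # (batch, M, 1)
        fused = (w * stacked).flatten(start_dim=1)        # weighted concat
        return self.classifier(fused)

A forward pass over the three encoders would then be logits = AttentionFusion()([speech_vec, visual_vec, text_vec]); at plain concatenation the weights w would all be 1, so the attention layer's only job is to rescale each modality's contribution before the classifier.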
Keywords/Search Tags:multimodal emotion recognition, deep learning, feature level fusion, attention mechanism