
Research On Emotion Recognition Method Based On Multimodal Deep Learning

Posted on: 2022-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: C Cheng
Full Text: PDF
GTID: 2518306494453774
Subject: Computer Science and Technology

Abstract/Summary:
With the booming development of artificial intelligence, using computers to recognize human emotions has rapidly grown into an interdisciplinary research field. Emotion is expressed through the interaction of multimodal information, so fusing the features of multiple modal signals has gradually become an established direction in emotion recognition. Because EEG signals reflect human emotions more objectively and truthfully than other signals, combining EEG with other modalities and using deep learning to determine emotional states has become a hot topic of current research. Nonetheless, several problems remain: the non-linear and non-stationary characteristics of physiological signals make it difficult to mine and extract their deep features; for multimodal signals, effective methods for fully expressing the complex internal relationships between the component modalities are lacking; and combining and modeling multimodal data makes emotion recognition systems more complex. This thesis studies feature extraction and feature fusion in multimodal deep emotion recognition and proposes two multimodal emotion recognition methods. The main research contents are as follows:

(1) To address the lack of multiscale features and the insufficient expression of key features when extracting multimodal signal features, this thesis proposes a multimodal emotion recognition method based on a hierarchical fusion convolutional neural network. First, building on early-fusion emotion recognition, a hierarchical fusion feature extraction method is proposed on the basis of a traditional neural network: a hierarchical network structure is constructed by setting different kernel sizes and filter counts on each convolutional layer, forming a layered incremental architecture inside the convolutional neural network. Then, the constructed network extracts hierarchical local convolution features, which are fused with weights to form a global feature vector. Finally, manually extracted statistical features are fused in to classify the emotional labels of the samples. On the binary classification tasks for the arousal and valence dimensions, the method achieves 83.28% and 84.71% on the DEAP dataset, and 88.28% and 89.00% on the MAHNOB-HCI dataset. (A code-level sketch of this architecture follows the abstract.)

(2) To address the problems that feature fusion of multimodal signals cannot fully exploit the complementary features among the modalities and contains substantial redundant information, this thesis proposes a deep emotion recognition method based on multimodal factorized bilinear pooling. First, the full-channel EEG signal is preprocessed and filtered into four frequency bands. Then, a deep convolutional neural network and multimodal factorized bilinear pooling automatically extract and fuse the preprocessed signals, forming multimodal features for the four bands. Finally, an ensemble classifier further explores the classification performance achievable by combining different EEG bands. Binary classification is performed on the arousal and valence dimensions of the DEAP and MAHNOB-HCI datasets, and eye movement signals from MAHNOB-HCI are added to further verify the effectiveness of the method. The experimental results show that the multimodal ensemble that includes the theta band achieves better results, with best accuracies of 93.22% for arousal and 90.50% for valence. (Sketches of the band filtering and fusion steps also follow the abstract.)
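The following is a minimal PyTorch sketch of the hierarchical-fusion idea described in (1): each convolutional layer uses a different kernel size and filter count, the pooled local features of every layer are fused by weights into a global vector, and hand-crafted statistical features are concatenated before classification. All layer sizes, kernel sizes, and the learnable softmax weighting are illustrative assumptions; the abstract does not specify these details.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalFusionCNN(nn.Module):
    def __init__(self, in_channels=32, n_stat_features=16, n_classes=2):
        super().__init__()
        # Different kernel size and filter count per layer: the
        # "layered incremental" structure (sizes assumed here).
        self.conv1 = nn.Conv1d(in_channels, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(32, 64, kernel_size=5, padding=2)
        self.conv3 = nn.Conv1d(64, 128, kernel_size=7, padding=3)
        # Project each layer's pooled output to a common width so the
        # hierarchical local features can be fused by weights.
        self.proj = nn.ModuleList([nn.Linear(c, 128) for c in (32, 64, 128)])
        self.level_weights = nn.Parameter(torch.ones(3))
        self.classifier = nn.Linear(128 + n_stat_features, n_classes)

    def forward(self, x, stat_features):
        # x: (batch, channels, time); stat_features: hand-crafted statistics.
        locals_ = []
        for conv, proj in zip((self.conv1, self.conv2, self.conv3), self.proj):
            x = F.max_pool1d(F.relu(conv(x)), 2)
            locals_.append(proj(x.mean(dim=-1)))      # global average pool
        w = torch.softmax(self.level_weights, dim=0)  # weighted fusion
        global_feat = sum(wi * f for wi, f in zip(w, locals_))
        # Late fusion with the manually extracted statistical features.
        return self.classifier(torch.cat([global_feat, stat_features], dim=1))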
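For the band filtering step in (2), a common approach is zero-phase Butterworth band-pass filtering; the sketch below assumes the conventional theta/alpha/beta/gamma band edges and the 128 Hz sampling rate of the preprocessed DEAP recordings, since the abstract does not name the four bands explicitly.

import numpy as np
from scipy.signal import butter, filtfilt

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def split_bands(eeg, fs=128, order=4):
    """eeg: (channels, samples) array; returns {band_name: filtered array}."""
    out = {}
    for name, (lo, hi) in BANDS.items():
        # Normalized band edges for a Butterworth band-pass filter.
        b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        out[name] = filtfilt(b, a, eeg, axis=-1)  # zero-phase filtering
    return out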
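Multimodal factorized bilinear pooling itself is a published fusion operator (Yu et al., 2017): both modalities are projected into a k-factorized joint space, multiplied element-wise, sum-pooled over the k factors, and normalized. The sketch below follows that standard formulation; the feature dimensions and factor size are illustrative assumptions, not values from the thesis.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MFB(nn.Module):
    def __init__(self, dim_x, dim_y, dim_out=256, factor_k=5):
        super().__init__()
        self.k = factor_k
        # Project both modalities into a shared k-factorized space.
        self.U = nn.Linear(dim_x, dim_out * factor_k)
        self.V = nn.Linear(dim_y, dim_out * factor_k)

    def forward(self, x, y):
        # Element-wise product in the factorized space, then sum-pool over k.
        joint = self.U(x) * self.V(y)                 # (batch, dim_out * k)
        joint = joint.view(-1, joint.size(1) // self.k, self.k).sum(dim=2)
        # Signed square-root (power) normalization, then L2 normalization.
        joint = torch.sign(joint) * torch.sqrt(torch.abs(joint) + 1e-12)
        return F.normalize(joint, dim=1)

In the method of (2), one such fused feature would be produced per frequency band and the per-band predictions combined by the ensemble classifier.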
Keywords/Search Tags:Multimodal emotion recognition, Deep learning, Feature extraction, Multimodal factorized bilinear pooling