
Research On Visual-Audio Multimodal Shared And Transfer Emotion Feature Learning Methods

Posted on: 2018-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: J M Fu
Full Text: PDF
GTID: 2348330533959266
Subject: Computer Science and Technology
Abstract
Emotional states play a vital role in people's daily interaction: rich emotional expression helps people convey their thoughts, so research on emotion analysis is important. Vision and audio are the most direct and effective channels of human emotion expression, and they are also two important information modalities in the field of affective computing. However, single-modality emotion recognition is sometimes unreliable in practical applications, so multimodal emotion recognition is becoming increasingly important. The difficulty of multimodal emotion recognition lies in retaining the original characteristics of each single modality while exploiting the complementary information between the two modalities, so as to obtain a multimodal shared emotion feature that benefits emotion recognition. Modality absence is another common and difficult problem in real life, and reconstructing a missing modality from the information of the remaining modality is a further challenge. Considering these two problems, this thesis proposes a multimodal visual-audio shared emotion feature learning method and a multimodal visual-audio transfer emotion feature learning method. The main contents and innovations are as follows:

(1) A multimodal shared emotion feature learning method based on local, sparse and discriminative canonical correlation analysis is proposed. Its purpose is to obtain shared emotion features that contain both the emotion information of each single modality and the complementary information between the two modalities. The method comprises three stages. The first stage is single-modality high-level emotion feature learning: the audio tracks extracted from the audio-visual samples are preprocessed into spectrograms, the video image sequences are obtained, and a sparse autoencoder then extracts high-level features from the video and the audio. The second stage is multimodal shared emotion feature learning: taking the extracted video and audio features as input, the multimodal shared feature is formed with the local, sparse and discriminative canonical correlation analysis method En-SLDCCA. The third stage is SVM classifier training: the multimodal shared feature is the input of an SVM classifier for emotion recognition. The two sketches below illustrate the feature-extraction and fusion stages.
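To make the first stage concrete, here is a minimal sketch of a sparse autoencoder with the classic KL-divergence sparsity penalty, written in Python/NumPy purely for illustration (the thesis's prototype is implemented in MATLAB and C++); all shapes, hyper-parameters and variable names below are assumptions, not values from the thesis:

```python
# Minimal sparse autoencoder sketch: one sigmoid hidden layer, squared
# reconstruction error, KL-divergence sparsity penalty and weight decay,
# trained by full-batch gradient descent.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SparseAutoencoder:
    def __init__(self, n_in, n_hidden, rho=0.05, beta=3.0, lam=1e-4, lr=0.05):
        rng = np.random.default_rng(0)
        self.W1 = rng.normal(0, 0.01, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.01, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)
        self.rho, self.beta, self.lam, self.lr = rho, beta, lam, lr

    def encode(self, X):
        # hidden activations serve as the high-level emotion features
        return sigmoid(X @ self.W1 + self.b1)

    def fit(self, X, epochs=200):
        m = X.shape[0]
        for _ in range(epochs):
            H = self.encode(X)
            Xhat = sigmoid(H @ self.W2 + self.b2)
            rho_hat = H.mean(axis=0)            # mean activation per hidden unit
            # backpropagated error at the output layer
            d_out = (Xhat - X) * Xhat * (1 - Xhat)
            # gradient of the KL sparsity penalty, added to each hidden error
            sparse_grad = self.beta * (-self.rho / rho_hat
                                       + (1 - self.rho) / (1 - rho_hat))
            d_hid = (d_out @ self.W2.T + sparse_grad) * H * (1 - H)
            self.W2 -= self.lr * (H.T @ d_out / m + self.lam * self.W2)
            self.b2 -= self.lr * d_out.mean(axis=0)
            self.W1 -= self.lr * (X.T @ d_hid / m + self.lam * self.W1)
            self.b1 -= self.lr * d_hid.mean(axis=0)
        return self

if __name__ == "__main__":
    X = np.random.default_rng(1).random((200, 64))   # e.g. flattened 8x8 spectrogram patches
    feats = SparseAutoencoder(64, 32).fit(X).encode(X)
    print(feats.shape)                               # (200, 32) high-level features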
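The second and third stages can be sketched with scikit-learn, with plain canonical correlation analysis standing in for En-SLDCCA (the locality, sparsity and discriminative constraints of the thesis's method are not reproduced here); `X_v`, `X_a` and `y` are hypothetical video features, audio features and emotion labels:

```python
# Shared-feature fusion and SVM classification sketch:
# plain CCA is used here as a stand-in for En-SLDCCA.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.svm import SVC

def shared_feature_svm(X_v, X_a, y, n_components=10):
    # project both modalities into a maximally correlated shared space
    cca = CCA(n_components=n_components).fit(X_v, X_a)
    Z_v, Z_a = cca.transform(X_v, X_a)
    # concatenate the two projections into one multimodal shared feature
    Z = np.hstack([Z_v, Z_a])
    # the shared feature is the input of the SVM classifier
    clf = SVC(kernel="rbf").fit(Z, y)
    return cca, clf
```

Concatenating the two projections is one simple way to form the shared feature; the discriminative term of En-SLDCCA presumably also exploits the emotion labels at this stage, which plain CCA does not.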
(2) A multimodal visual-audio transfer emotion feature learning method is proposed. Its purpose is to solve the problem of modality absence in multimodal emotion recognition. The method comprises four stages. The first stage is single-modality high-level emotion feature learning: the video and audio are preprocessed, spectrograms and key frames of the video sequences are extracted respectively, and the high-level features of images and speech are obtained with a sparse autoencoder. The second stage is transfer learning of the emotion feature, which consists of two steps. The first step learns the transfer function between modalities: the normalized correlation analysis method NCCA projects the visual and audio features into a shared space, the transfer function is learned in that space, and an estimated feature of the missing modality is obtained (a sketch follows at the end of this abstract). The second step reconstructs the emotion feature of the missing modality: starting from the estimated feature, a reconstructed feature of the missing modality is learned with the reconstructed feature learning method, so that the learned feature contains the emotion-related information of the original modality while retaining the related information of the other modality. The third stage is multimodal shared feature learning: the high-level feature of the original modality and the high-level emotion feature of the reconstructed modality are fused by the En-SLDCCA method to obtain a multimodal shared feature that benefits emotion recognition. The fourth stage is SVM classifier training: the shared feature is the input of the SVM classifier, which is finally used for emotion recognition.

(3) A prototype system for multimodal visual-audio emotion recognition is designed and implemented in MATLAB and C++. The prototype system implements both the module for the multimodal shared emotion feature learning method based on local, sparse and discriminative canonical correlation analysis and the module for the multimodal transfer emotion feature learning method, and it verifies the feasibility of the proposed methods.
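As a sketch of the transfer step in (2): on paired training data a shared space is learned (plain CCA below stands in for NCCA), and a ridge regression fitted in that space maps the available modality's projection to an estimate of the missing modality's feature. All names and parameters are illustrative assumptions:

```python
# Transfer-function sketch for the missing-modality case:
# CCA stands in for NCCA; ridge regression plays the transfer function.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import Ridge

def fit_transfer(X_src, X_tgt, n_components=10):
    """Learn an audio->video (or video->audio) transfer from paired samples."""
    cca = CCA(n_components=n_components).fit(X_src, X_tgt)
    Z_src, _ = cca.transform(X_src, X_tgt)
    # transfer function: shared-space coordinates -> target-modality feature
    reg = Ridge(alpha=1.0).fit(Z_src, X_tgt)
    return cca, reg

def estimate_missing(cca, reg, X_src):
    """Estimate the missing modality's feature when only one view is present."""
    Z_src = cca.transform(X_src)   # only the source view is available at test time
    return reg.predict(Z_src)
```

The estimated feature would then pass through the reconstructed-feature learning step and be fused with the original modality's high-level feature by En-SLDCCA, as described in (2).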
Keywords: Multimodal emotion recognition, Sparse autoencoder, Shared emotion feature learning, Transfer learning, Transfer emotion feature learning