With the continuous development of artificial intelligence, emotion recognition, an important branch of affective computing, has become a research hotspot. Because single-modal emotion recognition suffers from low recognition rates and poor robustness, research has gradually shifted from single-modal to multimodal emotion recognition. By introducing additional modalities, the complementary information between them can be captured, improving the final recognition result. How to effectively fuse information from different modalities is both the key to multimodal emotion recognition and its main difficulty. This thesis studies multimodal emotion recognition combining three modalities, text, speech, and video, on the basis of feature-layer fusion, and explores and improves the key techniques involved. The main contributions of this thesis are:

(1) Feature extraction methods effective for each single modality are studied. For the text modality, a bidirectional LSTM network is used to extract textual sentiment features, making effective use of the contextual semantics and word-order information of the text so that the extracted features contain important temporal information. For the speech modality, a convolutional neural network is used to extract learned speech features, and the open-source tool openSMILE is used to extract low-level features of the speech signal; the two are combined into the final speech emotion feature, making the speech representation more complete. For the video modality, a three-dimensional convolutional neural network is used to extract video emotion features. Compared with an ordinary convolutional neural network, it adds a temporal dimension, so the extracted features contain rich temporal context; in addition, facial key-point features are introduced as auxiliary features, making the extracted video emotion features richer and more effective.

(2) The fusion of multimodal emotion features is studied. The widely used direct-concatenation fusion scheme is analyzed in detail, its problems are identified, and its shortcomings are addressed; on this basis, a feature-layer fusion method based on an attention mechanism is proposed. This method lets the features of each modality learn a weight that fits the distribution of the data set and then performs weighted fusion, so that the fused features are more effective, thereby improving recognition performance.

(3) Building on the attention-based feature-layer fusion method, a feature-layer fusion method that introduces the residual idea is proposed. Instead of directly optimizing the mapping function, the network optimizes the residual, which makes the mapping more sensitive to changes in the output. This allows the network structure to be optimized more effectively, increases its expressive power, and further improves the final recognition result.

(4) The proposed fusion method, combining the attention mechanism with the residual idea, is applied to the multimodal emotion recognition task and verified experimentally on public data sets. The experimental results are analyzed and discussed, demonstrating the effectiveness of the proposed feature fusion algorithm.
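The attention-weighted fusion with a residual connection described above can be sketched as follows. This is a minimal illustration, not the thesis's exact architecture: it assumes each modality's features have already been projected to a common dimension `d`, and the attention parameter `w_att` and the use of the unweighted mean as the identity branch are illustrative simplifications.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_fusion(feats, w_att):
    # feats: list of per-modality feature vectors, each of shape (d,)
    # w_att: (d,) attention parameter (hypothetical; learned in practice)
    scores = np.array([f @ w_att for f in feats])   # one score per modality
    alphas = softmax(scores)                         # modality weights, sum to 1
    fused = sum(a * f for a, f in zip(alphas, feats))
    return fused, alphas

def residual_fusion(feats, w_att):
    # Residual idea: the attention branch only learns a correction
    # on top of an identity mapping (here, the unweighted mean).
    fused, alphas = attention_fusion(feats, w_att)
    identity = np.mean(feats, axis=0)
    return identity + fused, alphas

# Example with three dummy modality vectors (text, speech, video).
rng = np.random.default_rng(0)
d = 8
text_f, speech_f, video_f = (rng.normal(size=d) for _ in range(3))
out, alphas = residual_fusion([text_f, speech_f, video_f], rng.normal(size=d))
```

The weights `alphas` play the role described in contribution (2): each modality is scaled according to how informative it is for the data set, instead of being concatenated with equal importance; the `identity + fused` step mirrors contribution (3), where the network optimizes a residual rather than the full mapping.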