
Design of an Emotion Recognition System Based on Multimodal Feature Fusion

Posted on: 2022-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: W Chen
Full Text: PDF
GTID: 2518306752493474
Subject: Computer Software and Computer Applications
Abstract/Summary:
With the rapid development of artificial intelligence, enabling computers to perceive emotions as humans do has become a hot research field. As an emerging interdisciplinary direction, multimodal emotion recognition can improve recognition performance by exploiting the interaction information between modalities in addition to the information obtained from each single modality. However, current multimodal emotion recognition still faces several problems: modal features are not fully utilized, and differences between modal features cause information redundancy or conflict. Aiming to improve the recognition performance of emotion recognition models, this paper improves the multimodal fusion mechanism and explores the design of an emotion recognition system based on multimodal feature fusion. The specific work is as follows:

(1) For single-modal feature extraction, this paper takes into account the characteristics of the data in each modality and analyzes the properties of different neural networks. Long short-term memory (LSTM) networks are used to extract contextual information for the text and acoustic modalities, which have complex time-varying features, while a multi-scale convolutional neural network (MSCNN) is used to extract low-level features from images for the visual modality.

(2) For multimodal feature fusion, the datasets are first reconstructed: emotion categories are redistributed to balance the proportion of each category, and multimodal alignment is achieved. In the fusion stage, a cross-modal attention mechanism realizes the interaction among the three modalities, using low-level features from a source modality to enhance the features of a target modality. The cross-modal attention mechanism is embedded into an improved Transformer network, which reduces the complexity of the model, and a multi-head attention mechanism strengthens the representation ability of the network and improves the fusion effect.

(3) Based on the proposed model, this paper designs and implements a multimodal emotion recognition system that displays the results intuitively. The system performs emotion analysis on offline video clips and can also acquire audio and video data through the user's camera and microphone, using the trained model to realize real-time multimodal emotion recognition.

Experiments on the proposed models are conducted on several datasets and compared against representative state-of-the-art models. The proposed model achieves an accuracy of 84.1% and an F1 score of 82.9% on the IEMOCAP dataset, and recognition rates of 82.7% and 82.4% on the CMU-MOSI and CMU-MOSEI datasets respectively, an improvement of 3%-5%, which demonstrates that the proposed model performs well.
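As a rough illustration of the single-modal encoders in (1), the following PyTorch sketch pairs a bidirectional LSTM for sequential (text or acoustic) features with a multi-scale CNN for visual frames. The class names, layer sizes, and the 3/5/7 kernel sizes are assumptions made for illustration, not details taken from the thesis.

```python
# Sketch of the single-modal encoders in (1); all sizes are illustrative.
import torch
import torch.nn as nn

class TemporalEncoder(nn.Module):
    """LSTM encoder for modalities with time-varying features (text, audio)."""
    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hid_dim, batch_first=True, bidirectional=True)

    def forward(self, x):                # x: (batch, seq_len, in_dim)
        out, _ = self.lstm(x)            # contextual features per time step
        return out                       # (batch, seq_len, 2 * hid_dim)

class MultiScaleCNN(nn.Module):
    """Parallel convolutions at several kernel sizes for visual frames."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in (3, 5, 7)
        ])
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):                # x: (batch, in_ch, H, W)
        feats = [self.pool(torch.relu(b(x))) for b in self.branches]
        return torch.cat(feats, dim=1).flatten(1)  # (batch, 3 * out_ch)
```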
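The cross-modal attention in (2) can be pictured as a Transformer-style block in which the target modality supplies the queries and a source modality supplies the keys and values, so low-level source features enhance the target representation. This is a minimal sketch under that assumption; the dimensions and head count are illustrative, not the thesis's configuration.

```python
# Hedged sketch of cross-modal attention fusion in a Transformer-style block.
import torch
import torch.nn as nn

class CrossModalBlock(nn.Module):
    """Cross-modal multi-head attention followed by a feed-forward sublayer."""
    def __init__(self, d_model: int = 128, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, target, source):   # (batch, len_t, d), (batch, len_s, d)
        # Queries come from the target modality; keys/values from the source,
        # so source features enhance the target representation.
        attended, _ = self.attn(target, source, source)
        x = self.norm1(target + attended)
        return self.norm2(x + self.ff(x))

# Usage: enhance text features with aligned acoustic features.
block = CrossModalBlock()
text = torch.randn(4, 20, 128)
audio = torch.randn(4, 20, 128)
fused = block(text, audio)               # (4, 20, 128)
```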
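For the real-time system in (3), a capture loop along these lines could feed camera frames to the trained model. The OpenCV calls are standard, but `model.predict` is a hypothetical placeholder for the thesis's recognition model, and audio capture via the microphone is omitted.

```python
# Illustrative real-time capture loop; model.predict is a placeholder.
import cv2

cap = cv2.VideoCapture(0)                # default camera
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # emotion = model.predict(frame, audio_chunk)  # hypothetical call
        cv2.imshow("emotion", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
finally:
    cap.release()
    cv2.destroyAllWindows()
```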
Keywords/Search Tags:Multimodal emotion recognition, Attention mechanism, Feature fusion, Transformer network