
Emotion Recognition Based On Un-aligned Multimodal Framework

Posted on: 2022-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: K R Wang
Full Text: PDF
GTID: 2518306767977519
Subject: Computer Software and Application of Computer
Abstract/Summary:
At present, single-modal emotion recognition technologies such as facial expression recognition and speech emotion recognition are widely used in many fields. However, a single modality can suffer from insufficient feature information or heavy interference from the environment. Multi-modal methods are therefore receiving growing attention, owing to the diversity of modalities and the complementarity of features across them. Although multi-modal fusion frameworks compensate for the deficiencies of a single modality to some extent, different modalities exhibit both heterogeneity and similarity, so it is important to choose a fusion structure suited to the modalities being fused without increasing feature redundancy. Moreover, to establish relationships between features of different modalities, multi-modal architectures often rely on alignment information between modalities and ignore the asynchrony between them.

To address this asynchrony of multi-modal features, this thesis combines the attention mechanism with the Transformer and uses self-attention to form a cross-modal attention module, so that the feature modalities can be related closely to one another without requiring alignment. In addition, when processing text features, paragraph-vector embeddings are applied to the text and used in the corresponding depression detection experiments. Second, to overcome the shortage of experimental data for some modalities, a GAN mechanism is applied for feature enhancement; combining the GAN with the attention mechanism, the degree of depression is estimated from multi-modal features.

Experiments are conducted on the IEMOCAP, AVEC2017, and AVEC2019 datasets. BoVW, openSMILE, ASR, and paragraph-vector embedding are used to process the video, audio, and text features respectively, and emotion recognition and depression prediction are performed with the un-aligned cross-modal attention module. Both experiments achieve good results, which shows that the un-aligned cross-modal attention module proposed in this thesis performs well.
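The core idea of the un-aligned cross-modal attention module can be sketched as follows: queries come from a target modality and keys/values from a source modality, so the two sequences may have different lengths and no frame-level alignment is needed. This is a minimal plain-NumPy illustration with hypothetical shapes; the thesis's actual module additionally uses learned projection matrices and a full Transformer stack.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(target, source):
    """Scaled dot-product attention across two modalities.

    target: (len_t, d) features of the modality being enriched
    source: (len_s, d) features of the other modality
    len_t and len_s may differ -- no alignment is assumed.
    """
    d_k = target.shape[-1]
    scores = target @ source.T / np.sqrt(d_k)   # (len_t, len_s)
    weights = softmax(scores, axis=-1)          # each target step attends over all source steps
    return weights @ source                     # (len_t, d), source info fused into target

# Example: 6 text steps attending over 10 audio frames (unequal lengths)
rng = np.random.default_rng(0)
text = rng.standard_normal((6, 32))
audio = rng.standard_normal((10, 32))
fused = cross_modal_attention(text, audio)      # shape (6, 32)
```

Because the attention weights span the entire source sequence, each target step can draw on whichever source frames are relevant, which is what allows the module to handle asynchronous, un-aligned modalities.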
Keywords/Search Tags:Un-aligned Features, Cross-modal Attention, GAN, Multi-modal Emotion Recognition, Depression Detection