
Multi-modal Speech Emotion Recognition Based On The Attention Mechanism

Posted on: 2021-11-02  Degree: Master  Type: Thesis
Country: China  Candidate: C Z Wei  Full Text: PDF
GTID: 2518306128453664  Subject: Computer application technology
Abstract/Summary:
The study of emotion recognition is of great significance for human-computer interaction. The expression of emotion usually involves multiple modalities, such as speech, text, and facial micro-expressions. Because emotional speech databases are difficult to construct, data augmentation methods for speech emotion corpora are of great value. In recent years, researchers have begun to introduce information from other modalities into speech emotion recognition research, which makes it particularly important to better mine the correlated information between the various modalities. This dissertation focuses on two aspects, data augmentation and multi-modal feature fusion, and builds model frameworks based on neural networks and attention mechanisms. The main work of this dissertation is as follows:

(1) Based on spectrogram data, a convolutional neural network is used to learn spatial information, and a long short-term memory network is used to learn the inherent time-series information of speech signals; to balance the relationship between the two, a time-series segmentation convolutional neural network model is proposed. At the same time, based on the localized characteristics of the distribution of emotion in speech, a spectrogram data augmentation method based on local flipping is proposed.

(2) For the problem of multi-modal fusion, the attention mechanism is used to learn the internal correlation information between the modalities, and a unidirectional cross fusion model, a bidirectional cross fusion model, a cross fusion model, and a full fusion model based on the attention mechanism are proposed.

Experimental analysis shows that the data augmentation method and hybrid neural network proposed in this dissertation can improve the performance of speech emotion recognition. The proposed multi-modal fusion methods are more effective than traditional fusion methods and, compared with existing attention-based methods, can mine deeper-level emotional information between the modalities.
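The local-flipping augmentation described in (1) can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the segment count, flip probability, and function name are hypothetical, since the abstract only states that local regions of the spectrogram are flipped.

```python
import numpy as np

def local_flip_augment(spectrogram, num_segments=4, flip_prob=0.5, rng=None):
    """Augment a spectrogram (freq_bins x time_steps) by reversing
    randomly chosen local segments along the time axis.

    num_segments and flip_prob are illustrative choices; the original
    work may partition and flip the spectrogram differently."""
    rng = np.random.default_rng(rng)
    _, time_steps = spectrogram.shape
    augmented = spectrogram.copy()
    # Split the time axis into equal-width local segments.
    bounds = np.linspace(0, time_steps, num_segments + 1, dtype=int)
    for start, end in zip(bounds[:-1], bounds[1:]):
        if rng.random() < flip_prob:
            # Reverse this segment in time; frequency content is kept.
            augmented[:, start:end] = augmented[:, start:end][:, ::-1]
    return augmented
```

Because each flip only reorders frames inside a local window, the global temporal structure of the utterance is largely preserved, which is the intuition behind exploiting the localized distribution of emotional cues.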
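The bidirectional cross fusion in (2) can be sketched with scaled dot-product attention between two modalities. This is a simplified NumPy sketch under assumed conventions (shared feature dimension, mean pooling, concatenation of the two attended streams); the dissertation's actual projections, pooling, and fusion heads are not specified in the abstract.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, key_feats):
    """One direction of cross-modal attention: frames of one modality
    (queries) attend over frames of another modality (keys/values)."""
    d = key_feats.shape[1]
    scores = query_feats @ key_feats.T / np.sqrt(d)   # (Tq, Tk)
    weights = softmax(scores, axis=-1)                # attention weights
    return weights @ key_feats                        # (Tq, d)

def bidirectional_cross_fusion(speech_feats, text_feats):
    """Hypothetical bidirectional cross fusion: each modality attends
    to the other, each attended sequence is mean-pooled, and the two
    pooled vectors are concatenated into one fused representation."""
    speech_attended = cross_attention(speech_feats, text_feats)
    text_attended = cross_attention(text_feats, speech_feats)
    return np.concatenate([speech_attended.mean(axis=0),
                           text_attended.mean(axis=0)])
```

A unidirectional variant would keep only one of the two `cross_attention` calls, while a full fusion model would also include each modality's self-attended features.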
Keywords/Search Tags: Speech emotion recognition, Attention mechanism, Multi-modal, Data augmentation