
Multi-modal Speech Emotion Recognition Based On The Attention Mechanism

Posted on: 2021-11-02  Degree: Master  Type: Thesis
Country: China  Candidate: C Z Wei  Full Text: PDF
GTID: 2518306128453664  Subject: Computer application technology
Abstract/Summary:
The study of emotion recognition is of great significance for human-computer interaction. The expression of emotion usually involves multiple modalities, such as speech, text, and facial micro-expressions. Because emotional speech databases are difficult to construct, data augmentation methods for speech emotion corpora are of great value. In recent years, researchers have begun to introduce information from other modalities into speech emotion recognition research, which makes it particularly important to better mine the correlated information between the various modalities. This dissertation focuses on two aspects, data augmentation and multi-modal feature fusion, and builds model frameworks based on neural networks and attention mechanisms. The main work of this dissertation is as follows:

(1) Based on spectrogram data, a convolutional neural network is used to learn spatial information, and a long short-term memory network is used to learn the inherent time-series information of speech signals; to balance the relationship between the two, a time-series segmentation convolutional neural network model is proposed. At the same time, based on the localized characteristics of the distribution of emotion in speech, a spectrogram data augmentation method based on local flipping is proposed.

(2) For the problem of multi-modal fusion, the attention mechanism is used to learn the internal correlation information between the modalities, and a unidirectional cross fusion model, a bidirectional cross fusion model, a cross fusion model, and a full fusion model based on the attention mechanism are proposed.

Experimental analysis shows that the data augmentation method and hybrid neural network proposed in this dissertation can improve the performance of speech emotion recognition. The proposed multi-modal fusion methods are more effective than traditional fusion methods and, compared with existing attention-based methods, can mine deeper-level emotional information between the modalities.
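The local-flipping augmentation described in (1) can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the segment count, flip probability, and function name are hypothetical, since the abstract only states that local regions of the spectrogram are flipped.

```python
import numpy as np

def local_flip_augment(spectrogram, num_segments=4, flip_prob=0.5, rng=None):
    """Augment a spectrogram (freq_bins x time_steps) by reversing
    randomly chosen local segments along the time axis.

    num_segments and flip_prob are illustrative choices; the original
    work may partition and flip the spectrogram differently."""
    rng = np.random.default_rng(rng)
    _, time_steps = spectrogram.shape
    augmented = spectrogram.copy()
    # Split the time axis into equal-width local segments.
    bounds = np.linspace(0, time_steps, num_segments + 1, dtype=int)
    for start, end in zip(bounds[:-1], bounds[1:]):
        if rng.random() < flip_prob:
            # Reverse this segment in time; frequency content is kept.
            augmented[:, start:end] = augmented[:, start:end][:, ::-1]
    return augmented
```

Because each flip only reorders frames inside a local window, the global temporal structure of the utterance is largely preserved, which is the intuition behind exploiting the localized distribution of emotional cues.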
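The bidirectional cross fusion in (2) can be sketched with scaled dot-product attention between two modalities. This is a simplified NumPy sketch under assumed conventions (shared feature dimension, mean pooling, concatenation of the two attended streams); the dissertation's actual projections, pooling, and fusion heads are not specified in the abstract.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, key_feats):
    """One direction of cross-modal attention: frames of one modality
    (queries) attend over frames of another modality (keys/values)."""
    d = key_feats.shape[1]
    scores = query_feats @ key_feats.T / np.sqrt(d)   # (Tq, Tk)
    weights = softmax(scores, axis=-1)                # attention weights
    return weights @ key_feats                        # (Tq, d)

def bidirectional_cross_fusion(speech_feats, text_feats):
    """Hypothetical bidirectional cross fusion: each modality attends
    to the other, each attended sequence is mean-pooled, and the two
    pooled vectors are concatenated into one fused representation."""
    speech_attended = cross_attention(speech_feats, text_feats)
    text_attended = cross_attention(text_feats, speech_feats)
    return np.concatenate([speech_attended.mean(axis=0),
                           text_attended.mean(axis=0)])
```

A unidirectional variant would keep only one of the two `cross_attention` calls, while a full fusion model would also include each modality's self-attended features.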
Keywords/Search Tags: Speech emotion recognition, Attention mechanism, Multi-modal, Data augmentation