
Research On Speech Emotion Recognition Based On Deep Learning

Posted on: 2022-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: H N Xu
Full Text: PDF
GTID: 2518306533495414
Subject: Electronic information
Abstract/Summary:
With the development of deep learning (DL) and artificial intelligence (AI), emotional expression has become increasingly important in the field of human-computer interaction, and speech, as the most direct way to express emotion, is an important prerequisite for achieving natural human-computer interaction. How to make a computer automatically recognize human emotion, and how to use deep learning to automatically extract the key features that represent speech emotion, are hot topics in current research. In this paper, we construct a model, based on currently popular deep learning networks, to extract features from the speech signal and recognize its emotion, focusing on finding high-level emotional features that effectively represent the speaker's emotions and on simulating the human attention mechanism for emotion recognition. The main contributions are as follows:

(1) Aiming at the problems of single-type feature extraction and low classification accuracy in the speech emotion recognition (SER) task, an emotion recognition method based on time-frequency feature fusion is proposed. The 3-D Log-Mel feature set, synthesized from the Log-Mel features together with their first-order and second-order differential features, is taken as the input of a BCNN-LSTM-attention network to extract frequency-domain features, while the speech signal is divided into equal-length segments and fed into a CNN-LSTM network to obtain time-domain features. The frequency-domain and time-domain features are then fused. Experiments on the IEMOCAP and EMO-DB databases show that the recognition rate of the multi-feature fusion algorithm is higher than that of algorithms extracting only frequency-domain or only time-domain features.

(2) Retaining the 3-D Log-Mel feature set extracted in (1), a speech emotion recognition algorithm based on self-attention spatio-temporal features is proposed to model the key spatio-temporal dependencies. The optimal spatio-temporal representations of speech signals are automatically learned by a Bilinear Convolutional Neural Network (BCNN) and a Long Short-Term Memory network (LSTM), and a multi-head attention mechanism is introduced to exploit key frame information. Experiments on the IEMOCAP and EMO-DB databases show that the recognition rate of the spatio-temporal feature fusion algorithm is higher than that of algorithms extracting only spatial or only temporal features, and that the multi-head attention mechanism improves the performance of the whole system.

(3) An online speech emotion recognition system based on self-attention spatio-temporal features is designed. All functional modules are realized by calling EXE executable files. Experimental results demonstrate the superiority of the algorithm and the effectiveness of the speech emotion recognition system.
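The 3-D Log-Mel feature set described in (1) can be sketched as follows. This is a minimal illustration, not the thesis's exact pipeline: it assumes a log-Mel spectrogram (frames × Mel bands) has already been computed (e.g. with a library such as librosa), and the delta computation below uses the standard regression-style delta formula; the window width and dimensions are illustrative.

```python
import numpy as np

def deltas(feat, width=2):
    """Regression-style delta (differential) features over the time axis
    of a (frames, mels) matrix, with edge padding at the boundaries."""
    padded = np.pad(feat, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(i * i for i in range(1, width + 1))
    T = feat.shape[0]
    return sum(i * (padded[width + i:width + i + T]
                    - padded[width - i:width - i + T])
               for i in range(1, width + 1)) / denom

def make_3d_logmel(log_mel):
    """Stack static log-Mel, first-order, and second-order differential
    features into a 3-channel (3, frames, mels) input tensor."""
    d1 = deltas(log_mel)       # first-order differential
    d2 = deltas(d1)            # second-order differential
    return np.stack([log_mel, d1, d2], axis=0)

# Illustrative input: 100 frames x 40 Mel bands
log_mel = np.random.randn(100, 40)
feat_3d = make_3d_logmel(log_mel)
print(feat_3d.shape)  # (3, 100, 40)
```

The resulting 3-channel tensor plays the same role as a color image for the convolutional front end, which is why it can be fed directly into a CNN-style network.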
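The multi-head attention mechanism introduced in (2) can likewise be sketched in NumPy. This is a generic scaled dot-product multi-head self-attention over a sequence of frame-level features (such as LSTM outputs); the projection matrices, dimensions, and head count here are illustrative assumptions, not the thesis's trained parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product multi-head self-attention over a sequence of
    frame-level features X of shape (frames, d_model)."""
    T, d_model = X.shape
    d_head = d_model // n_heads
    # Project and split into heads: (n_heads, frames, d_head)
    Q = (X @ Wq).reshape(T, n_heads, d_head).transpose(1, 0, 2)
    K = (X @ Wk).reshape(T, n_heads, d_head).transpose(1, 0, 2)
    V = (X @ Wv).reshape(T, n_heads, d_head).transpose(1, 0, 2)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, T, T)
    weights = softmax(scores, axis=-1)   # each frame attends to all frames
    heads = weights @ V                  # (heads, T, d_head)
    concat = heads.transpose(1, 0, 2).reshape(T, d_model)
    return concat @ Wo, weights

rng = np.random.default_rng(0)
T, d_model, n_heads = 100, 64, 4   # e.g. 100 LSTM output frames
X = rng.standard_normal((T, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) * 0.1
                  for _ in range(4))
out, attn = multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads)
print(out.shape, attn.shape)  # (100, 64) (4, 100, 100)
```

The attention weight matrix of each head sums to 1 over the frame axis, so frames carrying stronger emotional cues can receive proportionally larger weight in the pooled representation.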
Keywords/Search Tags:Speech Emotion Recognition, Multi-features Fusion, Spatio-Temporal Modeling, Attention Mechanism, Emotion Recognition System