Speech Emotion Recognition Based On Deep Learning

Posted on:2021-04-17

Degree:Master

Type:Thesis

Country:China

Candidate:M X Deng

GTID:2428330614968329

Subject:Engineering

Abstract/Summary:

Speech emotion recognition (SER), the task of recognizing a speaker's emotional state from speech, is one of the most challenging tasks in speech technology. As voice interaction becomes widespread, SER, which can make machines more human-like, has broad application prospects. In recent years, the development of deep learning has brought considerable progress to the field. However, current SER technology still faces many difficulties: existing emotional speech data is scarce and hard to annotate, high-dimensional emotional features are difficult to extract completely, and deep learning models are disturbed by semantic, language, and other information. These difficulties result in poor accuracy and poor cross-language performance. To improve the accuracy of SER and its cross-language and cross-dataset performance, this thesis carries out the following work.

First, in view of the high-dimensional nature of emotional information, this thesis proposes an emotional feature that follows a Gaussian distribution, the Smoothing Real Spectrogram. It solves the zero-value problem of the traditional spectrogram, removes the influence of outliers and extreme values in the data, and improves SER performance. Second, to improve the model's ability to extract high-dimensional emotional information from low-dimensional features, this thesis proposes an attention-based neural network combining a CNN and an RNN (ACRNN). By combining the CNN's strength in feature extraction, the RNN's strength on speech sequence tasks, and the attention mechanism's ability to focus on local regions, it effectively improves SER accuracy. Third, to remove the interference of semantic information and to improve both recognition accuracy and cross-language, cross-dataset effectiveness, this thesis proposes an SER method based on semantic anti-erasure. Under the constraint of a small emotion dataset, a speech recognition task trained on large-scale data is used as an auxiliary task to erase semantic information from the speech features, which improves recognition performance.

Finally, to further improve accuracy, this thesis proposes a multi-scale convolution method for multi-modal emotion recognition, which combines emotional information from speech and text. In addition, to address the modal absence (missing modality) problem that is often encountered when multi-modal emotion recognition systems are used in practice, this thesis proposes an adaptive method for modal absence, which effectively improves the practicality and robustness of such systems.
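
The abstract does not give the details of the attention step in ACRNN, so the following is only a generic sketch of attention pooling over frame-level features, not the thesis's actual model. It assumes the CNN and RNN stages have already produced one feature vector per frame; the function names `softmax` and `attention_pool`, the `query` vector, and the toy numbers are all hypothetical and chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    frames: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Collapse T frame vectors (each of size D) into one utterance vector.

    `query` stands in for a learned attention vector (an assumption here).
    """
    # Score each frame by its dot product with the query vector.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    # Turn the scores into weights that sum to 1 across frames.
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted sum of frames: high-scoring frames dominate the result.
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights


# Toy input: three frames of two features each; the middle frame matches
# the query most strongly, so it should receive the largest weight.
frames = [[0.0, 1.0], [2.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]
pooled, weights = attention_pool(frames, query)
```

In a trained network the query would be a learned parameter and the pooled vector would feed a classifier over emotion labels; here it only shows how attention lets a few emotionally salient frames outweigh the rest of the utterance.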

Keywords/Search Tags:

Speech Emotion Recognition, Attention Mechanism, Semantic Anti-erasure, Multi-modal, Modal Absence

Related items

1	Research On Speech Emotion Recognition Method Based On Multi-feature And Multi-modal Fusion
2	Multi-modal Speech Emotion Recognition Based On The Attention Mechanism
3	Research On Emotion Recognition Of Monomodal Speech And Multimodal Speech Vision Based On Transfer Learning
4	Emotion Recognition Based On Multi-modal Information Fusion
5	Speech Emotion Recognition Model Based On 3D Attention Mechanism And Center Loss
6	Research Of Emotion Recognition Based On Multi-modal Fusion
7	Research On Key Techniques Of Speech Emotion Recognition
8	Research On Multi-modal Emotion Recognition Method Combining Speech And Expression
9	Audio-Visual Multi-Modal Fusion Approach Research And Application
10	Speech And Facial Double Model Emotion Recognition