
Research On Speech Emotion Recognition Algorithm Based On Deep Learning

Posted on: 2020-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Liang
Full Text: PDF
GTID: 2428330599962095
Subject: Information and Communication Engineering
Abstract/Summary:
Speech Emotion Recognition (SER) has become a research hotspot in artificial intelligence in recent years, with broad application prospects in emotional robots, online education, customer service centers, assisted driving, and criminal investigation. Although much progress has been made, building a reasonable and efficient SER network model remains one of the main open problems. Starting from an analysis of the mainstream Convolutional Recurrent Neural Network (CRNN) recognition model, this thesis therefore pursues improvements in three areas: handling samples of unequal length, handling class-imbalanced samples, and handling the uneven distribution of emotion-bearing frames within an utterance. The main research work is as follows:

Firstly, for unequal-length samples, a variable-length input strategy is adopted. This avoids the emotion-category confusion and broken temporal continuity caused by segmenting long samples in a fixed-length input model, and it effectively improves recognition performance. In four-class emotion recognition experiments on the IEMOCAP corpus (neutral, happy, sad, angry), the model achieves 66.59% UAR (Unweighted Average Recall) and 69.33% WAR (Weighted Average Recall), improvements of 8.61% and 5.86% respectively over the fixed-length input model.

Secondly, for class-imbalanced samples, the focal loss function is used in place of inverse-frequency-weighted cross-entropy to train the model. This strengthens the model's ability to mine hard samples and to learn effectively from imbalanced data, yielding 68.66% UAR and 69.67% WAR, which is 2.06% and 0.34% higher than the "baseline" model.

Finally, for the uneven distribution of emotion-bearing frames, the Connectionist Temporal Classification (CTC) method is introduced into the "baseline" model. CTC aligns the emotion labels to the emotional frames, so that the model concentrates on learning from those frames, which effectively improves recognition performance: the experiment achieves 69.75% UAR and 70.42% WAR, increases of 1.09% and 0.75% over the "baseline" model. Since CTC learns all emotional frames to the same degree, an Attention Mechanism (AM) is further introduced into the "baseline" model; it assigns different attention weights to speech frames according to their emotional content, so that frames are learned to different degrees. This model achieves 71.77% UAR and 71.60% WAR, outperforming the CTC model above.
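The abstract does not specify the exact loss configuration used in the thesis; a minimal sketch of the focal-loss idea it invokes, assuming plain softmax class probabilities and no per-class weighting term, might look like this (the function name and signature are illustrative, not the author's code):

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0):
    """Multi-class focal loss: FL(p_t) = -(1 - p_t)^gamma * log(p_t).

    probs   : (N, C) array of softmax probabilities
    targets : (N,) integer class labels
    gamma   : focusing parameter; gamma = 0 recovers plain cross-entropy
    """
    # Probability the model assigned to the true class of each sample.
    p_t = probs[np.arange(len(targets)), targets]
    # (1 - p_t)^gamma shrinks the loss of easy, well-classified samples,
    # so training effort concentrates on hard (often minority-class) ones.
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))
```

Because the modulating factor `(1 - p_t)^gamma` is at most 1, the focal loss never exceeds cross-entropy on the same batch, and easy samples contribute almost nothing, which is the hard-sample-mining effect the abstract describes.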
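The attention mechanism's frame weighting can be sketched in the simplest form consistent with the abstract: one scalar score per frame, softmax-normalized, then a weighted sum over time. The single scoring vector `w` is an assumed parameterization for illustration; the thesis may use a different attention formulation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(frames, w):
    """Pool frame features into one utterance vector via attention.

    frames : (T, D) frame-level features, e.g. from the recurrent layers
    w      : (D,) learnable scoring vector (hypothetical parameterization)
    Returns a (D,) utterance-level representation.
    """
    scores = frames @ w      # (T,) one salience score per frame
    alpha = softmax(scores)  # attention weights, non-negative, sum to 1
    return alpha @ frames    # frames rich in emotion dominate the sum
```

With a zero scoring vector every frame gets equal weight and the pooling reduces to a plain mean, which mirrors the CTC model's equal treatment of frames; non-uniform scores are what let the network learn frames "to different degrees".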
Keywords/Search Tags:Speech Emotion Recognition, Convolutional Recurrent Neural Network, Focal Loss, Connectionist Temporal Classification, Attention Mechanism