Research On Speech Emotion Recognition Technology Based On Deep Learning

Posted on:2022-04-14

Degree:Master

Type:Thesis

Country:China

Candidate:Y F Ma

GTID:2518306539461324

Subject:Electronics and Communications Engineering

Abstract/Summary:

Speech is the most basic, important, and efficient means of communicating information. It carries not only what the speaker says but also rich information about the speaker's emotional state. With the advent of the big-data era, speech emotion recognition has become a very active research direction with potential applications in human-computer interaction systems. As an important component of such systems, speech emotion recognition aims to enable emotional interaction with machines through direct spoken communication. However, because emotions are complex and diverse, speech emotion recognition remains a very challenging task. Research in this area centers on two problems: extracting discriminative features and building high-performance classification models. Addressing both, this thesis proposes a deep convolutional neural network (DCNN) model based on a weighted feature fusion algorithm, together with an improved genetic simulated annealing algorithm (GSA), to optimize the speech emotion recognition system.

Firstly, based on an analysis of the limitations and shortcomings of existing speech emotion recognition technology, the thesis sets out its research content and introduces the relevant theory, mainly emotion databases, speech signal preprocessing, feature parameter extraction methods, commonly used classification models, and feature dimensionality reduction strategies, which provide the technical basis for the subsequent research.

Secondly, traditional acoustic feature parameters reflect the characteristics of the emotional speech signal only in the time domain or the frequency domain and cannot capture subtle differences between emotions. Yet features that correlate strongly with emotion are one of the factors that determine recognition performance. Therefore, starting from typical features, mainly the Mel-frequency cepstral coefficients (MFCC), the log-frequency power coefficients (LFPC), their first- and second-order difference coefficients, the Teager energy operator (TEO), and spectrograms, this thesis proposes a feature fusion algorithm with weighting coefficients so that the features complement one another. The fused features form two-dimensional, three-channel spectrograms whose horizontal axis corresponds to duration and whose vertical axis corresponds to the filter banks. These spectrograms are fed into the DCNN to extract deeper features, and the deep and shallow features are fused to obtain more expressive feature parameters. A Softmax classifier performs the emotion classification. Experiments show that on the EMO-DB database the proposed weighted fusion feature outperforms the widely used spectrogram feature by 9.05% in recognition results and improves on other features by 23.5% on average. On the IEMOCAP database, the average recognition rate is 10.76% higher than with other features.

Finally, traditional DCNN training relies mainly on gradient descent, whose performance is strongly affected by the initial weights of the network; training a DCNN is essentially a matter of solving for the weights. To address this problem, the thesis combines the advantages of the genetic algorithm (GA) and the simulated annealing algorithm (SA) to optimize the network weights, and proposes a temperature variable coefficient method to improve SA. Experimental results show that the improved algorithm raises the average emotion recognition rate by 6.5% over the original algorithm on the EMO-DB corpus and by 9.89% on the IEMOCAP corpus.
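
The weighted fusion of features into a three-channel input can be illustrated with a minimal sketch. The abstract does not give the fusion formula, so everything here is an assumption made for illustration: the choice of three channels (for example static coefficients plus first- and second-order differences), the min-max normalization, the weight values, and the hypothetical function names `min_max_normalize` and `fuse_weighted`. Each feature map is a matrix with filter-bank index on the rows and time frames on the columns.

```python
from typing import List

# One feature map: rows = filter-bank index, columns = time frames.
Matrix = List[List[float]]


def min_max_normalize(m: Matrix) -> Matrix:
    """Scale a feature map to [0, 1] so weights are comparable across features."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no contrast; map it to all zeros.
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / span for v in row] for row in m]


def fuse_weighted(features: List[Matrix], weights: List[float]) -> List[Matrix]:
    """Stack equally shaped feature maps as channels, each scaled by its weight.

    Weights are normalized to sum to 1 (an assumption, not taken from the thesis).
    Returns a list of channels, i.e. a channels x rows x columns structure.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature map")
    shape = (len(features[0]), len(features[0][0]))
    for f in features:
        if (len(f), len(f[0])) != shape:
            raise ValueError("all feature maps must share the same shape")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        [[(w / total) * v for v in row] for row in min_max_normalize(f)]
        for f, w in zip(features, weights)
    ]


# Toy 2 x 2 maps standing in for static, first-difference and
# second-difference features (hypothetical values).
static = [[0.0, 2.0], [4.0, 8.0]]
delta = [[1.0, 1.0], [3.0, 5.0]]
delta2 = [[0.0, 0.0], [0.0, 0.0]]

fused = fuse_weighted([static, delta, delta2], weights=[2.0, 1.0, 1.0])
print(len(fused), "channels of shape", len(fused[0]), "x", len(fused[0][0]))
```

In a full pipeline the three maps would come from a feature extractor and the resulting array would be passed to the DCNN; how the thesis actually chooses its weighting coefficients is not stated in the abstract.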

Keywords/Search Tags:

Speech emotion recognition, Convolutional neural network, Weighted feature fusion, Genetic algorithm, Simulated annealing algorithm

Related items

1	Research And Application Of Speech Emotion Recognition Algorithm Based On Deep Learning
2	Improved Genetic Simulated Annealing Algorithm Based On BP Neural Networks And Application In Recognition Of GIS Partial Discharge
3	Research On Speech Emotion Recognition Based On Spatiotemporal Feature Fusion
4	Research On Feature Fusion Method Of Speech Emotion Recognition Based On Deep Learning
5	Bimodal Emotion Recognition Based On Deep Learning
6	Dual Fusion Speech Emotion Recognition Based On Deep Learning
7	Research On Speech Emotion Recognition Based On Multi Features Fusion
8	Emotion Speech Recognition Based On Artificial Neural Network
9	Research On Multi-modal Emotion Recognition Algorithm Based On Speech And Face Expression
10	Research On Speech Emotion Recognition Algorithm Based On Neural Network