
Research On Speech Emotion Recognition Technology Based On Deep Learning

Posted on: 2022-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Ma
Full Text: PDF
GTID: 2518306539461324
Subject: Electronics and Communications Engineering
Abstract/Summary:
Speech is the most basic, most important, and most efficient means of communicating information. It carries not only what the speaker says but also rich information about the speaker's emotional state. With the advent of the big data era, speech emotion recognition has become a very active research direction with potential applications in human-computer interaction systems. As an important part of such systems, speech emotion recognition aims to enable emotional interaction with machines through direct spoken communication. However, because emotions are complex and diverse, speech emotion recognition remains a very challenging task. Research in this field centers on two problems: extracting discriminative features and building a high-performance classification model. To address both, this thesis proposes a deep convolutional neural network (DCNN) model based on a weighted feature fusion algorithm, together with an improved genetic simulated annealing (GSA) algorithm, to optimize the speech emotion recognition system.

Firstly, the thesis analyzes the limitations and shortcomings of existing speech emotion recognition technology and, on that basis, sets out its research content. It then introduces the relevant theory, mainly emotion databases, speech signal preprocessing, feature parameter extraction methods, commonly used classification models, and feature dimensionality reduction strategies, which provide the technical basis for the later chapters.

Secondly, traditional acoustic feature parameters reflect the characteristics of an emotional speech signal only in the time domain or only in the frequency domain, and they cannot capture subtle differences between emotions. Yet features that correlate strongly with emotion are one of the factors that determine recognition performance. Therefore, starting from typical features, mainly the Mel-frequency cepstral coefficients (MFCC), the log frequency power coefficients (LFPC), their first-order and second-order difference coefficients, the Teager energy operator (TEO), and spectrograms, this thesis proposes a weighted-coefficient feature fusion algorithm that lets the individual features complement one another. The fused features form two-dimensional, three-channel spectrograms whose horizontal axis corresponds to duration and whose vertical axis corresponds to the filter banks. The processed spectrograms are fed into the DCNN to extract deeper features, and the deep and shallow features are fused to obtain more expressive feature parameters. A Softmax classifier performs the emotion classification. Simulation experiments show that, on the EMO-DB database, the proposed weighted fusion feature achieves a recognition rate 9.05% higher than the widely used spectrogram feature and 23.5% higher on average than the other features. On the IEMOCAP database, the average recognition rate is 10.76% higher than with the other features.

Finally, the traditional DCNN is trained mainly with gradient descent, whose performance depends heavily on the initial weights of the network. Training a DCNN is essentially a matter of solving for its weights. To reduce this sensitivity to initialization, this thesis combines the advantages of the genetic algorithm (GA) and the simulated annealing algorithm (SA) to optimize the weights, and proposes a temperature variable coefficient method to improve SA. Experimental results show that the improved algorithm raises the average emotion recognition rate by 6.5% over the original algorithm on the EMO-DB corpus and by 9.89% on the IEMOCAP corpus.
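
The abstract gives no details of the GA and SA combination or of the temperature variable coefficient method, so the following is only a minimal sketch of the general idea under stated assumptions, not the thesis's implementation. It assumes a population of candidate weight vectors evolved by selection, uniform crossover, and Gaussian mutation, with the standard Metropolis rule from simulated annealing deciding whether a child replaces its parent. The names gsa_minimize and cooling_coefficient, the linear schedule for the coefficient, and the toy loss standing in for a DCNN's training loss are all hypothetical.

```python
import math
import random


def metropolis_accept(delta, temperature, rng):
    """Standard SA rule: always accept improvements, sometimes accept worse moves."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def cooling_coefficient(step, total_steps, start=0.95, end=0.80):
    """Hypothetical variable coefficient: cool slowly early on, faster later.

    The thesis's actual schedule is not given in the abstract.
    """
    frac = step / max(total_steps - 1, 1)
    return start + (end - start) * frac


def gsa_minimize(loss, dim, pop_size=20, generations=50, t0=1.0, seed=0):
    """Hybrid GA + SA search for a weight vector with low loss (illustrative)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    best = min(pop, key=loss)
    initial_best_loss = loss(best)
    temperature = t0
    for gen in range(generations):
        # GA selection: the better half of the population become parents.
        parents = sorted(pop, key=loss)[: pop_size // 2]
        next_pop = []
        for _ in range(pop_size):
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by Gaussian mutation of every gene.
            child = [rng.choice(pair) + rng.gauss(0.0, 0.1) for pair in zip(a, b)]
            # SA step: the child replaces parent `a` only if Metropolis accepts it.
            if metropolis_accept(loss(child) - loss(a), temperature, rng):
                next_pop.append(child)
            else:
                next_pop.append(list(a))
        pop = next_pop
        candidate = min(pop, key=loss)
        if loss(candidate) < loss(best):
            best = candidate  # remember the best vector seen so far
        # The cooling coefficient itself changes from generation to generation.
        temperature *= cooling_coefficient(gen, generations)
    return best, initial_best_loss, loss(best)


if __name__ == "__main__":
    # Toy quadratic loss; a real use would evaluate the network on training data.
    sphere = lambda w: sum(x * x for x in w)
    weights, before, after = gsa_minimize(sphere, dim=4)
    print(before, after)
```

In a setting like the one the abstract describes, the vector returned by such a search would presumably serve as the initial weights for gradient descent, so that training is less dependent on a random starting point.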
Keywords/Search Tags:Speech emotion recognition, Convolutional neural network, Weighted feature fusion, Genetic algorithm, Simulated annealing algorithm