Speech Emotion Recognition Based On Deep Learning Method

Posted on: 2021-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: B Y Wang
GTID: 2428330611499282
Subject: Applied statistics
Abstract/Summary:
With the rapid development of artificial intelligence, speech has attracted wide attention from researchers as one of the most direct means of human-computer interaction. For communication between humans and machines to feel natural, the emotion carried in speech is a crucial factor. Applications such as remote interactive education, humanized machine customer service, and psychological guidance robots continue to drive the development of speech emotion recognition. The field still faces many problems, chiefly how to select and construct speech features that are effective for emotion classification, and how to build a high-performance recognition model. This thesis first introduces a mixed model that combines a gradient boosting tree algorithm with ridge regression, and then builds a set of deep learning neural network models for experiments. The main work is as follows:

First, we build a mixed model based on LightGBM. We use openSMILE to extract the 8 feature sets of the INTERSPEECH speech emotion challenge, train one LightGBM model on each feature set, and combine the single models with a ridge regression model. In this way the combined model can draw on the information in every feature set through the single models, while the ridge regression helps avoid overfitting, and the model achieves good recognition performance.

Second, we build a mixed deep learning model of CNN, LSTM and an attention mechanism. Features such as the chromagram, energy-normalized chromagram, mel spectrogram and MFCC are passed to the CNN as input, so that it learns from the spectrogram and the other features of each frame. The resulting time series of high-level representations is then analyzed by the LSTM, and an attention layer is inserted that attends over all LSTM hidden states and focuses on the parts that matter more for emotion recognition. Compared with three control experiments, recognition ability improves clearly.

Finally, we substitute a bidirectional LSTM for the LSTM layer in the preceding network, which improves performance slightly further.
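The abstract does not say what form the attention layer takes. As a minimal sketch, assuming dot-product scoring followed by softmax pooling (a common choice, not necessarily the one used in the thesis), the step of weighting the LSTM hidden states might look like the following. The function name `attention_pool`, the array sizes, and the randomly drawn `score_vector` are hypothetical, and NumPy stands in for a deep learning framework.

```python
import numpy as np


def attention_pool(hidden_states, score_vector):
    """Collapse hidden states of shape (T, d) into one context vector of shape (d,)."""
    # One scalar relevance score per time step.
    scores = hidden_states @ score_vector            # shape (T,)
    # Softmax over time, shifted by the max for numerical stability.
    exp_scores = np.exp(scores - scores.max())
    weights = exp_scores / exp_scores.sum()          # shape (T,), sums to 1
    # Weighted sum of hidden states: frames with larger weights dominate.
    context = weights @ hidden_states                # shape (d,)
    return context, weights


# Hypothetical sizes: 50 frames, 8 LSTM units. In a trained network the
# hidden states come from the LSTM and score_vector is a learned parameter.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(50, 8))
score_vector = rng.normal(size=8)

context, weights = attention_pool(hidden_states, score_vector)
```

Under these assumptions, `context` would be the single vector passed on to the emotion classifier, and `weights` shows which frames the layer emphasized.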
Keywords/Search Tags: speech emotion recognition, CNN, LSTM, attention mechanism