Speech Emotion Recognition Based On Deep Learning Technology

Posted on: 2021-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: J L Shang
Full Text: PDF
GTID: 2428330605482491
Subject: Computer Science and Technology
Abstract/Summary:
As the most convenient medium of human emotional communication, speech signals carry the speaker's semantic and cultural information as well as the speaker's emotion. Speech emotion recognition (SER) is a technology that automatically recognizes emotions from given speech signals. SER is widely used in medical treatment, education, industry, and other fields, and plays an irreplaceable role in human-machine interaction.

SER is a difficult and challenging research task. With continued research on traditional features and the rapid development of deep learning and its application to SER, researchers have tried various methods and made good progress. However, SER still faces great challenges: speech datasets are small and their classes are imbalanced; traditional acoustic features, also called Low-Level Descriptors (LLDs), may lose important information when combined with deep learning, so the resulting features carry incomplete emotional information; and spectrogram-based methods are affected by emotion-irrelevant information. Based on these difficulties, this thesis consists of the following parts:

1. To address the problems of traditional acoustic feature methods, this thesis proposes a new SER method based on LLD features and deep learning. First, we select the features most closely related to emotion and combine them with high-level statistical features (HSF) to reduce the dimensionality of the emotional features without degrading recognition performance. Then, a Convolutional Neural Network (CNN) is used to extract higher-level features; it effectively analyzes the correlations between features and retains useful information. Finally, an Extreme Learning Machine (ELM) model performs the final recognition; the ELM copes with small data samples and improves recognition accuracy. Comparative experiments demonstrate the effectiveness of the proposed method based on the CNN and ELM models. We also analyze the effectiveness of combining LLDs and HSF features, and compare the performance of a support vector machine (SVM) with that of the ELM model.

2. To address the problems of spectrogram-based methods, we propose a new SER method based on spectrograms and deep learning. First, we propose improved speech data preprocessing methods to reduce the influence of small speech datasets and data imbalance. Then, after studying popular Mel-spectrogram methods, we extract three-channel (3-D) Mel-spectrograms as the input to a Deep Convolutional Neural Network (DCNN) model. The three-channel Mel-spectrograms represent the speech reasonably and retain its emotional features. Next, we adopt a pre-trained DCNN model to extract frame-level emotional features, which mitigates the problems of small sample size and poor network training. We then feed these features into a Bidirectional Long Short-Term Memory (BLSTM) model to further extract high-level emotional information along the time dimension. To limit the influence of emotion-irrelevant features, an attention mechanism is adopted to focus on emotion-related features, which reduces the influence of irrelevant frames. Finally, a Deep Neural Network (DNN) model performs the final SER.

In summary, this thesis proposes two SER methods, one based on traditional LLD features and one based on spectrogram representations. Experiments on the Berlin speech emotion database (EMO-DB) and the IEMOCAP emotion database give promising results: the highest unweighted accuracies are 87.86% and 68.50%, respectively, which demonstrates the effectiveness of the proposed methods for SER.
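
The ELM classification step of part 1 can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract gives no hyperparameters, so the hidden-layer size, the tanh activation, the ridge term `reg`, and the names `ELMClassifier` and `make_toy_blobs` are all assumptions made for illustration, and the toy 2-D data stands in for the CNN-extracted emotion features. The sketch shows the general idea of an ELM: the hidden-layer weights are drawn at random and never trained, and only the output weights are obtained in closed form by least squares, which is one reason the approach is often considered for small datasets.

```python
import numpy as np


class ELMClassifier:
    """Minimal Extreme Learning Machine sketch (hypothetical helper class)."""

    def __init__(self, n_hidden=32, reg=1e-3, seed=0):
        self.n_hidden = n_hidden  # number of random hidden units (assumed value)
        self.reg = reg            # ridge regularisation strength (assumed value)
        self.seed = seed

    def _hidden(self, X):
        # Fixed random projection followed by a tanh non-linearity.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.classes_ = np.unique(y)
        # One-hot encode the labels as regression targets.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Hidden weights and biases are random and stay untrained.
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights via regularised least squares:
        # beta = (H^T H + reg * I)^(-1) H^T T
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


def make_toy_blobs(n_per_class=100, seed=0):
    """Two well-separated 2-D Gaussian blobs (synthetic, not speech data)."""
    rng = np.random.default_rng(seed)
    X0 = rng.normal(loc=-2.0, scale=0.5, size=(n_per_class, 2))
    X1 = rng.normal(loc=2.0, scale=0.5, size=(n_per_class, 2))
    X = np.vstack([X0, X1])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_blobs()
    model = ELMClassifier().fit(X, y)
    print("training accuracy:", (model.predict(X) == y).mean())
```

In a pipeline like the one described, `X` would hold one row of CNN-derived features per utterance and `y` the emotion labels; because training reduces to a single linear solve, it is fast compared with iterative gradient-based training.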
Keywords/Search Tags:Speech emotion recognition, deep learning, deep convolutional neural network (DCNN), Mel-spectrograms, Bi-Long Short-Term Memory (BLSTM), Attention mechanism