
Research On Speech Emotion Recognition Based On Deep Learning

Posted on: 2019-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Niu
GTID: 2428330566978004
Subject: Computer Science and Technology
Abstract/Summary:
With the continuous progress of science and technology, computers are gradually taking over some challenging tasks from humans. To make interaction between humans and computers more intelligent and natural, new human-computer interaction technologies have become a research hot spot. Emotion analysis is a very important part of human-computer interaction, and speech, which carries a great deal of emotional information, is a crucial channel for expressing emotion. The ultimate purpose of speech emotion analysis is therefore to recognize human emotions from speech, which will help robots make more rational decisions, and we believe this research will have an extremely wide range of applications in the future. In recent years, deep learning has developed rapidly, been widely applied, and achieved very good results. This thesis applies deep learning to speech emotion classification, addresses current problems in speech emotion recognition, and proposes corresponding improvements. The main contributions of this thesis are as follows:

1. A convolutional neural network, EMNet, suited to speech emotion recognition is proposed. Its CNN structure is adapted to the characteristics of the spectrogram. Compared with the classic AlexNet, EMNet improves speech emotion recognition performance by 9.37% while requiring only 5.2% of AlexNet's trainable parameters, so it trains faster and uses less memory.
2. A data processing algorithm based on the principle of retinal imaging, called DPARIP, is proposed. Processing the data with DPARIP yields a large amount of training data, which effectively alleviates the shortage of training data. DPARIP was then combined with AlexNet and with EMNet and tested on the IEMOCAP database. Compared with the latest results in this field, classification performance increased by 22.06% and 23.66%, respectively.

In conclusion, the experimental results show that EMNet and DPARIP are effective.
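EMNet is designed around the characteristics of the spectrogram, but the abstract does not say how the spectrograms are computed. As an illustration only, the sketch below computes a log-magnitude spectrogram with NumPy, assuming 16 kHz mono audio, 25 ms Hann-windowed frames and a 10 ms hop. These parameters and the function name `log_spectrogram` are hypothetical choices, not the settings used in the thesis.

```python
import numpy as np


def log_spectrogram(signal: np.ndarray, sample_rate: int = 16000,
                    frame_ms: float = 25.0, hop_ms: float = 10.0,
                    eps: float = 1e-10) -> np.ndarray:
    """Return a (freq_bins, frames) log-magnitude spectrogram in dB.

    Hypothetical settings: the thesis does not state its spectrogram parameters.
    """
    frame_len = int(sample_rate * frame_ms / 1000)  # 400 samples at 16 kHz
    hop_len = int(sample_rate * hop_ms / 1000)      # 160 samples at 16 kHz
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        # Zero-pad very short clips so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Index matrix of shape (n_frames, frame_len) selecting overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    # Magnitude of the one-sided FFT of each windowed frame.
    mag = np.abs(np.fft.rfft(frames, axis=1))
    # Convert to decibels; eps avoids log10(0) on silent frames.
    # Transpose so rows are frequency bins and columns are time frames.
    return (20.0 * np.log10(mag + eps)).T
```

For a one-second 1 kHz tone, `log_spectrogram(np.sin(2 * np.pi * 1000 * np.arange(16000) / 16000))` returns an array of shape (201, 98), with its energy concentrated in frequency bin 25 (1000 Hz divided by 40 Hz per bin). The result is a 2-D array that a CNN can treat as a single-channel image.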
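The abstract does not describe how DPARIP works, only that it turns the available recordings into many more training samples. For comparison, a common and much simpler way to enlarge a speech emotion dataset is to cut each spectrogram into overlapping fixed-width time segments that inherit the utterance's label. The sketch below shows that generic technique; it is not DPARIP, and the names `segment_spectrogram` and `augment_dataset` as well as the width and hop values are hypothetical.

```python
import numpy as np


def segment_spectrogram(spec: np.ndarray, width: int, hop: int) -> np.ndarray:
    """Cut a (freq_bins, frames) spectrogram into fixed-width time segments.

    Generic sliding-window segmentation, NOT the thesis's DPARIP algorithm.
    Returns an array of shape (n_segments, freq_bins, width).
    """
    if width <= 0 or hop <= 0:
        raise ValueError("width and hop must be positive")
    n_frames = spec.shape[1]
    if n_frames < width:
        # Pad short clips on the right with the clip's minimum value
        # (close to silence for a dB spectrogram) so they yield one segment.
        spec = np.pad(spec, ((0, 0), (0, width - n_frames)),
                      constant_values=spec.min())
        n_frames = width
    starts = range(0, n_frames - width + 1, hop)
    return np.stack([spec[:, s:s + width] for s in starts])


def augment_dataset(specs, labels, width: int, hop: int):
    """Segment every spectrogram; each segment inherits its utterance label."""
    xs, ys = [], []
    for spec, label in zip(specs, labels):
        segs = segment_spectrogram(spec, width, hop)
        xs.append(segs)
        ys.extend([label] * len(segs))
    return np.concatenate(xs, axis=0), np.array(ys)
```

With a 201 x 98 spectrogram, `width=32` and `hop=16` yield five segments starting at frames 0, 16, 32, 48 and 64, so one labelled utterance becomes five labelled training examples. At test time, one common choice is to average the segment-level predictions for each utterance.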
Keywords/Search Tags: speech emotion recognition, convolutional neural networks, retinal imaging principle, deep learning