
Speech Emotion Recognition Based On Deep Learning

Posted on: 2019-02-23  Degree: Master  Type: Thesis
Country: China  Candidate: B C Jin  Full Text: PDF
GTID: 2348330545984502  Subject: Information and Communication Engineering
Abstract/Summary:
Since ancient times, emotion has been one of the most important channels of human communication. With the development of artificial intelligence, human-computer interaction is no longer limited to simple speech and text: the demand for computers that understand affect is growing rapidly, which makes speech emotion recognition particularly important. In essence, speech emotion recognition is an information-processing task, namely extracting and identifying the affective information carried in speech signals. Its applications are extensive, including human-computer interaction (e.g., dialogue robots), public-safety monitoring, and customer-service attitude monitoring. Although the field has long attracted academic attention and, with advances in psychology, physiology, neuroscience, and artificial intelligence, has progressed greatly since the late 20th century, factors such as feature extraction still keep current recognition performance far from practical deployment.

This thesis approaches the problem from the perspective of feature extraction and proposes solutions to a series of difficulties in speech emotion recognition. The main contents are:

1) Speech emotion recognition based on LSTM: speech emotion is modelled with the traditional short-term features (MFCC) and the traditional acoustic model (GMM), and then with an LSTM over the same short-term MFCC sequences; the two approaches are compared.

2) Speech emotion recognition based on hyper-prosodic features: based on the preceding results, we argue that speech emotion is expressed chiefly by long-term changes in prosody. Accordingly, a feature extraction method, Extraction of Hyper-Prosodic Features (EHPF), is proposed. The raw signal is down-sampled along contours built from prosodic features such as fundamental frequency and energy; from these contours, many statistical features are extracted so that the feature set retains as much emotional information as possible. Redundant features are then removed to keep only the highly relevant ones, i.e., feature selection. Contrast experiments on several public databases with various classifiers (SVM, GBDT, random forest, DNN, and others) yield results that are close to, and on multiple databases beyond, the state of the art, demonstrating that the method achieves better performance and supporting the conclusion put forward in this thesis.

3) Speech emotion recognition based on EHPF and spectrogram-CNN: having validated the efficiency of long-term features, we reconsider the significance of features in the frequency domain. Using spectrograms and a CNN, we extract global spectral features, conduct experiments, and compare the results with short-term MFCCs. The EHPF features from Chapter 4 are then introduced, and the combination of time-domain and frequency-domain, short-term and long-term features yields results that surpass all the previous systems.

In summary, this thesis compares long-term with short-term features, proposes the EHPF feature extraction algorithm and verifies its validity, and finally, using deep learning, combines time-domain and frequency-domain, long-term and short-term information, verifying the validity of the combined method.
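The short-term modelling idea in contribution 1) can be sketched as follows: MFCC-like frame vectors are fed to an LSTM step by step, and the final hidden state summarizes the utterance for an emotion classifier. This is a minimal NumPy illustration, not the thesis's actual network; all dimensions, the random "MFCC" frames, and the weight initialization are assumptions for demonstration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gate pre-activations stacked as [input, forget, cell, output]."""
    z = W @ x + U @ h + b                   # shape (4*H,)
    H = h.shape[0]
    i = sigmoid(z[0*H:1*H])                 # input gate
    f = sigmoid(z[1*H:2*H])                 # forget gate
    g = np.tanh(z[2*H:3*H])                 # candidate cell state
    o = sigmoid(z[3*H:4*H])                 # output gate
    c_new = f * c + i * g                   # update cell memory
    h_new = o * np.tanh(c_new)              # expose hidden state
    return h_new, c_new

rng = np.random.default_rng(0)
n_frames, n_mfcc, hidden = 50, 13, 16       # illustrative sizes only
frames = rng.standard_normal((n_frames, n_mfcc))   # stand-in MFCC sequence
W = rng.standard_normal((4 * hidden, n_mfcc)) * 0.1
U = rng.standard_normal((4 * hidden, hidden)) * 0.1
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for x in frames:                            # run over the whole frame sequence
    h, c = lstm_step(x, h, c, W, U, b)

print(h.shape)                              # utterance-level summary vector
```

The final `h` plays the role of the utterance-level representation that a classifier would map to an emotion label; a GMM baseline, by contrast, models the frame vectors independently and discards this temporal ordering.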
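The EHPF idea in contribution 2) can be sketched in the same spirit: build a prosodic contour (here only short-time log-energy; the thesis also uses fundamental frequency), down-sample it, and take long-term statistics over the coarse contour. The function names, frame/hop sizes, down-sampling factor, and the particular statistics below are hypothetical illustrations, not the thesis's exact EHPF definition.

```python
import numpy as np

def energy_contour(signal, frame_len=400, hop=160):
    """Short-time log-energy contour: one value per 25 ms frame at 16 kHz."""
    n = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i*hop : i*hop + frame_len] for i in range(n)])
    return np.log(np.sum(frames**2, axis=1) + 1e-10)

def hyper_prosodic_stats(contour, factor=4):
    """Down-sample the contour, then take long-term statistics over it."""
    coarse = contour[::factor]              # crude down-sampling of the contour
    slope = np.polyfit(np.arange(len(coarse)), coarse, 1)[0]  # overall trend
    return np.array([coarse.mean(), coarse.std(),
                     coarse.max() - coarse.min(), slope])

rng = np.random.default_rng(1)
speech = rng.standard_normal(16000)         # 1 s of stand-in audio at 16 kHz
feats = hyper_prosodic_stats(energy_contour(speech))
print(feats.shape)                          # a small long-term feature vector
```

In practice such statistics would be computed over several contours (F0, energy, and their derivatives), concatenated, and passed through feature selection before reaching a classifier such as SVM or random forest.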
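For contribution 3), the input to the spectrogram-CNN is a 2-D time-frequency image. A minimal sketch of building that image with a plain NumPy STFT, assuming a 16 kHz signal and illustrative FFT/hop sizes (the thesis does not specify these parameters here):

```python
import numpy as np

def log_spectrogram(signal, n_fft=512, hop=160):
    """Log-magnitude spectrogram: the 2-D image a CNN would consume."""
    window = np.hanning(n_fft)
    n = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i*hop : i*hop + n_fft] * window
                       for i in range(n)])
    mag = np.abs(np.fft.rfft(frames, axis=1))    # (frames, n_fft//2 + 1)
    return np.log(mag + 1e-10).T                 # (freq bins, time frames)

rng = np.random.default_rng(2)
speech = rng.standard_normal(16000)              # 1 s of stand-in audio
spec = log_spectrogram(speech)
print(spec.shape)                                # frequency x time image
```

A CNN applied to `spec` yields global spectral features; the fusion described in the thesis would concatenate such CNN features with the long-term EHPF statistics before the final classification layer.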
Keywords/Search Tags:speech emotion recognition, deep learning, CNN, LSTM, EHPF