
The Research Of Speech Emotion Recognition Method

Posted on: 2017-02-10  Degree: Master  Type: Thesis
Country: China  Candidate: H H Shi  Full Text: PDF
GTID: 2348330512450929  Subject: Physical Electronics
Abstract/Summary:
Speech is one of the most important means of human communication and the fastest, most direct way to transmit information. Speech signals convey not only semantic content but also the speaker's emotional state, and it is hoped that computers can acquire this kind of emotional communication ability, which has so far belonged exclusively to human beings. Enabling machines to identify and understand emotions rapidly and accurately has become a key problem in artificial intelligence and speech recognition research, and speech emotion recognition has become a focus topic for researchers. Speech emotion recognition comprises three key technologies: emotion feature extraction, feature dimensionality reduction, and emotion classification. This paper surveys the related research in these three areas; the main work is as follows:

1) Emotional speech signal preprocessing. Using the CASIA Chinese emotional corpus recorded by the Institute of Automation of the Chinese Academy of Sciences, the signals were preprocessed by pre-emphasis, voiced/unvoiced determination, and frame blocking with windowing (a sketch follows the abstract). Pre-emphasis was realized with a first-order FIR high-pass filter. After studying existing voiced/unvoiced determination algorithms, a new algorithm, W-SRH, which combines wavelet transform analysis with the summation of residual harmonics (SRH), was designed; it determines voiced and unvoiced segments effectively (the SRH part is sketched below).

2) Speech emotional feature extraction. Pronunciation rate, short-time energy, and Mel-frequency cepstral coefficients (MFCC) were calculated; pitch frequency was obtained with the summation of residual harmonics algorithm, and formant parameters were extracted with the LPC method (sketched below), with simulations carried out in MATLAB. Existing methods for extracting the glottal waveform were investigated and analyzed, and a combined approach, SRH with pitch-synchronous iterative adaptive inverse filtering (SRH-PSIAIF), was used to extract the glottal waveform. The spectral characteristics of the glottal waveform were then investigated, namely the parabolic spectral parameter (PSP) and the harmonic richness factor (HRF).

3) Speech emotion feature dimensionality reduction. The pronunciation rate; the maximum, minimum, range, and mean of the pitch frequency; the first three formant parameters; and 12 Mel-frequency cepstral coefficients were assembled into a feature vector. Principal component analysis (PCA) was then used to reduce the vector's dimensionality (sketched below), which lowered the correlation and redundancy between features, and the components with significant variance were selected for speech emotion recognition.

4) Speech emotion recognition. A BP neural network and a deep-learning stacked autoencoder (sketched below) were used to classify the dimension-reduced feature vectors, reaching accuracies of 81.67% and 89.17%, respectively. When the glottal spectral characteristics PSP and HRF were added, the accuracies reached 84.17% and 91.25%. The results show that the parabolic spectral parameter and harmonic richness factor of the glottal waveform are useful for speech emotion recognition, and that the stacked autoencoder is more effective than the BP neural network for this task.

The research results of this paper can be applied to artificial intelligence and human-computer interaction, emotion perception, emotional robots, and similar areas.
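As a minimal illustration of the preprocessing in step 1), the sketch below implements pre-emphasis with a first-order FIR high-pass filter plus frame blocking with a Hamming window. It is written in Python/NumPy rather than the thesis's MATLAB, and the 0.97 coefficient and the 25 ms / 10 ms framing (400/160 samples at 16 kHz) are common defaults assumed here, not values stated in the abstract.

    import numpy as np

    def pre_emphasis(x, alpha=0.97):
        # y[n] = x[n] - alpha * x[n-1]: first-order FIR high-pass filtering
        # that flattens the spectral tilt before analysis (alpha is assumed).
        return np.append(x[0], x[1:] - alpha * x[:-1])

    def frame_signal(x, frame_len=400, hop=160):
        # Cut the signal into overlapping frames and apply a Hamming window;
        # 400/160 samples correspond to 25 ms / 10 ms at a 16 kHz rate.
        n = 1 + max(0, (len(x) - frame_len) // hop)
        idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
        return x[idx] * np.hamming(frame_len)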
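The W-SRH algorithm of step 1) combines wavelet analysis with the summation of residual harmonics. The sketch below shows only the SRH half of that decision (following Drugman's SRH criterion) and omits the wavelet stage; the LPC order, FFT size, pitch search range, and the 0.07 voicing threshold are assumptions made for illustration.

    import numpy as np
    from scipy.linalg import solve_toeplitz
    from scipy.signal import lfilter

    def lpc_coeffs(x, order):
        # LPC by the autocorrelation method: solve the Toeplitz normal
        # equations for the predictor, return A(z) = 1 - sum(a_k z^-k).
        r = np.correlate(x, x, mode='full')[len(x) - 1 : len(x) + order]
        a = solve_toeplitz(r[:-1], r[1:])
        return np.concatenate(([1.0], -a))

    def srh_voicing(frame, fs, f0_min=60, f0_max=400, nharm=5, thresh=0.07):
        # 1) whiten the frame with its LPC inverse filter to get the residual;
        w = frame * np.hanning(len(frame))
        res = lfilter(lpc_coeffs(w, 12), [1.0], w)
        # 2) take the normalised amplitude spectrum of the residual;
        nfft = 4096
        E = np.abs(np.fft.rfft(res, nfft))
        E /= np.linalg.norm(E) + 1e-12
        amp = lambda f: E[int(round(f * nfft / fs))]
        # 3) SRH rewards energy at the harmonics of a candidate f0 and
        #    penalises energy between them; voiced if the best score is high.
        f0s = np.arange(f0_min, f0_max)
        srh = [amp(f) + sum(amp(k * f) - amp((k - 0.5) * f)
                            for k in range(2, nharm + 1)) for f in f0s]
        best = int(np.argmax(srh))
        return srh[best] > thresh, f0s[best]   # (is_voiced, f0 estimate in Hz)

The same SRH score, maximised over candidate frequencies on voiced frames, also yields the pitch contour whose statistics enter the feature vector of step 3).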
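For the formant parameters of step 2), a standard LPC approach is to take the angles of the roots of the LPC polynomial as formant candidates. The sketch below reuses lpc_coeffs from the previous block; the order-12 model and the 90 Hz / 400 Hz pruning heuristics are illustrative assumptions, not the thesis's exact settings.

    import numpy as np

    def formants_lpc(frame, fs, order=12, nform=3):
        # Roots of A(z) inside the unit circle mark vocal-tract resonances:
        # keep one root per conjugate pair, convert angles to Hz, prune
        # implausible candidates by frequency and bandwidth, and return
        # the lowest nform frequencies as F1..F3 estimates.
        a = lpc_coeffs(frame * np.hamming(len(frame)), order)
        roots = np.roots(a)
        roots = roots[np.imag(roots) > 0]
        freqs = np.angle(roots) * fs / (2 * np.pi)
        bws = -np.log(np.abs(roots)) * fs / np.pi     # 3 dB bandwidths
        keep = (freqs > 90) & (bws < 400)
        return np.sort(freqs[keep])[:nform]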
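Step 3)'s reduction can be sketched as PCA via an SVD of the mean-centred feature matrix; with the features listed above, each row is 20-dimensional (1 pronunciation-rate value + 4 pitch statistics + 3 formants + 12 MFCCs). The 95% variance-retention threshold below is an assumption; the abstract only says that components with significant variance were kept.

    import numpy as np

    def pca_reduce(X, var_kept=0.95):
        # Centre the (n_samples x n_features) matrix, factor it with an SVD,
        # and keep the fewest principal components whose cumulative share
        # of the variance reaches var_kept (an assumed threshold).
        Xc = X - X.mean(axis=0)
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        var = (s ** 2) / np.sum(s ** 2)
        k = int(np.searchsorted(np.cumsum(var), var_kept)) + 1
        return Xc @ Vt[:k].T    # decorrelated, reduced feature vectors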
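For the classifiers of step 4), the sketch below shows the usual stacked-autoencoder recipe in PyTorch: greedy unsupervised pretraining of each encoder layer on a reconstruction loss, then supervised fine-tuning of the whole stack with a softmax head (a BP-network baseline would be the same multilayer perceptron trained from scratch with backpropagation). The hidden sizes (64, 32), the six output classes (matching CASIA's six emotion categories), and the training hyperparameters are assumptions, not the thesis's configuration.

    import torch
    import torch.nn as nn

    class SAE(nn.Module):
        # Two sigmoid encoder layers, each with a decoder used only during
        # pretraining, plus a linear softmax head for emotion classification.
        def __init__(self, n_in=20, hidden=(64, 32), n_classes=6):
            super().__init__()
            self.enc1 = nn.Sequential(nn.Linear(n_in, hidden[0]), nn.Sigmoid())
            self.dec1 = nn.Linear(hidden[0], n_in)
            self.enc2 = nn.Sequential(nn.Linear(hidden[0], hidden[1]), nn.Sigmoid())
            self.dec2 = nn.Linear(hidden[1], hidden[0])
            self.head = nn.Linear(hidden[1], n_classes)

        def forward(self, x):
            return self.head(self.enc2(self.enc1(x)))

    def pretrain_layer(enc, dec, data, epochs=50, lr=1e-3):
        # Greedy unsupervised pretraining of one encoder/decoder pair
        # by minimising the reconstruction error on its own input.
        opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=lr)
        mse = nn.MSELoss()
        for _ in range(epochs):
            opt.zero_grad()
            mse(dec(enc(data)), data).backward()
            opt.step()

    def train_sae(X, y, epochs=200):
        # X: float tensor of reduced feature vectors, y: long tensor of labels.
        model = SAE(n_in=X.shape[1], n_classes=int(y.max()) + 1)
        pretrain_layer(model.enc1, model.dec1, X)          # pretrain layer 1
        with torch.no_grad():
            h1 = model.enc1(X)
        pretrain_layer(model.enc2, model.dec2, h1)         # pretrain layer 2
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        ce = nn.CrossEntropyLoss()
        for _ in range(epochs):                            # supervised fine-tuning
            opt.zero_grad()
            ce(model(X), y).backward()
            opt.step()
        return model

    # usage with hypothetical arrays Xtr (features) and ytr (labels 0..5):
    # model = train_sae(torch.tensor(Xtr, dtype=torch.float32),
    #                   torch.tensor(ytr, dtype=torch.long))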
Keywords/Search Tags: speech emotion recognition, voiced/unvoiced determination, glottal waveform, BP neural network, stacked autoencoder