The Research Of Speech Emotion Based On Multi-feature Extraction And Fusion

Posted on: 2013-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: B B Tu
Full Text: PDF
GTID: 2248330371464734
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Speech emotion recognition is the task of having a machine recognize different emotional states from human speech signals. The non-stationary characteristics of speech signals differ markedly across emotions, so feature extraction and selection are extremely important for emotion recognition. This thesis extracts voice quality features, prosodic features, and spectral features from emotional speech; an improved MFCC reflects both the static and dynamic characteristics of the speech signal. Facial image texture features are further extracted, and the class probabilities are fused to reach the final decision among anger, disgust, fear, happiness, sadness, and surprise. The main contents are as follows:

1. Speech emotion recognition based on MFCC improved with empirical mode decomposition (EMD) is proposed. Traditional MFCC reflects only the static features of speech, whereas EMD clearly describes the non-stationary characteristics that are especially pronounced under different emotions. To extract the non-stationary features of emotional speech, the improved MFCC proceeds through EMD decomposition into intrinsic mode functions (IMFs), Mel filtering, taking the logarithm, and the DCT (see the pipeline sketch after this abstract). Recognition results show that the recognition rate of the improved MFCC is significantly higher and that it has a certain noise immunity.

2. Speech emotion recognition based on the fusion of sample entropy and MFCC is proposed. Sample entropy, which in nonlinear dynamical systems measures the rate of information production, describes the dynamic fluctuations of different emotional signals, while MFCC reflects the static characteristics of the speech signal; an approach fusing the two is therefore proposed (a sample entropy sketch follows the abstract). Sample entropy statistics and MFCC are each modeled with an SVM to obtain the probabilities of happy, angry, bored, and afraid, and the sum and product rules then fuse the probabilities into the final decision. Simulation results demonstrate that the average recognition rate of the sample entropy and MFCC fusion is improved.

3. Bimodal emotion recognition from speech signals and facial expressions is studied. Because voice signals and facial expression changes are synchronized under different emotions, a recognition algorithm based on audio-visual feature fusion is proposed to identify emotional states more accurately. Prosodic features, including fundamental frequency and short-term energy, and spectral parameters, including LPCC and MFCC, are extracted as speech emotion features, and histogram sequences of local Gabor binary patterns (LGBP) are adopted as facial expression features. The two feature types are each modeled with an SVM to obtain the probabilities of the six emotions, and the probabilities are then fused for the final decision (see the fusion sketch after this abstract). Simulation results demonstrate that the average recognition rate after fusing speech signals and facial expressions improves sharply.
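For point 1, the following is a minimal sketch of the EMD-improved MFCC pipeline, assuming the PyEMD and librosa packages; the frame size, hop length, filter count, and coefficient count are illustrative defaults, not parameters given in the thesis.

```python
# Hedged sketch: EMD decomposition into IMFs, then Mel filtering,
# logarithm, and DCT per IMF, following the steps named in point 1.
import numpy as np
import librosa
from PyEMD import EMD
from scipy.fftpack import dct

def emd_mfcc(signal, sr, n_mfcc=13, n_fft=512, hop=256, n_mels=26):
    """Per-IMF Mel-log-DCT features, stacked over IMFs."""
    imfs = EMD()(signal)                              # EMD decomposition into IMFs
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    feats = []
    for imf in imfs:
        spec = np.abs(librosa.stft(imf, n_fft=n_fft, hop_length=hop)) ** 2
        mel = mel_fb @ spec                           # Mel filtering
        logmel = np.log(mel + 1e-10)                  # logarithm
        feats.append(dct(logmel, axis=0, norm='ortho')[:n_mfcc])  # DCT
    return np.vstack(feats)                           # (n_imfs * n_mfcc, n_frames)
```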
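For point 2, a minimal sketch of sample entropy; m = 2 and r = 0.2 times the standard deviation are the common defaults, not values stated in the abstract.

```python
# Hedged sketch: SampEn = -ln(A/B), where B and A count template pairs of
# length m and m+1 that stay within tolerance r (Chebyshev distance,
# self-matches excluded).
import numpy as np

def _count_matches(x, k, r, n_templates):
    t = np.array([x[i:i + k] for i in range(n_templates)])
    c = 0
    for i in range(n_templates - 1):
        d = np.max(np.abs(t[i + 1:] - t[i]), axis=1)  # Chebyshev distance
        c += np.sum(d <= r)
    return c

def sample_entropy(x, m=2, r=None):
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)
    n_t = len(x) - m               # same template count for lengths m and m+1
    B = _count_matches(x, m, r, n_t)
    A = _count_matches(x, m + 1, r, n_t)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf
```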
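For the decision fusion used in points 2 and 3, a sketch of the sum- and product-rule combination of per-modality SVM posteriors, assuming scikit-learn; the classifier setup and feature names are placeholders, not the thesis's exact pipeline.

```python
# Hedged sketch: two modality-specific SVMs output class probabilities,
# which are combined with the sum or product rule for the final decision.
import numpy as np
from sklearn.svm import SVC

def fuse_predict(clf_a, clf_b, Xa, Xb, rule="sum"):
    """Fuse class posteriors from two modality-specific SVMs."""
    pa = clf_a.predict_proba(Xa)      # e.g. sample-entropy / speech features
    pb = clf_b.predict_proba(Xb)      # e.g. MFCC / facial LGBP features
    fused = pa + pb if rule == "sum" else pa * pb
    return np.argmax(fused, axis=1)   # final decision per sample

# Usage: train each SVM with probability estimates enabled, then fuse.
# clf_a = SVC(probability=True).fit(Xa_train, y_train)
# clf_b = SVC(probability=True).fit(Xb_train, y_train)
# y_pred = fuse_predict(clf_a, clf_b, Xa_test, Xb_test, rule="product")
```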
Keywords/Search Tags: speech emotion recognition, facial expression, empirical mode decomposition, Mel-frequency cepstral coefficients, sample entropy