The Research Of Speech Emotion Based On Multi-feature Extraction And Fusion

Posted on: 2013-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: B B Tu
Full Text: PDF
GTID: 2248330371464734
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Speech emotion recognition is the task of having a machine recognize different emotional states from human speech signals. The non-stationary characteristics of speech signals differ markedly across emotions, so feature extraction and selection are extremely important for emotion recognition. This thesis extracts voice quality features, prosodic features, and spectral features from emotional speech; an improved MFCC reflects both the static and dynamic characteristics of the speech signal. Facial image texture features are further extracted, and the class probabilities are fused to reach the final decision among anger, disgust, fear, happiness, sadness, and surprise. The main contents are as follows:

1. Speech emotion recognition based on MFCC improved with empirical mode decomposition (EMD) is proposed. Traditional MFCC reflects only the static features of speech, whereas EMD clearly describes the non-stationary characteristics that are especially pronounced under different emotions. To extract the non-stationary features of emotional speech, the improved MFCC proceeds through EMD decomposition into intrinsic mode functions (IMFs), Mel filtering, taking the logarithm, and the DCT (see the pipeline sketch after this abstract). Recognition results show that the recognition rate of the improved MFCC is significantly higher and that it has a certain noise immunity.

2. Speech emotion recognition based on the fusion of sample entropy and MFCC is proposed. Sample entropy, which in nonlinear dynamical systems measures the rate of information production, describes the dynamic fluctuations of different emotional signals, while MFCC reflects the static characteristics of the speech signal; an approach fusing the two is therefore proposed (a sample entropy sketch follows the abstract). Sample entropy statistics and MFCC are each modeled with an SVM to obtain the probabilities of happy, angry, bored, and afraid, and the sum and product rules then fuse the probabilities into the final decision. Simulation results demonstrate that the average recognition rate of the sample entropy and MFCC fusion is improved.

3. Bimodal emotion recognition from speech signals and facial expressions is studied. Because voice signals and facial expression changes are synchronized under different emotions, a recognition algorithm based on audio-visual feature fusion is proposed to identify emotional states more accurately. Prosodic features, including fundamental frequency and short-term energy, and spectral parameters, including LPCC and MFCC, are extracted as speech emotion features, and histogram sequences of local Gabor binary patterns (LGBP) are adopted as facial expression features. The two feature types are each modeled with an SVM to obtain the probabilities of the six emotions, and the probabilities are then fused for the final decision (see the fusion sketch after this abstract). Simulation results demonstrate that the average recognition rate after fusing speech signals and facial expressions improves sharply.
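For point 1, the following is a minimal sketch of the EMD-improved MFCC pipeline, assuming the PyEMD and librosa packages; the frame size, hop length, filter count, and coefficient count are illustrative defaults, not parameters given in the thesis.

```python
# Hedged sketch: EMD decomposition into IMFs, then Mel filtering,
# logarithm, and DCT per IMF, following the steps named in point 1.
import numpy as np
import librosa
from PyEMD import EMD
from scipy.fftpack import dct

def emd_mfcc(signal, sr, n_mfcc=13, n_fft=512, hop=256, n_mels=26):
    """Per-IMF Mel-log-DCT features, stacked over IMFs."""
    imfs = EMD()(signal)                              # EMD decomposition into IMFs
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    feats = []
    for imf in imfs:
        spec = np.abs(librosa.stft(imf, n_fft=n_fft, hop_length=hop)) ** 2
        mel = mel_fb @ spec                           # Mel filtering
        logmel = np.log(mel + 1e-10)                  # logarithm
        feats.append(dct(logmel, axis=0, norm='ortho')[:n_mfcc])  # DCT
    return np.vstack(feats)                           # (n_imfs * n_mfcc, n_frames)
```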
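For point 2, a minimal sketch of sample entropy; m = 2 and r = 0.2 times the standard deviation are the common defaults, not values stated in the abstract.

```python
# Hedged sketch: SampEn = -ln(A/B), where B and A count template pairs of
# length m and m+1 that stay within tolerance r (Chebyshev distance,
# self-matches excluded).
import numpy as np

def _count_matches(x, k, r, n_templates):
    t = np.array([x[i:i + k] for i in range(n_templates)])
    c = 0
    for i in range(n_templates - 1):
        d = np.max(np.abs(t[i + 1:] - t[i]), axis=1)  # Chebyshev distance
        c += np.sum(d <= r)
    return c

def sample_entropy(x, m=2, r=None):
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)
    n_t = len(x) - m               # same template count for lengths m and m+1
    B = _count_matches(x, m, r, n_t)
    A = _count_matches(x, m + 1, r, n_t)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf
```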
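For the decision fusion used in points 2 and 3, a sketch of the sum- and product-rule combination of per-modality SVM posteriors, assuming scikit-learn; the classifier setup and feature names are placeholders, not the thesis's exact pipeline.

```python
# Hedged sketch: two modality-specific SVMs output class probabilities,
# which are combined with the sum or product rule for the final decision.
import numpy as np
from sklearn.svm import SVC

def fuse_predict(clf_a, clf_b, Xa, Xb, rule="sum"):
    """Fuse class posteriors from two modality-specific SVMs."""
    pa = clf_a.predict_proba(Xa)      # e.g. sample-entropy / speech features
    pb = clf_b.predict_proba(Xb)      # e.g. MFCC / facial LGBP features
    fused = pa + pb if rule == "sum" else pa * pb
    return np.argmax(fused, axis=1)   # final decision per sample

# Usage: train each SVM with probability estimates enabled, then fuse.
# clf_a = SVC(probability=True).fit(Xa_train, y_train)
# clf_b = SVC(probability=True).fit(Xb_train, y_train)
# y_pred = fuse_predict(clf_a, clf_b, Xa_test, Xb_test, rule="product")
```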
Keywords/Search Tags: speech emotion recognition, facial expression, empirical mode decomposition, Mel-frequency cepstral coefficients, sample entropy