
Research On Emotion Recognition Based On Speech And Facial Expression

Posted on: 2013-05-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Q Zhang
GTID: 1228330395974801
Subject: Communication and Information System

Abstract/Summary:
Emotion recognition is an important branch of affective computing and currently an active research topic in fields such as signal processing, pattern recognition, artificial intelligence, and human-computer interaction. Because emotion recognition is a complex, multi-disciplinary research subject, many problems remain open; in particular, feature extraction, feature reduction, recognition methods, and multi-modal emotional information fusion require further study. Emotional speech and facial expression are two of the most important channels of human emotion expression. This thesis systematically explores the key techniques of emotion recognition based on emotional speech and facial expression, and proposes several improved algorithms. The main contents of this thesis are:

1. The research history and current status of speech emotion recognition and facial expression recognition are summarized. An overview of their key and difficult points, such as emotional databases, emotional feature extraction and dimensionality reduction, and emotion classification algorithms, is also presented.

2. An improved supervised locally linear embedding (Improved-SLLE) algorithm is proposed for the nonlinear dimensionality reduction of emotional speech features. To address the drawbacks of SLLE, this thesis develops an improved version that enhances the discriminating power of the low-dimensional embedded data and improves generalization ability. The proposed algorithm performs nonlinear dimensionality reduction on emotional speech features, including prosody and voice quality features, and improves speech emotion recognition performance in the low-dimensional embedded space.

3. A kernel discriminant locally linear embedding (KDLLE) algorithm is proposed for kernel-based nonlinear dimensionality reduction of emotional speech features. To integrate the kernel method with locally linear embedding (LLE), this thesis designs a kernel discriminant distance and minimizes the reconstruction error in a reproducing kernel Hilbert space (RKHS). When applied to emotional speech features, the algorithm not only yields better low-dimensional visualizations than LLE but also achieves promising performance on speech emotion recognition tasks.

4. A method of speech emotion estimation based on a three-dimensional continuous emotion space model, "activation-valence-dominance (AVD)", is proposed to track the continuous dynamic changes of emotion expression in speech. According to dimensional emotion theory, an emotion can be defined as a coordinate point in the AVD model, and each coordinate point can be associated with one discrete emotion category. The continuous changes of the coordinate values in the AVD model are predicted and estimated by regression analysis. The problem of recognizing discrete speech emotions is thus converted into a regression problem of estimating continuously changing emotions, which makes it possible to track the continuous dynamic changes of emotion expression in speech.
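To make the regression formulation in point 4 concrete, the following is a minimal sketch of mapping acoustic feature vectors to continuous AVD coordinates. The abstract does not name the regressor, so support vector regression (scikit-learn's SVR) and the synthetic feature and label arrays below are illustrative assumptions, not the thesis's actual setup.

    # Minimal sketch: continuous emotion estimation in the
    # activation-valence-dominance (AVD) space via regression.
    # The regressor choice (SVR) and the random data are assumptions.
    import numpy as np
    from sklearn.multioutput import MultiOutputRegressor
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)

    # Stand-ins for prosody/voice-quality feature vectors of 200 utterances.
    X = rng.normal(size=(200, 30))          # hypothetical acoustic features
    Y = rng.uniform(-1, 1, size=(200, 3))   # annotated (A, V, D) coordinates

    # One SVR per emotion dimension; a predicted point in AVD space can
    # then be mapped back to the nearest discrete emotion category.
    model = MultiOutputRegressor(SVR(kernel="rbf", C=1.0))
    model.fit(X[:150], Y[:150])

    avd_pred = model.predict(X[150:])       # continuous (A, V, D) estimates
    print(avd_pred[:3])

Fitting one regressor per dimension keeps the three outputs independent; any multi-output regression method could be substituted without changing the overall formulation.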
5. A new facial expression recognition method based on sparse representation is proposed to provide a robust recognition technique via compressive sensing. The sparse representation of a test facial expression image with corruption or occlusion is first sought, and the sparsest solution is obtained using compressive sensing theory. Facial expression classification is then performed with this sparsest solution. After extracting three kinds of facial features, namely raw pixels, local binary patterns (LBP), and Gabor wavelet representations, the proposed method achieves robust facial expression recognition. Experimental results show that the method performs well on facial expression recognition tasks and exhibits good robustness.

6. The mechanism of multi-modal emotion recognition integrating facial expression and speech is studied. First, the corresponding features of emotional speech and facial expression are extracted. Two multi-modal information fusion strategies, feature-level and decision-level fusion, are then used for multi-modal emotion recognition. Experimental results indicate that multi-modal emotion recognition outperforms either single modality, and that the product rule at the decision level yields the best performance on multi-modal emotion recognition tasks.
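The product rule reported as the best decision-level strategy in point 6 can be sketched as follows. The six-class posteriors and the per-modality classifiers producing them are hypothetical placeholders, since the abstract does not detail those models.

    # Minimal sketch: decision-level fusion with the product rule.
    # The posterior vectors below are hypothetical, not the thesis's data.
    import numpy as np

    def product_rule_fusion(p_speech: np.ndarray, p_face: np.ndarray) -> int:
        """Fuse per-class posteriors from two modalities and return the
        index of the winning emotion class."""
        fused = p_speech * p_face            # elementwise product over classes
        fused /= fused.sum()                 # renormalize to a distribution
        return int(np.argmax(fused))

    # Hypothetical posteriors over six emotion classes from each modality.
    p_speech = np.array([0.10, 0.05, 0.50, 0.15, 0.10, 0.10])
    p_face   = np.array([0.05, 0.10, 0.40, 0.25, 0.10, 0.10])
    print(product_rule_fusion(p_speech, p_face))  # -> 2

Multiplying posteriors rewards classes on which both modalities agree, which is one common explanation for the product rule outperforming, for example, the sum rule when the per-modality estimates are reliable.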
Keywords/Search Tags: emotion recognition, emotional speech, facial expression, nonlinear dimensionality reduction, regression analysis, sparse representation, multi-modality information fusion