
Research On Key Techniques Of Speech Emotion Recognition

Posted on: 2008-03-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Y You
Full Text: PDF
GTID: 1118360242472935
Subject: Computer Science and Technology

Abstract/Summary:
Speech emotion recognition is a facet of artificial intelligence, with foreseeable applications across human-machine interaction, such as automatic segmentation of media streams and reliable surveillance detection. A speech emotion recognition system comprises speech signal pre-processing, acoustic feature extraction, dimensionality reduction and model-based emotion recognition. This thesis focuses on several key topics in speech emotion recognition: a visualization method for emotional speech corpora, semi-supervised speech emotion recognition, a nonlinear manifold learning algorithm named ELE, and emotion recognition based on an emotion interaction matrix.

The collection, annotation and visualization of emotional speech corpora are discussed. A Chinese affective database (CHAD) is established that contains emotional material from different sources, and the sources are compared on the basis of listening annotation. The multi-dimensional acoustic features of emotional speech are mapped onto a two-dimensional plane, the MASE MAP, using Sammon's nonlinear mapping. Analysis of the MASE MAP yields useful corpus information such as the emotion constituents, the interrelations among utterances and the degree of overlap between emotions.

An enhanced co-training algorithm is proposed to build a semi-supervised learning system that uses unlabeled examples to augment a much smaller set of labeled examples. Experimental results show that the proposed system achieves a 7.4%-9.0% improvement over the conventional co-training algorithm. Moreover, the enhanced co-training algorithm reduces the classification noise introduced by mislabeled unlabeled utterances.

Different approaches to dimensionality reduction are investigated. Based on detailed comparisons among linear methods, a new hierarchical framework for speech emotion recognition is proposed in which an appropriate dimensionality reduction method is employed for each emotion. It achieves 78.7%-83.4% recognition accuracy in speaker-independent experiments.

A nonlinear manifold learning algorithm, ELE, is proposed. Based on geodesic distance estimation, the high-dimensional acoustic features are embedded into a six-dimensional space in which speech data with the same emotion cluster onto one plane, which benefits emotion classification. Experimental results show a 9%-26% relative improvement in speaker-independent emotion recognition and a 5%-20% improvement in the speaker-dependent case.

LDA-L1-Rank is presented, and detailed comparisons of PCA, LDA, PCA-L1-Rank and LDA-L1-Rank are performed for speech emotion recognition. In addition, a hybrid system based on all-class feature selection and pairwise-class feature selection is put forward; it collects the features that are good both at separating all classes jointly and at separating each pair of classes. The proposed approach achieves a 3.2%-8.4% relative improvement in average F1-measure for speaker-independent emotion recognition.
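To make the hybrid selection idea concrete, the minimal sketch below merges features ranked for separating all emotion classes with features ranked for each pair of classes. It is only an illustration, not the thesis's implementation: the ANOVA F-score criterion, the list sizes k_all and k_pair, and the random data are assumptions, since the abstract does not give the actual ranking criterion.

```python
# Illustrative sketch of combining all-class and pairwise-class feature
# selection. The ANOVA F-score is assumed as the ranking criterion; the
# thesis's actual criterion is not specified in this abstract.
from itertools import combinations

import numpy as np
from sklearn.feature_selection import f_classif


def hybrid_feature_selection(X, y, k_all=20, k_pair=5):
    """Return indices of features chosen by an all-class ranking plus
    per-pair rankings, merged into a single set."""
    # Features that separate all emotion classes jointly.
    f_all, _ = f_classif(X, y)
    selected = set(np.argsort(f_all)[::-1][:k_all])

    # Features that separate each pair of emotion classes.
    for a, b in combinations(np.unique(y), 2):
        mask = np.isin(y, [a, b])
        f_pair, _ = f_classif(X[mask], y[mask])
        selected.update(np.argsort(f_pair)[::-1][:k_pair])

    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical data: 200 utterances, 60 acoustic features, 4 emotions.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 60))
    y = rng.integers(0, 4, size=200)
    print(hybrid_feature_selection(X, y)[:10])
```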
Emotion recognition that exploits additional information sources is also investigated. A novel Chinese conversation database is created, and an emotion interaction matrix is proposed to embody the discourse information present in conversation. The recognition method that uses discourse information simulates human emotion perception and achieves more robust performance. Facial expression is then combined into the emotion recognition system: a THMM and a segmental k-means training algorithm are proposed, a tripled Viterbi optimal path searching algorithm is introduced to make the maximum likelihood decision, and a weight parameter is employed to balance the contributions of the audio and visual modalities. The whole approach gives 91.9% average accuracy and improved robustness.

Finally, emotion recognition from noisy speech is investigated, motivated by the practical applicability of the technique. The ELE method is used to compress the acoustic features of noisy speech, and an improvement of approximately 10% demonstrates ELE's ability to detect the intrinsic geometry of emotional speech even when it is corrupted by noise.
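The abstract does not describe how ELE itself is implemented, so the sketch below uses scikit-learn's Isomap, a standard geodesic-distance manifold method, purely to illustrate the kind of six-dimensional embedding of acoustic features described above. The feature dimensions and neighborhood size are assumptions for the example.

```python
# Illustrative sketch of a geodesic-distance-based embedding of acoustic
# features. ELE is not specified in this abstract, so Isomap stands in
# here only to show the general kind of low-dimensional embedding involved.
import numpy as np
from sklearn.manifold import Isomap


def embed_acoustic_features(features, n_components=6, n_neighbors=10):
    """Embed high-dimensional acoustic feature vectors into a
    low-dimensional space using geodesic distances."""
    embedder = Isomap(n_neighbors=n_neighbors, n_components=n_components)
    return embedder.fit_transform(features)


if __name__ == "__main__":
    # Hypothetical data: 300 utterances, 48 acoustic features each.
    rng = np.random.default_rng(0)
    acoustic = rng.normal(size=(300, 48))
    low_dim = embed_acoustic_features(acoustic)
    print(low_dim.shape)  # (300, 6)
```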
Keywords/Search Tags:Speech Emotion Recognition, Emotional Corpus Visualization, Semi-supervised Learning, ELE nonlinear manifold algorithm, Feature Selection, Discourse Emotion Interaction, Multi-Modal Emotion Recognition, Emotion Recognition from Noisy Speech