
Research On Key Techniques Of Speech Emotion Recognition

Posted on: 2008-03-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Y You
Full Text: PDF
GTID: 1118360242472935
Subject: Computer Science and Technology

Abstract/Summary:
Speech emotion recognition is a facet of artificial intelligence, with foreseeable applications across human-machine interaction, such as automatic segmentation of media streams and reliable surveillance detection. A speech emotion recognition system comprises speech signal pre-processing, acoustic feature extraction, dimensionality reduction and model-based emotion recognition. This thesis focuses on several key topics in speech emotion recognition: a visualization method for emotional speech corpora, semi-supervised speech emotion recognition, a nonlinear manifold learning algorithm named ELE, and emotion recognition based on an emotion interaction matrix.

The collection, annotation and visualization of emotional speech corpora are discussed. A Chinese affective database (CHAD) is established that contains emotional material from different sources, and the sources are compared on the basis of listening annotation. The multi-dimensional acoustic features of emotional speech are mapped onto a two-dimensional plane, the MASE MAP, using Sammon's nonlinear mapping. Analysis of the MASE MAP yields useful corpus information such as the emotion constituents, the interrelations among utterances and the degree of overlap between emotions.

An enhanced co-training algorithm is proposed to build a semi-supervised learning system that uses unlabeled examples to augment a much smaller set of labeled examples. Experimental results show that the proposed system achieves a 7.4%-9.0% improvement over the conventional co-training algorithm. Moreover, the enhanced co-training algorithm reduces the classification noise introduced by mislabeled unlabeled utterances.

Different approaches to dimensionality reduction are investigated. Based on detailed comparisons among linear methods, a new hierarchical framework for speech emotion recognition is proposed in which an appropriate dimensionality reduction method is employed for each emotion. It achieves 78.7%-83.4% recognition accuracy in speaker-independent experiments.

A nonlinear manifold learning algorithm, ELE, is proposed. Based on geodesic distance estimation, the high-dimensional acoustic features are embedded into a six-dimensional space in which speech data with the same emotion cluster onto one plane, which benefits emotion classification. Experimental results show a 9%-26% relative improvement in speaker-independent emotion recognition and a 5%-20% improvement in the speaker-dependent case.

LDA-L1-Rank is presented, and detailed comparisons of PCA, LDA, PCA-L1-Rank and LDA-L1-Rank are performed for speech emotion recognition. In addition, a hybrid system based on all-class feature selection and pairwise-class feature selection is put forward; it collects the features that are good both at separating all classes jointly and at separating each pair of classes. The proposed approach achieves a 3.2%-8.4% relative improvement in average F1-measure for speaker-independent emotion recognition.
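To make the hybrid selection idea concrete, the minimal sketch below merges features ranked for separating all emotion classes with features ranked for each pair of classes. It is only an illustration, not the thesis's implementation: the ANOVA F-score criterion, the list sizes k_all and k_pair, and the random data are assumptions, since the abstract does not give the actual ranking criterion.

```python
# Illustrative sketch of combining all-class and pairwise-class feature
# selection. The ANOVA F-score is assumed as the ranking criterion; the
# thesis's actual criterion is not specified in this abstract.
from itertools import combinations

import numpy as np
from sklearn.feature_selection import f_classif


def hybrid_feature_selection(X, y, k_all=20, k_pair=5):
    """Return indices of features chosen by an all-class ranking plus
    per-pair rankings, merged into a single set."""
    # Features that separate all emotion classes jointly.
    f_all, _ = f_classif(X, y)
    selected = set(np.argsort(f_all)[::-1][:k_all])

    # Features that separate each pair of emotion classes.
    for a, b in combinations(np.unique(y), 2):
        mask = np.isin(y, [a, b])
        f_pair, _ = f_classif(X[mask], y[mask])
        selected.update(np.argsort(f_pair)[::-1][:k_pair])

    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical data: 200 utterances, 60 acoustic features, 4 emotions.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 60))
    y = rng.integers(0, 4, size=200)
    print(hybrid_feature_selection(X, y)[:10])
```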
Emotion recognition that exploits additional information sources is also investigated. A novel Chinese conversation database is created, and an emotion interaction matrix is proposed to embody the discourse information present in conversation. The recognition method that uses discourse information simulates human emotion perception and achieves more robust performance. Facial expression is then combined into the emotion recognition system: a THMM and a segmental k-means training algorithm are proposed, a tripled Viterbi optimal path searching algorithm is introduced to make the maximum likelihood decision, and a weight parameter is employed to balance the contributions of the audio and visual modalities. The whole approach gives 91.9% average accuracy and improved robustness.

Finally, emotion recognition from noisy speech is investigated, motivated by the practical applicability of the technique. The ELE method is used to compress the acoustic features of noisy speech, and an improvement of approximately 10% demonstrates ELE's ability to detect the intrinsic geometry of emotional speech even when it is corrupted by noise.
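The abstract does not describe how ELE itself is implemented, so the sketch below uses scikit-learn's Isomap, a standard geodesic-distance manifold method, purely to illustrate the kind of six-dimensional embedding of acoustic features described above. The feature dimensions and neighborhood size are assumptions for the example.

```python
# Illustrative sketch of a geodesic-distance-based embedding of acoustic
# features. ELE is not specified in this abstract, so Isomap stands in
# here only to show the general kind of low-dimensional embedding involved.
import numpy as np
from sklearn.manifold import Isomap


def embed_acoustic_features(features, n_components=6, n_neighbors=10):
    """Embed high-dimensional acoustic feature vectors into a
    low-dimensional space using geodesic distances."""
    embedder = Isomap(n_neighbors=n_neighbors, n_components=n_components)
    return embedder.fit_transform(features)


if __name__ == "__main__":
    # Hypothetical data: 300 utterances, 48 acoustic features each.
    rng = np.random.default_rng(0)
    acoustic = rng.normal(size=(300, 48))
    low_dim = embed_acoustic_features(acoustic)
    print(low_dim.shape)  # (300, 6)
```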
Keywords/Search Tags:Speech Emotion Recognition, Emotional Corpus Visualization, Semi-supervised Learning, ELE nonlinear manifold algorithm, Feature Selection, Discourse Emotion Interaction, Multi-Modal Emotion Recognition, Emotion Recognition from Noisy Speech