
Analysis Of Emotional Speech And The Recognition Of Affectively Stressed Speakers

Posted on: 2007-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: T Wu
Full Text: PDF
GTID: 2178360182466662
Subject: Computer system architecture
Abstract/Summary:
Speaker recognition (SR), which identifies or verifies people by their voices, is regarded as the most natural and convenient biometric method. Current speaker verification and identification systems are limited by the effect on speech of transient changes in the speaker's state, such as cognitive and physiological stress, emotional state, and attitude. The intra-speaker variability in these situations can cause unacceptably high error rates. In this thesis, we first survey the features of emotional speech, the current state of the art, and recent methods for improving the recognition of affectively stressed speakers. Building on this survey, we propose methods and algorithms for handling emotional speech in an SR system. The main contributions of our work are as follows:

1. A large emotional speech database, MASC@CCNT (Mandarin Affective Speech Corpus, at CCNT Lab), is designed and created. The database is constructed for prosodic and linguistic investigation of emotion expression in Mandarin; it can also be used for the recognition of affectively stressed speakers.

2. A series of typical features is employed to investigate the characteristics of emotional speech. The study focuses on pitch structure, duration and segment omissions, vowel formant analysis, and energy analysis on two emotional speech databases, MASC and EPST. MASC was recorded in Mandarin by 68 native speakers; EPST was recorded in English by 8 actors and actresses. The results provide a comparison of emotion expression between Eastern and Western speakers.

3. Two procedures that require only a small quantity of affective training data are applied to the SR task, which is very practical in real-world situations. The approach classifies emotional states by acoustic features (Mean Pitch, Pitch Range, Pitch Variance, Pitch Skewness, and Pitch Expansion) and generates an emotion-added model based on the emotion grouping.
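The utterance-level pitch statistics named in contribution 3 can be sketched as below. This is an illustrative sketch only: the thesis does not give the exact formulas, so the definitions of skewness (third standardized moment) and "Pitch Expansion" (range relative to the mean) are assumptions, as is the convention that unvoiced frames are marked with F0 = 0.

```python
import numpy as np

def pitch_statistics(f0):
    """Compute utterance-level pitch statistics from an F0 contour (Hz).

    Unvoiced frames are assumed to be marked with 0 and are discarded.
    The five statistics mirror the features named in the abstract; the
    exact definitions used in the thesis may differ.
    """
    f0 = np.asarray(f0, dtype=float)
    f0 = f0[f0 > 0]                                  # keep voiced frames only
    mean = f0.mean()
    rng = f0.max() - f0.min()
    var = f0.var()
    # Fisher skewness: third standardized moment of the contour
    skew = np.mean(((f0 - mean) / f0.std()) ** 3)
    # "Pitch Expansion" taken here as range normalized by the mean (assumption)
    expansion = rng / mean
    return {"mean": mean, "range": rng, "variance": var,
            "skewness": skew, "expansion": expansion}
```

Such a five-dimensional vector per utterance is small enough that a simple classifier can group utterances by emotional state even from limited affective training data.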
Experiments performed on EPST show significant improvement.

4. A model of pitch-dependent spectral feature compensation against emotional speech variability is proposed. We point out that unaffected cepstral features would be more discriminative than the traditional ones. Preliminary work combines long-term and short-term features to obtain the compensated features. A novel strategy for selecting the compensation parameters is designed for GMM (Gaussian Mixture Model) training. Experiments are performed on both the EPST and MASC databases.

This work is supported by the National Natural Science Foundation of P. R. China (60273059), the National Science Fund for Distinguished Young Scholars (60525202), the Program for New Century Excellent Talents in University (NCET-04-0545), and the Key Program of the Natural Science Foundation of China (60533040).
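The GMM-based speaker modeling underlying contribution 4 can be sketched with a toy example: one GMM is trained per enrolled speaker on that speaker's spectral feature vectors, and a test utterance is assigned to the speaker whose model yields the highest average log-likelihood. The synthetic Gaussian features, 12-dimensional vectors, and 4-component mixtures below are illustrative assumptions, not the thesis's actual settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-speaker cepstral feature frames
# (rows = frames, columns = feature dimensions).
train = {
    "spk_a": rng.normal(loc=0.0, scale=1.0, size=(200, 12)),
    "spk_b": rng.normal(loc=3.0, scale=1.0, size=(200, 12)),
}

# Enroll: fit one diagonal-covariance GMM per speaker.
models = {spk: GaussianMixture(n_components=4, covariance_type="diag",
                               random_state=0).fit(feats)
          for spk, feats in train.items()}

# Identify: score a test utterance against every model and pick the
# speaker with the highest average per-frame log-likelihood.
test_utt = rng.normal(loc=3.0, scale=1.0, size=(50, 12))  # drawn like spk_b
scores = {spk: gmm.score(test_utt) for spk, gmm in models.items()}
best = max(scores, key=scores.get)
```

In this framework, pitch-dependent compensation would adjust the spectral features (or the per-mixture parameters) before scoring, so that emotion-induced pitch variation does not drag down the true speaker's likelihood.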
Keywords/Search Tags: Speaker Recognition, Emotional Speech, Emotional Features, Fundamental Frequency (F0)