
Analysis Of Emotional Speech And The Recognition Of Affectively Stressed Speakers

Posted on: 2007-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: T Wu
Full Text: PDF
GTID: 2178360182466662
Subject: Computer system architecture
Abstract/Summary:
Speaker recognition (SR), which identifies or verifies people by their voices, is regarded as the most natural and convenient biometric method. Current speaker verification and identification systems are limited by the effect on speech of transient changes in the speaker's state, such as cognitive and physiological stress, emotional state, and attitude. The intra-speaker variability in these situations can cause unacceptably high error rates. In this thesis, we first survey the features of emotional speech, the current state of the art, and recent methods for improving the recognition of affectively stressed speakers. Building on this survey, we propose methods and algorithms for handling emotional speech in an SR system. The main contributions of our work are as follows:

1. A large emotional speech database, MASC@CCNT (Mandarin Affective Speech Corpus, at CCNT Lab), is designed and created. The database is constructed for prosodic and linguistic investigation of emotion expression in Mandarin; it can also be used for the recognition of affectively stressed speakers.

2. A series of typical features is employed to investigate the characteristics of emotional speech. The study focuses on pitch structure, duration and segment omissions, vowel formant analysis, and energy analysis on two emotional speech databases, MASC and EPST. MASC was recorded in Mandarin by 68 native speakers; EPST was recorded in English by 8 actors and actresses. The results provide a comparison of emotion expression between Eastern and Western speakers.

3. Two procedures that require only a small quantity of affective training data are applied to the SR task, which is very practical in real-world situations. The approach classifies emotional states by acoustic features (Mean Pitch, Pitch Range, Pitch Variance, Pitch Skewness, and Pitch Expansion) and generates an emotion-added model based on the emotion grouping.
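The utterance-level pitch statistics named in contribution 3 can be sketched as below. This is an illustrative sketch only: the thesis does not give the exact formulas, so the definitions of skewness (third standardized moment) and "Pitch Expansion" (range relative to the mean) are assumptions, as is the convention that unvoiced frames are marked with F0 = 0.

```python
import numpy as np

def pitch_statistics(f0):
    """Compute utterance-level pitch statistics from an F0 contour (Hz).

    Unvoiced frames are assumed to be marked with 0 and are discarded.
    The five statistics mirror the features named in the abstract; the
    exact definitions used in the thesis may differ.
    """
    f0 = np.asarray(f0, dtype=float)
    f0 = f0[f0 > 0]                                  # keep voiced frames only
    mean = f0.mean()
    rng = f0.max() - f0.min()
    var = f0.var()
    # Fisher skewness: third standardized moment of the contour
    skew = np.mean(((f0 - mean) / f0.std()) ** 3)
    # "Pitch Expansion" taken here as range normalized by the mean (assumption)
    expansion = rng / mean
    return {"mean": mean, "range": rng, "variance": var,
            "skewness": skew, "expansion": expansion}
```

Such a five-dimensional vector per utterance is small enough that a simple classifier can group utterances by emotional state even from limited affective training data.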
Experiments performed on EPST show significant improvement.

4. A model of pitch-dependent spectral feature compensation against emotional speech variability is proposed. We point out that unaffected cepstral features would be more discriminative than the traditional ones. Preliminary work combines long-term and short-term features to obtain the compensated features. A novel strategy for selecting the compensation parameters is designed for GMM (Gaussian Mixture Model) training. Experiments are performed on both the EPST and MASC databases.

This work is supported by the National Natural Science Foundation of P. R. China (60273059), the National Science Fund for Distinguished Young Scholars (60525202), the Program for New Century Excellent Talents in University (NCET-04-0545), and the Key Program of the Natural Science Foundation of China (60533040).
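The GMM-based speaker modeling underlying contribution 4 can be sketched with a toy example: one GMM is trained per enrolled speaker on that speaker's spectral feature vectors, and a test utterance is assigned to the speaker whose model yields the highest average log-likelihood. The synthetic Gaussian features, 12-dimensional vectors, and 4-component mixtures below are illustrative assumptions, not the thesis's actual settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-speaker cepstral feature frames
# (rows = frames, columns = feature dimensions).
train = {
    "spk_a": rng.normal(loc=0.0, scale=1.0, size=(200, 12)),
    "spk_b": rng.normal(loc=3.0, scale=1.0, size=(200, 12)),
}

# Enroll: fit one diagonal-covariance GMM per speaker.
models = {spk: GaussianMixture(n_components=4, covariance_type="diag",
                               random_state=0).fit(feats)
          for spk, feats in train.items()}

# Identify: score a test utterance against every model and pick the
# speaker with the highest average per-frame log-likelihood.
test_utt = rng.normal(loc=3.0, scale=1.0, size=(50, 12))  # drawn like spk_b
scores = {spk: gmm.score(test_utt) for spk, gmm in models.items()}
best = max(scores, key=scores.get)
```

In this framework, pitch-dependent compensation would adjust the spectral features (or the per-mixture parameters) before scoring, so that emotion-induced pitch variation does not drag down the true speaker's likelihood.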
Keywords/Search Tags: Speaker Recognition, Emotional Speech, Emotional Features, Fundamental Frequency (F0)