
Research On Pitch Mismatch And Its Compensation Methods In Emotional Speaker Recognition

Posted on: 2012-12-06  Degree: Doctor  Type: Dissertation
Country: China  Candidate: T Huang  Full Text: PDF
GTID: 1118330371958965  Subject: Computer Science and Technology
Abstract/Summary:
With the development of computer technology and the acceleration of social informatization, speaker recognition (SR) has been applied in more and more fields, and increasing attention is paid to its accuracy and robustness. In real applications, emotion variation is one of the most common factors that may degrade the performance of a speaker recognition system. In this thesis, SR with emotional training or testing utterances is called emotional speaker recognition. Different emotional states affect the speech production mechanism, leading to a statistical mismatch between training and testing speech. This emotion variation between training and testing, the so-called "emotion mismatch", degrades speaker recognition. For user friendliness, the system is usually trained with only neutral speech but tested on various emotional speech. To eliminate the emotion mismatch in this case, we study the influence of emotion variability and emotion-induced pitch variability, and present several effective solutions. The main contributions of this thesis are as follows:

1. Propose a new division of emotional speech and a classification method for it

Because speech emotion recognition is unreliable while emotion detection remains important, we propose a division of emotional speech according to model mismatch, and detect only the speech with high model mismatch. We study the differences between the speech of various emotions and divide the emotional speech in the MASC corpus into two groups: an HD group (High Deviation from neutral: anger, elation and panic) and an LD group (Low Deviation from neutral: sadness). To distinguish these two groups, we propose a method that combines MFCC and prosodic features.

2. Provide a thorough survey of emotion-induced pitch mismatch

The study of the mechanism of vocal emotion expression shows how pitch mismatch arises; the studies of vocal source-tract interaction, of the relationship between pitch and MFCC, and of the relationship between pitch mismatch and speaker identification rate show that model mismatch is closely related to pitch mismatch; experiments show that model mismatch can be reduced by pitch modification.

3. Present a thorough study of pitch-mismatch-based emotion compensation

Based on the research above, this thesis constructs a framework for pitch-mismatch-based emotion compensation and proposes four efficient solutions:

1) Pitch-based mismatch detection and modification. Experimental results show that HD speech with a high pitch mean is harder to identify than speech with a low pitch mean. Based on this observation, we propose a pitch-based high-mismatch detection method. By pruning or scaling the features of the high-mismatch part of HD speech, the system reduces the influence of emotion mismatch.

2) Pitch-transformation-based bi-model construction. A virtual HD model for each target speaker is built from virtual speech, which is converted from the neutral speech by a pitch transformation algorithm. Combining the virtual HD model with the neutral model improves the performance of the SR system.

3) Pitch-normalization-based emotion-to-neutral transformation. It has been shown that HD emotional speech differs considerably from neutral speech in both MFCC and pitch. Based on the hypothesis of source/vocal-tract interaction, this method attempts to reduce the vocal-tract feature (MFCC) distortion of HD utterances by shifting their pitch mean toward the neutral one.

4) Score compensation based on pitch-related model-mismatch detection. Based on the relationship between pitch-mean bias and speaker recognition rate for HD emotional utterances, we use the recognition rate associated with the pitch mean of the mismatched part as a per-frame reliability factor when accumulating frame likelihoods.
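The score-compensation idea in solution 4) can be illustrated with a minimal sketch: instead of summing frame log-likelihoods uniformly, each frame is weighted by a reliability factor derived from how far its pitch deviates from the speaker's neutral pitch mean. The function names, the exponential weighting form, and the `alpha` parameter below are illustrative assumptions, not the thesis's actual reliability function (which is derived from recognition rates).

```python
import numpy as np

def reliability_weights(f0_frames, neutral_f0_mean, alpha=0.01):
    """Map each frame's pitch deviation from the neutral pitch mean to a
    weight in (0, 1]: larger deviation -> lower reliability.
    The exponential form and alpha are illustrative choices."""
    dev = np.abs(np.asarray(f0_frames, dtype=float) - neutral_f0_mean)
    return np.exp(-alpha * dev)

def compensated_score(frame_loglikes, f0_frames, neutral_f0_mean, alpha=0.01):
    """Accumulate per-frame log-likelihoods (e.g. from a speaker GMM),
    weighting each frame by its pitch-based reliability instead of
    summing all frames uniformly."""
    w = reliability_weights(f0_frames, neutral_f0_mean, alpha)
    x = np.asarray(frame_loglikes, dtype=float)
    return float(np.sum(w * x) / np.sum(w))
```

A frame whose pitch sits at the neutral mean contributes with full weight, while a strongly pitch-shifted (high-mismatch) frame is down-weighted, so emotional segments distort the final speaker score less.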
Keywords/Search Tags: Emotional Speaker Recognition, Pitch Mismatch, Emotional Speech, Emotion Masking, Emotional Speech Synthesis, Emotion Normalization