
Research On Emotional Speaker Recognition And Its Solutions

Posted on: 2011-06-24 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Z Y Dan
GTID: 1118330332978357 | Subject: Computer Science and Technology
Abstract/Summary:
Speaker recognition is the process of automatically identifying a speaker from the speaker-specific biometric information contained in the speech signal. As one of the most natural biometrics, it has broad application prospects and considerable development potential. Traditional speaker recognition methods achieve excellent performance when neutral utterances are used for both training and testing. However, performance may degrade when the training or testing utterances contain emotional speech. Different emotional states affect the speech production mechanism and thus lead to a statistical mismatch between training and testing speech. In this thesis, this kind of recognition is called emotional speaker recognition. Building on a review of current advances in emotional speaker recognition and an analysis of the influence of emotional variability, this thesis addresses the task of eliminating the mismatch between training and testing utterances, and presents a framework for an emotional speaker recognition system together with several effective solutions. The main contributions of the work are as follows:

1. Provided a deeper study of emotional variability. We study the changes in pitch, frequency spectrum, formants, and speaker-specific information at the feature and model levels caused by emotional variability. Extensive experiments are carried out on the performance of machines and human listeners in emotional speaker recognition, and on a speech-based check-in system in an office application.

2. Applied channel compensation methods to emotional speaker recognition. Based on an analysis of the similarities and differences among the emotion, channel, and noise factors, methods developed for channel and noise compensation are proposed for use in emotional speaker recognition. We then evaluate the performance of the NAP and LFA methods.

3. Proposed two methods based on an emotion-neutral model transformation function. An emotion-neutral model transformation function is presented, based on experimental results on the relationship between emotional and neutral models. Two methods, based on Gaussians and on parameters respectively, are then proposed to solve for this function. With either method, a speaker's neutral model can be transformed into an emotional model, so the system can learn the distribution of the speaker's emotional speech even when only neutral utterances are used for training.

4. Proposed a frequency shifting method. The frequency shifting method changes the frequency of speech to synthesize speech in various emotional states. It can be combined with a multi-condition model to improve robustness to emotion. Experimental results show that the synthesized speech is more similar to spontaneous emotional speech than the neutral speech is. The advantage of this method is that it can be easily applied to traditional recognition systems.

5. Proposed a score selection method. The score selection method is suited to situations where the testing utterance mixes neutral and emotional speech. With this method, speech frames can be selected from the testing utterance to reduce its emotion ratio. It is based on two conclusions: (1) verification performance improves as the emotion ratio decreases; and (2) the scores of a speaker's neutral features against his/her own model are distributed higher than the other three kinds of scores (neutral speech against other speakers' models, and non-neutral speech against both his/her own model and other speakers' models).

6. Proposed a UBM reduction method for an efficient emotional speaker recognition system. The MAP method is helpful for emotional speaker recognition systems. However, a high-order universal background model (UBM) requires heavy computation, which limits its use in real applications. We propose a UBM reduction method that clusters the original UBM into a lower-order one to speed up the system.
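Contribution 2 evaluates NAP for emotion compensation. The following is a minimal sketch of nuisance attribute projection in its common supervector form, assuming each utterance has already been mapped to a GMM mean supervector and that within-speaker emotional variation is treated as the nuisance. The function names, the rank `k`, and the toy data are illustrative assumptions, not the thesis's configuration; LFA is not shown.

```python
import numpy as np


def train_nap(supervectors, speaker_ids, k):
    """Estimate a rank-k nuisance subspace U from within-speaker variation.

    supervectors: (n_utterances, dim) array of GMM mean supervectors.
    speaker_ids:  length-n_utterances sequence of speaker labels.
    """
    X = np.asarray(supervectors, dtype=float)
    ids = np.asarray(speaker_ids)
    centered = np.empty_like(X)
    for spk in np.unique(ids):
        mask = ids == spk
        # Remove each speaker's mean so only within-speaker (e.g. emotional) variation remains
        centered[mask] = X[mask] - X[mask].mean(axis=0)
    scatter = centered.T @ centered
    # eigh returns eigenvalues in ascending order; keep the k largest directions
    _, eigvecs = np.linalg.eigh(scatter)
    return eigvecs[:, ::-1][:, :k]


def apply_nap(x, U):
    """Project supervector(s) x onto the complement of span(U): (I - U U^T) x."""
    x = np.asarray(x, dtype=float)
    return x - (x @ U) @ U.T


# Toy data: two speakers, each with a neutral and an "emotional" supervector offset along z
X = [[1, 0, 0], [1, 0, 2], [0, 1, 0], [0, 1, 2]]
ids = ["a", "a", "b", "b"]
U = train_nap(X, ids, k=1)
projected = apply_nap(X, U)  # emotional offset removed; the two speakers still differ
```

In this toy case the only within-speaker variation lies along the third axis, so the estimated one-dimensional nuisance subspace is that axis and projecting it out makes each speaker's neutral and emotional supervectors coincide.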
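The abstract does not spell out the two methods used to solve the emotion-neutral model transformation function in contribution 3. Purely as a hypothetical illustration of the idea of turning a neutral model into an emotional one, the sketch below assumes every model is a GMM adapted from a shared UBM (so Gaussian g corresponds across models) and represents the transformation as a per-Gaussian mean offset averaged over development speakers. The offset model, the function names, and the toy values are assumptions, not the thesis's method.

```python
import numpy as np


def learn_mean_offsets(neutral_means, emotional_means):
    """Average per-Gaussian shift from neutral to emotional model means.

    Both inputs: (n_dev_speakers, n_gaussians, dim) arrays of GMM means
    adapted from the same UBM, so Gaussian g corresponds across models.
    """
    neutral = np.asarray(neutral_means, dtype=float)
    emotional = np.asarray(emotional_means, dtype=float)
    return (emotional - neutral).mean(axis=0)


def neutral_to_emotional(neutral_model_means, offsets):
    """Hypothetical transformation: add the learned offset to each Gaussian mean."""
    return np.asarray(neutral_model_means, dtype=float) + offsets


# Two development speakers, each with two 1-D Gaussians
dev_neutral = np.zeros((2, 2, 1))
dev_emotional = np.array([[[1.0], [2.0]], [[3.0], [2.0]]])
offsets = learn_mean_offsets(dev_neutral, dev_emotional)

# A target speaker enrolled with neutral speech only
target_emotional = neutral_to_emotional([[0.5], [-1.0]], offsets)
```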
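Contribution 4 synthesizes emotional speech by changing the frequency of neutral speech. The exact transformation is not given in the abstract; one simple reading, assumed here, is a constant shift of every spectral component, implemented with the analytic signal (single-sideband modulation). A real system might instead scale or warp the spectrum, and the shift values below are placeholders.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT (same construction as scipy.signal.hilbert)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    # Zero the negative frequencies and double the positive ones
    return np.fft.ifft(np.fft.fft(x) * h)


def frequency_shift(x, shift_hz, sample_rate):
    """Move every spectral component of x by shift_hz (single-sideband shift).

    With a negative shift, components below |shift_hz| fold back to
    |f + shift_hz| rather than being removed.
    """
    t = np.arange(len(x)) / sample_rate
    return np.real(analytic_signal(x) * np.exp(2j * np.pi * shift_hz * t))


fs = 8000
t = np.arange(fs) / fs
neutral = np.sin(2 * np.pi * 1000 * t)  # stand-in for a neutral utterance
# Hypothetical shift values for building multi-condition training copies
augmented = [neutral] + [frequency_shift(neutral, s, fs) for s in (100, 200)]
```

Pooling the original utterance with its shifted copies is one way to feed a multi-condition model, which matches the abstract's point that the method slots into a traditional recognition system without changing the back end.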
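Contribution 5 rests on the observation that a speaker's neutral frames, scored against his/her own model, tend to land in the upper part of the score distribution. A minimal sketch of score-based frame selection follows, assuming per-frame scores (for example, frame-level log-likelihood ratios of the claimed speaker's GMM against the UBM) are already computed and that a fixed fraction `keep_ratio` of frames is retained. The fixed-ratio rule and the function names are assumptions; the abstract does not state how the selection threshold is chosen.

```python
import numpy as np


def select_frames_by_score(frame_scores, keep_ratio):
    """Return indices (in time order) of the top keep_ratio fraction of frames."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    scores = np.asarray(frame_scores, dtype=float)
    n_keep = max(1, int(round(keep_ratio * len(scores))))
    top = np.argsort(scores)[::-1][:n_keep]
    return np.sort(top)


def selected_utterance_score(frame_scores, keep_ratio):
    """Average only the retained (assumed mostly neutral) frames into one utterance score."""
    scores = np.asarray(frame_scores, dtype=float)
    return float(scores[select_frames_by_score(scores, keep_ratio)].mean())


# Hypothetical per-frame scores for a five-frame test utterance
frame_scores = [0.1, 2.0, -1.0, 1.5, 0.0]
kept = select_frames_by_score(frame_scores, keep_ratio=0.4)
score = selected_utterance_score(frame_scores, keep_ratio=0.4)
```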
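Contribution 6 clusters a high-order UBM into a lower-order one. The sketch below shows one way to do this, assuming diagonal covariances: it repeatedly merges the pair of components with the smallest Ward-style cost and replaces them with a single moment-matched Gaussian until `target_order` components remain. The merge criterion and function names are assumptions; the thesis may use a different clustering method.

```python
import numpy as np


def merge_pair(w1, m1, v1, w2, m2, v2):
    """Moment-matched merge of two diagonal-covariance Gaussians."""
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    # Preserve the pooled second moment, then subtract the new mean squared
    v = (w1 * (v1 + m1 ** 2) + w2 * (v2 + m2 ** 2)) / w - m ** 2
    return w, m, v


def reduce_gmm(weights, means, variances, target_order):
    """Greedily merge components until only target_order remain."""
    if target_order < 1:
        raise ValueError("target_order must be at least 1")
    w = [float(x) for x in weights]
    m = [np.asarray(row, dtype=float) for row in means]
    v = [np.asarray(row, dtype=float) for row in variances]
    while len(w) > target_order:
        best, best_cost = None, np.inf
        for i in range(len(w)):
            for j in range(i + 1, len(w)):
                # Ward-style cost: increase in weighted scatter of the means
                cost = w[i] * w[j] / (w[i] + w[j]) * np.sum((m[i] - m[j]) ** 2)
                if cost < best_cost:
                    best, best_cost = (i, j), cost
        i, j = best
        merged = merge_pair(w[i], m[i], v[i], w[j], m[j], v[j])
        for idx in (j, i):  # delete the higher index first so i stays valid
            del w[idx], m[idx], v[idx]
        w.append(merged[0])
        m.append(merged[1])
        v.append(merged[2])
    return np.array(w), np.vstack(m), np.vstack(v)


# Toy 4-component, 1-D "UBM" reduced to 2 components
ubm_w = [0.25, 0.25, 0.25, 0.25]
ubm_m = [[0.0], [0.1], [5.0], [5.2]]
ubm_v = [[1.0], [1.0], [1.0], [1.0]]
w, m, v = reduce_gmm(ubm_w, ubm_m, ubm_v, target_order=2)
```

Moment matching keeps the total weight, mean, and second moment of each merged pair, so the reduced model remains a valid GMM from which speaker models can still be MAP-adapted. The exhaustive pair search makes each merge step quadratic in the number of components, which is acceptable for a sketch but would call for a faster clustering step on a large UBM.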
Keywords/Search Tags: Speaker Recognition, Emotional Speaker Recognition, Emotional Speech, Emotion-neutral Model Transformation, Mixture Model Reduction