
Research On Whispered Speaker Identification In Channel Mismatch Conditions

Posted on: 2012-10-18
Degree: Master
Type: Thesis
Country: China
Candidate: X J Gu
GTID: 2218330368492368
Subject: Signal and Information Processing
Abstract/Summary:
Whispered speech serves as an auxiliary mode of communication and is widely used in daily life, especially for identity verification in the financial and judicial sectors. Speakers often whisper to keep information confidential, so whispered speaker identification has attracted attention as a new research topic. Whispered speech is frequently used over mobile phones, where it is subject to channel distortion. Traditional models yield low recognition accuracy when the channel conditions differ markedly between training and testing, so a speaker recognition system needs a robust channel compensation algorithm. To address this problem, the thesis makes the following contributions:

1. Whispered speech from all channel types is pooled to train a universal background model (UBM), and maximum a posteriori (MAP) adaptation is then applied to this UBM to train each speaker model. Experiments show that the UBM-based model outperforms a conventional GMM.

2. Joint factor analysis (JFA) is introduced into whispered speaker identification. To suit the characteristics of the speech database, decoupled estimation is applied and the residual subspace is omitted. During identification, the speaker factor from the training utterance and the channel factor from the test utterance are combined to fit the test channel dynamically. Experiments show that the improved JFA achieves high recognition accuracy. However, JFA is not ideal for short-duration identification. A new hybrid compensation method is therefore proposed, which keeps the speaker factor in the model domain and applies the channel factor in the feature domain. Because this method compensates each frame's feature vector, it is finer-grained than JFA. Experiments show that the average identification rates for 1 s and 2 s utterances improve by 4.36% and 3.89%, respectively, when training on the HH channel, and by 4.14% and 2.64%, respectively, when training on the EP channel.

3. To exploit the discriminative power of the support vector machine (SVM), the speaker supervector is used as input to an SVM, but the resulting system does not perform as well as UBM-MAP. The speaker factor vector is then used as the SVM input instead. Because the speaker factor is low-dimensional and amenable to linear discrimination, it achieves excellent accuracy. Finally, three channel compensation techniques are applied to further improve the system's robustness, yielding identification results comparable to those of JFA.
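The UBM-MAP step in item 1 can be illustrated with a minimal sketch of mean-only MAP adaptation for a diagonal-covariance GMM. This is the standard textbook form, not necessarily the exact procedure used in the thesis; the function name `map_adapt_means`, the relevance factor of 16, and the restriction to adapting means only are all assumptions made for illustration.

```python
import numpy as np

def map_adapt_means(ubm_weights, ubm_means, ubm_vars, frames, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance UBM (illustrative sketch).

    ubm_weights: (C,) mixture weights
    ubm_means:   (C, D) component means
    ubm_vars:    (C, D) diagonal variances
    frames:      (T, D) feature vectors from one speaker
    relevance:   assumed relevance factor controlling how far means move
    """
    # Log-density of every frame under every diagonal Gaussian: shape (T, C).
    diff = frames[:, None, :] - ubm_means[None, :, :]
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / ubm_vars, axis=2)
        + np.sum(np.log(2.0 * np.pi * ubm_vars), axis=1)
    )

    # Component posteriors per frame, normalised in a numerically stable way.
    log_post = np.log(ubm_weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Zeroth- and first-order sufficient statistics.
    n = post.sum(axis=0)                      # (C,) soft frame counts
    first = post.T @ frames                   # (C, D)
    data_means = first / np.maximum(n, 1e-10)[:, None]

    # Interpolate between the speaker data mean and the UBM mean.
    alpha = (n / (n + relevance))[:, None]
    return alpha * data_means + (1.0 - alpha) * ubm_means
```

Components that see little speaker data stay close to the UBM means, while well-observed components move toward the speaker's data, which is why a shared background model helps when per-speaker training data is limited.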
Keywords/Search Tags: whispered speech, speaker identification, joint factor analysis, hybrid compensation, support vector machine