Research On Robust Speaker Identification Under The Inflextion Environment

Posted on: 2016-07-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Shan
Full Text: PDF
GTID: 2308330473465537
Subject: Signal and Information Processing
Abstract/Summary:
With the continuous improvement of computer technology and the deepening informatization of society, speaker recognition is being applied in an ever wider range of fields, and expectations for its user-friendliness, accuracy and robustness are rising accordingly. Under laboratory conditions, speaker recognition has made great progress, but in real life the performance of a speaker recognition system is vulnerable to various environmental factors. One cause of falling recognition rates is speech contaminated by ambient noise; another is a change in the speaker's own voice, most commonly because of physical health (such as a cold). These factors are the main reasons for the poor robustness of current speaker recognition systems. In research on robust speech recognition, the first case, the effect of environmental noise, has received wide attention, and researchers have taken various measures to reduce the influence of noise and improve the recognition rate. The second case, changes in the speaker's voice itself, has received very little study. This thesis focuses on how to improve the robustness of speaker recognition under voice-change ("inflextion") conditions. A voice can change for many reasons: some speakers alter their voice deliberately (for example, a criminal disguising himself), but more often the change is due to the speaker's health. This thesis mainly studies the latter, focusing on voice changes caused by the common cold. Cold speech is defined as speech produced by a person who has a cold. A cold alters the speaker's individual voice characteristics, so the recognition accuracy of the system drops significantly.

The main research work and innovations of this thesis are as follows. For the case where only normal speech is used to train the speaker models, this thesis analyzes the changes in the vocal organs caused by colds, the characteristics of nasal sounds, and the differences between cold speech and normal speech.
It focuses on how to compensate for the voice changes caused by colds and improve the performance of the system. The specific work includes:

(1) The changes in the nasal passages caused by colds and their modulation effect were analyzed, the spectral characteristics of nasal sounds were studied, and the spectrum of cold speech was compared with that of normal speech. Cold speech is processed with a pre-emphasis filter different from the one used for normal speech; this filter attenuates low frequencies more strongly and boosts high frequencies more. Simulation experiments were carried out on a corpus recorded in the pronunciation laboratory. Many experiments show that the system's recognition rate is best when normal speech is processed with the classic pre-emphasis filter with a coefficient of 0.91 and cold speech is processed with the special pre-emphasis filter with coefficients 0.98 and 0.8.

(2) Linear prediction coefficients (LPC) and Mel-frequency cepstral coefficients (MFCC) are each used to train a GMM speaker model. A score normalization method is then applied to the scores from the two systems, and the two normalized outputs are combined by linear weighting to decide the target speaker. This method makes effective use of the complementarity of scores from different feature parameters. Experiments show that the LPC and MFCC score-fusion system performs better than either single system.

(3) With the spread of smart mobile terminals, many scenarios require user authentication when users access networks from a mobile terminal. To reduce the amount of data to be transmitted and processed, this thesis proposes a cold-speech speaker recognition system based on compressed sensing. Before the CS-MFCC features of cold speech are extracted, silent and unvoiced frames are excluded by setting short-term energy thresholds, to reduce their impact on the waveform projected by the row echelon matrix.
This approach maintains the recognition rate of the speaker recognition system while greatly reducing the amount of speech signal data (nearly one-eighth).
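Three of the steps named in the abstract can be illustrated with a minimal Python sketch: pre-emphasis, exclusion of low-energy frames, and fusion of normalized scores. This is not the thesis's implementation, and several details are assumptions made only for illustration. The pre-emphasis shown is the classic first-order form y[n] = x[n] - a*x[n-1]; the form of the special cold-speech filter is not given in the abstract, so it is not reproduced. The abstract does not say which score normalization was used, so z-score normalization stands in for it. The frame length, energy threshold, fusion weight of 0.5 and all function names are hypothetical.

```python
from statistics import mean, pstdev


def pre_emphasis(signal, coeff):
    """Classic first-order pre-emphasis: y[n] = x[n] - coeff * x[n-1]."""
    if not signal:
        return []
    out = [signal[0]]  # first sample has no predecessor and is passed through
    for prev, cur in zip(signal, signal[1:]):
        out.append(cur - coeff * prev)
    return out


def drop_low_energy_frames(signal, frame_len, threshold):
    """Split into non-overlapping frames and keep those whose
    short-term energy (sum of squares) exceeds the threshold."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    return [f for f in frames if sum(s * s for s in f) > threshold]


def z_normalize(scores):
    """Assumed normalization: zero mean, unit standard deviation."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_and_decide(lpc_scores, mfcc_scores, weight=0.5):
    """Linearly weight two normalized score lists (one score per enrolled
    speaker) and return the index of the best speaker plus the fused scores."""
    lpc_n = z_normalize(lpc_scores)
    mfcc_n = z_normalize(mfcc_scores)
    fused = [weight * a + (1 - weight) * b for a, b in zip(lpc_n, mfcc_n)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


if __name__ == "__main__":
    # 0.91 is the coefficient the abstract reports for normal speech.
    emphasized = pre_emphasis([0.0, 0.5, 1.0, 0.5, 0.0, 0.01, 0.0, 0.01], 0.91)
    voiced = drop_low_energy_frames(emphasized, frame_len=4, threshold=0.1)
    # Hypothetical per-speaker log-likelihood scores from the two GMM systems.
    best, fused = fuse_and_decide([-10.0, -5.0, -8.0], [-20.0, -12.0, -30.0])
    print(len(voiced), best)
```

Normalizing before fusion matters because scores from LPC-based and MFCC-based models generally lie on different scales; without it, one system would dominate the weighted sum regardless of the chosen weight.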
Keywords/Search Tags: speaker identification, cold speech, pre-emphasis filter, score combination, score normalization, compressed sensing