Research Of Speaker Recognition In Low-SNR Environment

Posted on: 2017-10-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D Wu
GTID: 1318330512457542
Subject: Signal and Information Processing
Abstract/Summary:
Speaker recognition is a biometric technology that identifies a person from the speech signal. In real applications, the performance of a speaker recognition system degrades in noisy environments. As speaker recognition moves into commercial use, it has become a research hotspot, especially for strongly noisy environments. However, the performance of most existing algorithms drops rapidly in low-SNR environments, which poses new challenges for speaker recognition.

Currently, three key problems remain to be resolved in low-SNR environments:
1. The robustness of existing features decreases rapidly in low-SNR environments and cannot meet the requirements of speaker recognition.
2. The performance of feature compensation methods drops in low-SNR environments, so the robustness of features cannot be effectively improved.
3. Existing speaker recognition models either fit only a certain class of noise or lose performance in low-SNR environments.

Building on a thorough study of speaker recognition in noisy environments, this work addresses these three problems by studying the robust feature, feature compensation, and the recognition model. The main research contents and innovations are as follows.

1. To address the degraded performance of speaker recognition features in low-SNR environments, a feature extraction method, the Perception Spectrogram Norm Cochlea Filter Cepstral Coefficient, is proposed. The cochlea filter banks are constructed to follow the traveling-wave response and the nonlinear frequency distribution of the basilar membrane. A Perception Spectrogram Norm parameter is obtained by two-dimensional edge detection, using hearing-perception speech enhancement and two-dimensional enhancement in the time-frequency domain. The output of the cochlea filter banks is normalized to give the Perception Spectrogram Norm Cochlea Filter Cepstral Coefficient, which has better robustness in the time-frequency domain.
Experimental results show that the average recognition rates of the proposed feature increase by 26.6%, 22.2%, and 18.5%, respectively, across all tested noise environments and SNRs. The proposed method achieves the highest recognition rate in every condition, with SNR ranging from -10 dB to 10 dB. In low-SNR environments, the proposed Perception Spectrogram Norm Cochlea Filter Cepstral Coefficient shows better robustness to different kinds of noise.

2. To address the degraded robustness of feature compensation methods for speaker recognition in low-SNR environments, a feature compensation method based on Perception Auditory Scene Analysis is proposed. The missing data feature spectrum is calculated, and the perception speech content is derived from speech perception characteristics. The speech distribution is obtained from noisy speech after speech enhancement based on auditory perceptual characteristics and two-dimensional enhancement of the spectrogram. It is combined with the perception speech content and a missing intensity parameter to extract the Perception Auditory Factor. The Perception Auditory Factor and the missing data feature spectrum divide the feature extraction process into different auditory scenes, which are treated separately to improve the robustness of the speaker recognition system. Experimental results show that the proposed method is more robust than five other methods in four different noisy low-SNR environments from -10 dB to 10 dB. The average recognition rates of the proposed method increase by 26.0%, 19.6%, 12.7%, 4.6%, and 6.5%, respectively. The proposed method improves the robustness of features in the time-frequency domain and is more suitable for speaker recognition in low-SNR environments.

3. To address the degraded robustness of speaker recognition models in low-SNR environments, a Mixture-Condition Noise Field Model is proposed.
A series of colored noises, from white noise to Brown noise, is constructed with a fractional-order transfer function, which takes white noise as the base noise and pink noise as the direction noise. A Mixture-Condition Noise Field covering various noise conditions and various SNR conditions is constructed by adding these noises to the training speech at different SNRs. The speech of every speaker is modeled as a Mixture-Condition Noise Field Model. The matching model is selected within each speaker's Mixture-Condition Noise Field Model, and the speaker is then recognized among all speakers. Experimental results show that, in four different noisy low-SNR environments from -10 dB to 10 dB, the average recognition rates of the proposed model increase by 42.7%, 32.2%, and 21.1%, respectively, compared with the baseline model and the two other reference models. The proposed speaker recognition model is more suitable for low-SNR environments.
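
The construction of a noise field, with colored noises between white and Brown noise added to training speech at several SNRs, can be illustrated with a minimal sketch. It rests on assumptions that are not from the dissertation: it shapes white noise in the FFT domain so that power falls off as 1/f^alpha, instead of using the fractional-order transfer function described above, and the function names (`colored_noise`, `mix_at_snr`, `build_noise_field`), the alpha grid, and the SNR grid are hypothetical choices made only for illustration.

```python
import numpy as np


def colored_noise(n, alpha, rng):
    """Return n samples of noise whose power spectrum falls off as 1/f**alpha.

    alpha = 0 gives white noise, 1 pink noise, 2 Brown noise. This is an
    FFT-domain approximation (an assumption of this sketch), not the
    fractional-order transfer function used in the dissertation.
    """
    white = rng.standard_normal(n)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n)
    scale = np.zeros_like(freqs)          # DC bin stays at zero
    scale[1:] = freqs[1:] ** (-alpha / 2.0)  # amplitude ~ f^(-alpha/2)
    shaped = np.fft.irfft(spectrum * scale, n=n)
    return shaped / np.std(shaped)        # unit variance


def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that speech power / noise power equals snr_db."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def build_noise_field(speech, alphas, snrs_db, seed=0):
    """Return {(alpha, snr_db): noisy_speech} for every noise/SNR condition."""
    rng = np.random.default_rng(seed)
    field = {}
    for alpha in alphas:
        noise = colored_noise(len(speech), alpha, rng)
        for snr_db in snrs_db:
            field[(alpha, snr_db)] = mix_at_snr(speech, noise, snr_db)
    return field


if __name__ == "__main__":
    # A sine wave stands in for a training utterance (illustrative only).
    speech = np.sin(2 * np.pi * 0.01 * np.arange(4096))
    field = build_noise_field(speech, [0.0, 0.5, 1.0, 1.5, 2.0], [-10, 0, 10])
    print(len(field), "noisy training conditions")
```

Each entry of the returned dictionary corresponds to one noise-type and SNR condition. In a setup like the one the abstract describes, features extracted from every condition would then feed the per-speaker model.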
Keywords/Search Tags:speaker recognition, noise, low-SNR environment, Perception Spectrogram Norm Cochlea Filter Cepstral Coefficient, Perception Auditory Scene Analysis, Mixture-Condition Noise Field Model