Research On Robust Speaker Recognition Under Noisy Conditions

Posted on: 2019-12-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M H Wang
GTID: 1368330602961105
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Among the various biometric identification technologies, speaker recognition has clear advantages in practical applications. Speaker recognition based on high-quality speech recorded in a quiet laboratory environment is relatively mature and achieves a high recognition rate. However, speaker recognition systems often work in a variety of complex environments and still face many challenges. The first of these is noise interference: noise causes a mismatch between training features and testing features, which seriously degrades the performance of speaker recognition systems. Robustness has therefore become an important research topic. This dissertation focuses on robust speaker recognition with noisy speech, and its main contributions are as follows.

Firstly, a voice activity detection approach based on Fisher linear discriminant analysis is presented. Voice activity detection is a key technology in speech processing and speaker recognition. Traditional approaches cannot effectively detect noisy unvoiced sounds. To address this, the proposed method, F-MFCC, treats the separation of unvoiced signal and background noise as a two-class classification problem and employs the Fisher criterion to find an optimal projection vector. The projection minimizes within-class scatter and maximizes between-class scatter, thus enhancing the separability between consonants and background noise. Experimental results demonstrate the effectiveness of F-MFCC on several kinds of speech datasets. It outperforms the traditional AMR-1, G729B, PD, SS-AE-VAD, and MFCC-Similarity methods, and its error rate is 13.1% lower than that of AMR-1 on average.

Secondly, an i-vector based speaker recognition method with locally weighted linear discriminant analysis is presented. Noise can be divided into channel noise and background noise. Traditional i-vector based speaker recognition methods cannot guarantee optimal separation near the target, which leads to a small score difference between the target speaker and its nearest neighbors and reduces recognition accuracy. To solve this problem, the proposed method increases the weights of samples near the target when calculating the between-class and within-class scatter. This enhances discrimination around the target and focuses on reducing recognition errors caused by channel noise. Experiments on the NUST603 corpus demonstrate that the proposed method, LWLDA, achieves higher robustness under complex channel-noise conditions. Compared with the multi-condition system, LWLDA improves accuracy by an average of 3.6% and reduces the recognition error rate by 19.5% in relative terms.

Thirdly, a speech feature extraction method based on robust principal component analysis is proposed. Using robust principal component analysis, the speech is separated in the short-time Fourier transform (STFT) domain into the noise spectrum (regarded as the low-rank component) and the speech spectrum (regarded as the sparse component). Features are then extracted directly from the sparse component, without an inverse STFT or smoothing. Because this avoids destroying the speaker-specific information, it effectively improves the performance of speaker verification systems. Compared with the multi-condition system, the proposed method, RPCA-TVS, reduces the equal error rate by 4.7% overall.

Fourthly, a speech enhancement method based on an improved non-negative matrix factorization is proposed. Traditional approaches may distort the speech while improving its signal-to-noise ratio. To address this, the proposed method generates the speech dictionary from the spectra of pitches and their harmonics via a mathematical model, which guarantees the purity of the speech dictionary. In addition, to reduce the loss of information from the noise sample, the proposed method generates the noise dictionary as a linear combination of spectrum frames separated online. Experimental results on unknown (unseen) and unstable noises demonstrate that the proposed method, ImNMF, significantly improves robustness under various noise conditions. In particular, it reduces the equal error rate by an average of 4.6% compared with the baseline.

Finally, a noisy dataset is designed and generated based on the TIMIT and NUST603 corpora. A noisy corpus is very important for speaker recognition research. However, the available corpora were mainly recorded for speech recognition, not for speaker recognition. To solve this problem, a noisy dataset containing clean speech, channel-noisy speech, and background-noisy speech is built and evaluated in terms of pitch range, channel mismatch, noise coverage, signal-to-noise ratio, and distortion. The evaluation shows that the generated dataset is representative and suitable for research and testing tasks in speaker recognition.
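
The Fisher criterion behind the first contribution can be illustrated with a minimal sketch. The abstract does not give the details of F-MFCC, so this is only the generic two-class Fisher discriminant, not the author's method: the function names `fisher_direction` and `fisher_ratio`, the small ridge term, and the synthetic two-dimensional data (standing in for frame-level features of unvoiced speech and background noise) are all assumptions made for illustration.

```python
import numpy as np


def fisher_direction(class_a, class_b):
    """Unit vector w maximizing J(w) = (w^T S_b w) / (w^T S_w w) for two classes."""
    mean_a = class_a.mean(axis=0)
    mean_b = class_b.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    centered_a = class_a - mean_a
    centered_b = class_b - mean_b
    s_w = centered_a.T @ centered_a + centered_b.T @ centered_b
    # Small ridge term (an assumption of this sketch) keeps S_w invertible.
    s_w = s_w + 1e-6 * np.eye(s_w.shape[0])
    # Two-class closed form: w is proportional to S_w^{-1} (m_a - m_b).
    w = np.linalg.solve(s_w, mean_a - mean_b)
    return w / np.linalg.norm(w)


def fisher_ratio(w, class_a, class_b):
    """Squared distance between projected means over summed projected variances."""
    proj_a = class_a @ w
    proj_b = class_b @ w
    return (proj_a.mean() - proj_b.mean()) ** 2 / (proj_a.var() + proj_b.var())


rng = np.random.default_rng(0)
# Synthetic, strongly correlated 2-D features; NOT real MFCC data.
cov = [[1.0, 0.9], [0.9, 1.0]]
unvoiced = rng.multivariate_normal([1.5, 0.0], cov, size=500)
noise = rng.multivariate_normal([0.0, 0.0], cov, size=500)

w = fisher_direction(unvoiced, noise)

# Baseline for comparison: project onto the plain difference of class means.
naive = unvoiced.mean(axis=0) - noise.mean(axis=0)
naive = naive / np.linalg.norm(naive)

print("Fisher direction ratio:", fisher_ratio(w, unvoiced, noise))
print("Mean-difference ratio: ", fisher_ratio(naive, unvoiced, noise))
```

On data like this, where the two classes overlap along the mean-difference axis because the features are correlated, the Fisher direction accounts for within-class scatter and so separates the projected classes better than the naive projection. A frame-level detector could then threshold the projected value, although how F-MFCC makes that decision is not stated in the abstract.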
Keywords/Search Tags: speaker recognition, convolutional noise, additive noise, voice activity detection, i-vector, linear discriminant analysis, robust principal component analysis, non-negative matrix factorization, Gaussian probabilistic linear discriminant analysis