Research On Key Techniques Of Wavelet-based Voiceprint Recognition

Posted on: 2020-01-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Lei
Full Text: PDF
GTID: 1368330623958176
Subject: Computer application technology
Abstract/Summary:
Voiceprint recognition is a popular authentication technology that identifies a person from his or her voice. In recent years it has been widely used in applications such as access control, forensics, law enforcement, and information services. Current voiceprint recognition models usually perform well in clean environments, but their performance drops rapidly in noisy ones. This dissertation investigates the two key technologies of noise-robust voiceprint recognition, speech feature extraction and speaker modeling, and proposes new anti-noise feature extraction algorithms and speaker models that combine wavelet analysis with deep learning. The main contributions of this dissertation are as follows:

(1) To improve the anti-noise performance of cepstral features, a wavelet sub-band cepstral coefficients (WSCC) feature extraction algorithm is proposed. Speech signals are converted into a set of wavelet coefficients by the wavelet transform, and the coefficients are denoised by a threshold-based method; WSCC feature vectors are then computed from the wavelet coefficients in each wavelet sub-band. Because the valuable information of speech is carried by large wavelet coefficients while redundancies such as noise appear in small ones, the threshold-based method significantly suppresses the noise in speech signals and thereby improves the anti-noise performance of the cepstral features. Experimental results show that the WSCC is more robust to noise than popular cepstral features. A WSCC-based voiceprint recognition model is also proposed, in which speech samples characterized by WSCCs are matched by a probabilistic neural network (PNN) for recognition. In noisy environments, the accuracy of the WSCC-PNN model is 5% higher than that of voiceprint recognition models based on cepstral features.

(2) To improve the performance of the wavelet packet transform (WPT) for speech analysis, a perceptual wavelet packet transform (PWPT) is proposed. The PWPT is constructed by pruning a 7-level WPT according to the cochlear filter banks generated by the Greenwood model. Owing to this cochlear filtering process, which emphasizes valuable information and suppresses acoustic noise, the PWPT analyzes speech signals effectively. Experimental results show that the PWPT is more capable of analyzing speech signals than the WPT, at 25% of the WPT's computational cost. A perceptual wavelet packet entropy (PWPE) feature extraction method is also proposed: utterances are decomposed into a group of wavelet sub-band signals by the PWPT, each sub-band signal is denoised by a threshold-based method, and an entropy feature is extracted from each denoised sub-band. In noisy environments, the accuracy of the PWPE-based voiceprint recognition model is 6% higher than that of voiceprint recognition models based on other wavelet packet features.

(3) To enhance the anti-noise performance of the MFCC-based i-vector model (MIv), two new i-vector models are proposed: the PWPE-based i-vector (PIv) and the WSCC-based i-vector (WIv). Their i-vectors are generated from the PWPE and WSCC feature spaces of utterances, respectively. Because the PWPE and WSCC extraction methods suppress noise in multi-scale wavelet domains, the PIv and WIv are insensitive to noise; experimental results show that both are more robust to noise than the MIv. Two robust voiceprint recognition models are proposed as well, in which speakers characterized by PIv or WIv are compared by cosine distance scoring (CDS) to yield the recognition result. In noisy environments, the accuracy of the proposed models is 8% higher than that of the MIv-based voiceprint recognition models.

(4) To reduce the computational cost of the DNN-UBM, a convolutional neural network based background model (CNN-UBM) is proposed, in which the background model is implemented by a CNN. The CNN-UBM estimates reliable posteriors quickly because its CNN structure models complex data well while containing few weights and using ReLU activation functions. Experimental results show that the CNN-UBM matches the performance of the DNN-UBM at 12% of the DNN-UBM's computational cost. A CNN-UBM based i-vector (CNN/i-vector) model is also proposed, in which i-vectors are generated from the PWPE feature spaces of utterances using the posteriors estimated by the CNN-UBM. In noisy environments, the accuracy of the CNN/i-vector based voiceprint recognition model is 9% higher than that of DNN/i-vector based voiceprint recognition models.
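The threshold-based wavelet denoising underlying contribution (1) can be sketched in a few lines. The abstract does not specify the wavelet family or threshold rule, so the snippet below is a minimal illustration only, assuming a one-level Haar transform and soft thresholding of the detail band; the function names are placeholders, not the dissertation's own.

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar wavelet transform: returns (approximation, detail) bands."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass (approximation) coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass (detail) coefficients
    return a, d

def haar_idwt(a, d):
    """Inverse one-level Haar transform (even-length signals)."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def soft_threshold(c, t):
    """Shrink coefficients toward zero; small (noise-like) ones become exactly 0."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def denoise(x, t):
    """Suppress noise by thresholding the small detail coefficients, then reconstruct."""
    a, d = haar_dwt(x)
    return haar_idwt(a, soft_threshold(d, t))
```

With `t = 0` the transform reconstructs the signal exactly; raising `t` removes progressively more of the small coefficients, which is where noise concentrates.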
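The abstract does not define the PWPE formula, but a common formulation of wavelet packet entropy is the Shannon entropy of the normalized coefficient energies in each sub-band, yielding one feature value per sub-band. The sketch below follows that assumption; `pwpe_features` and its inputs are illustrative names, not the dissertation's API.

```python
import numpy as np

def subband_entropy(coeffs, eps=1e-12):
    """Shannon entropy (bits) of the normalized coefficient energies in one sub-band."""
    e = np.asarray(coeffs, dtype=float) ** 2
    p = e / (e.sum() + eps)                  # energy distribution over coefficients
    return float(-np.sum(p * np.log2(p + eps)))

def pwpe_features(subbands):
    """One entropy value per (already denoised) wavelet sub-band signal."""
    return np.array([subband_entropy(s) for s in subbands])
```

A flat, noise-like sub-band of length N has entropy near the maximum log2(N), while a sub-band dominated by a few large coefficients has entropy near 0, which is what makes the feature discriminative.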
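The cosine distance scoring used in contribution (3) is the standard CDS back-end: the match score between an enrollment i-vector and a test i-vector is their cosine similarity, and identification picks the enrolled speaker with the highest score. The sketch below assumes plain (un-whitened) i-vectors; speaker names and dimensions are placeholders.

```python
import numpy as np

def cds_score(w_enroll, w_test):
    """Cosine similarity between two i-vectors (higher = more likely same speaker)."""
    w1 = np.asarray(w_enroll, dtype=float)
    w2 = np.asarray(w_test, dtype=float)
    return float(w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2)))

def identify(enrolled, w_test):
    """Closed-set identification: enrolled maps speaker id -> enrollment i-vector."""
    scores = {spk: cds_score(w, w_test) for spk, w in enrolled.items()}
    return max(scores, key=scores.get)
```

Because the score depends only on the angle between vectors, no per-trial normalization of i-vector length is needed, which is part of why CDS is a cheap back-end.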
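The CNN-UBM's architecture is not given in this abstract, so the snippet below is only a stand-in: a tiny dense ReLU network illustrating what any UBM posterior estimator produces (per-frame component posteriors summing to 1) and how those posteriors become the Baum-Welch statistics consumed by an i-vector extractor. All weights, shapes, and function names here are assumptions for illustration.

```python
import numpy as np

def relu(x):
    """ReLU activation, as used in the CNN-UBM."""
    return np.maximum(x, 0.0)

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def posteriors(frames, W1, b1, W2, b2):
    """Per-frame component posteriors from a tiny ReLU network.

    frames is (T, D); the output gamma is (T, C), each row summing to 1.
    """
    h = relu(frames @ W1 + b1)
    return softmax(h @ W2 + b2)

def zeroth_first_stats(frames, gamma):
    """Baum-Welch statistics that feed the downstream i-vector extractor."""
    N = gamma.sum(axis=0)   # zeroth order: soft frame counts per component, (C,)
    F = gamma.T @ frames    # first order: posterior-weighted feature sums, (C, D)
    return N, F
```

Since each posterior row sums to 1, the zeroth-order statistics always sum to the number of frames, a useful sanity check when swapping in a different posterior model.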
Keywords/Search Tags: voiceprint recognition, wavelet analysis, speech feature extraction, speaker model