
Research On The Speaker Recognition System Under The Short Utterance Based On Deep Learning Theory

Posted on: 2017-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: H H Li
GTID: 2308330485983792
Subject: Automation
Abstract/Summary:
Speaker recognition is an important branch of speech recognition technology and a form of biometric authentication. Since the 1980s, speaker recognition technology has developed rapidly. In particular, the introduction of the Gaussian mixture model (GMM) brought great progress at the theoretical level. When sufficient speech is available in a quiet environment, the recognition rate of state-of-the-art systems can exceed 90%. In practical applications, however, the complex surrounding environment is far from ideal and the available speech data are limited. Methods based on GMM theory rely heavily on the collected data, so when the corpus is insufficient, system performance declines seriously. Because of these problems, the original recognition technology is hard to promote and apply.

Drawing on deep learning theory, this thesis addresses the poor performance of speaker recognition under short utterances from two aspects: feature extraction and speaker modeling. The main contributions of the thesis are as follows:

First, from the perspective of feature extraction, we address the low accuracy of traditional speaker recognition systems under short utterances. We used convolutional deep belief networks (CDBN) to extract high-level speech features from the spectrum of the original speech signal. The advantage of extracting deep features from the original speech data is that it avoids losing the original speaker characteristics while the features are formed, so the extracted features make it easier to distinguish which speaker an utterance belongs to. We constructed a CDBN model for the TIMIT speech database on the Matlab platform, extracted the spectrum of the original speech, and trained the CDBN on the spectrum data through unsupervised pre-training followed by supervised discriminative training. Finally, in experiments based on GMM-UBM, we used the CDBN feature in place of the traditional MFCC feature and also combined the MFCC and CDBN features into a new speech feature, then calculated the equal error rate (EER) of the model for each feature and analyzed the system performance. The experiments showed that the CDBN feature is superior to MFCC for the GMM-UBM speaker recognition system under short utterances. The proposed method effectively addresses the low accuracy of short-utterance recognition and yields more discriminative speaker features.

Second, from the perspective of model building, we address the same low accuracy problem of traditional speaker recognition systems under short utterances. Based on the traditional MFCC feature, we used a deep neural network (DNN) as the back-end recognition model of the system and introduced dropout to address over-fitting during DNN training. The DNN classifies the original speech features using its powerful deep nonlinear modeling capability. Then, on the Matlab platform with the TIMIT speaker speech database, we constructed GMM-UBM and DNN speaker recognition systems using MFCC features and calculated the EER of the different models. The recognition results show that the recognition model built with the deep neural network can mine more discriminative features from the limited original MFCC features and describe the feature distribution better, thus greatly improving the recognition accuracy of the system under short utterances. In addition, the introduction of dropout further increased the recognition rate.
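The systems above are compared by their EER, the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal Python sketch of that metric, not the Matlab code used in the thesis. The function name `equal_error_rate` and the example scores are hypothetical and chosen only for illustration. It assumes that a higher score means a more likely match and that a trial is accepted when its score is at or above the threshold.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    genuine_scores: scores of trials where the claimed speaker is correct.
    impostor_scores: scores of trials where the claimed speaker is wrong.
    Returns (eer, threshold) at the point where the false acceptance rate
    (FAR) and false rejection rate (FRR) are closest to each other.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best = None  # (gap between FAR and FRR, EER estimate, threshold)
    for t in thresholds:
        # Impostor trials wrongly accepted at this threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # Genuine trials wrongly rejected at this threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores, for illustration only (not results from the thesis).
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2f} at threshold {threshold}")  # EER = 0.25 at threshold 0.6
```

A lower EER indicates a better system, which is why it can rank different features (MFCC, CDBN, or their combination) and different back ends (GMM-UBM or DNN) on the same set of trials.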
Keywords/Search Tags: speaker recognition, short-term performance, deep learning, feature extraction, Gaussian mixture model, DNN