Research On Speaker Recognition Based On Discriminative Feature Learning

Posted on: 2021-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: D L Huang
GTID: 2428330623979539
Subject: Computer Science and Technology
Abstract/Summary:
Speaker recognition is an important task in biometrics, with the advantages of remote verification, simple access, and low cost. It therefore has a wide range of applications in daily life, including public security and justice, communications, defense and the military, banking systems, and the Internet. Existing speaker recognition technologies usually learn the speech features of a specific speaker from a large number of training samples. How to extract features from the speech signal that distinguish the identities of different speakers is a key factor that directly affects recognition performance. In recent years, more and more speech datasets have been recorded in complex environments, and their scale keeps growing; speaker recognition databases with hundreds of thousands or even millions of utterances have appeared. On such complex, large-scale speech datasets, the learning ability of traditional models is very limited compared with deep learning models, which leads to a sharp decline in recognition performance. To address these challenges, this thesis proposes a latent discriminative speaker feature learning method based on dictionary learning and a multi-frequency information convolution speaker recognition method based on margin angular loss. The main contents and innovations of the thesis are as follows:

(1) A latent discriminative speaker feature learning method based on dictionary learning is proposed. The method uses dictionary learning to build a latent feature mapping space, finds the correlation between different speech signals from the same speaker through a speaker embedding lookup table, and introduces a reconstruction constraint to learn a linear mapping matrix. As a result, the latent features learned from a large number of speech signals are not only discriminative but also correlated within the same speaker. Experimental results show that, under two different experimental settings on the TIMIT dataset, the method improves accuracy by 2.38% and 3.12%, respectively, over two state-of-the-art methods. On the Apollo data of the Fearless Steps Challenge at INTERSPEECH 2019, the accuracy of the method on the development set and the evaluation set is 32.98% and 36.33% higher than the baseline, respectively.

(2) A multi-frequency information convolution speaker recognition method based on margin angular loss is proposed. We propose a new loss function, called margin angular loss, which directly maps speaker features onto a hypersphere, where the angle between features corresponds directly to the speaker similarity measure. In addition, a convolution operation called octave convolution is used to build a deep convolutional feature extractor that processes the high-frequency and low-frequency information of the speech spectrum separately. The feature extractor can capture the speaker's characteristic energy accumulation from the different frequency components. Discriminative speaker features are then learned with the margin angular loss for speaker recognition. Experimental results on the TIMIT and VoxCeleb datasets show that the proposed method achieves good accuracy, demonstrating its effectiveness.

(3) A speaker recognition prototype system is designed and implemented. Using Python, MATLAB, and the deep learning framework PyTorch, we design and implement a speaker recognition prototype system based on discriminative feature learning. The system includes four modules: speech signal preprocessing, feature extraction, speaker model matching, and result output. Both proposed methods, the latent discriminative speaker feature learning method based on dictionary learning and the multi-frequency information convolution speaker recognition method based on margin angular loss, are implemented in the prototype system, which demonstrates and verifies their effectiveness and practicability.
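
The abstract does not give the formula of the margin angular loss. As a rough illustration of the general idea only (features placed on a hypersphere, with similarity measured by angle), the sketch below assumes an additive angular margin applied to the true speaker's angle, followed by a scaled softmax cross-entropy. The function names, the additive form of the margin, and the `margin` and `scale` values are assumptions made for this example and are not taken from the thesis.

```python
import math


def l2_normalize(vec):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def margin_angular_loss(embedding, class_weights, target, margin=0.2, scale=30.0):
    """Cross-entropy over scaled cosines, with an angular margin on the target.

    Assumed form (not from the thesis): the angle between the embedding and
    the true speaker's weight vector is widened by `margin` before the cosine
    is taken, so the true class must win by a clear angular gap.
    """
    e = l2_normalize(embedding)
    logits = []
    for idx, weight in enumerate(class_weights):
        w = l2_normalize(weight)
        # Cosine of the angle between two unit vectors, clamped for acos.
        cos_theta = max(-1.0, min(1.0, sum(a * b for a, b in zip(e, w))))
        theta = math.acos(cos_theta)
        if idx == target:
            # Widen the true-class angle; cap at pi so cos stays monotonic.
            theta = min(theta + margin, math.pi)
        logits.append(scale * math.cos(theta))
    # Numerically stable negative log-softmax of the target logit.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[target]


# Hypothetical 2-D example: two speakers, one utterance embedding.
weights = [[1.0, 0.0], [0.0, 1.0]]
utterance = [1.0, 0.2]
loss_with_margin = margin_angular_loss(utterance, weights, target=0)
loss_without_margin = margin_angular_loss(utterance, weights, target=0, margin=0.0)
```

Because both the embedding and the class weights are normalized, the loss depends only on angles: rescaling `utterance` leaves the result unchanged, and a non-zero margin always gives a loss at least as large as the margin-free version for the same input.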
Keywords/Search Tags: speaker recognition, dictionary learning, latent discriminative feature, octave convolution, margin angular loss, deep learning