Research On Universal Background Model And Preliminary Study On Deep Learning In Speaker Recognition

Posted on:2020-07-05

Degree:Master

Type:Thesis

Country:China

Candidate:W X Mei

Full Text:PDF

GTID:2428330572496506

Subject:Computer technology

Abstract/Summary:

Speaker recognition is an important research direction in speech signal processing. Its main purpose is to automatically extract the identity of a speaker from the voice, and it is widely used in banking, public security systems, and smart homes. Currently, mainstream algorithms are based on probabilistic models. The GMM-UBM model achieves good performance when the background corpus is sufficient and there is a single channel. In practical applications, however, noise and channel mismatch make the performance of this method drop dramatically. The i-vector method was proposed to solve these problems to some extent. These algorithms are all based on GMM-UBM and have achieved good results in NIST evaluations, but some problems remain unsolved. On the one hand, training the universal background model (UBM) requires a lot of computing resources, which makes it difficult to deploy quickly in new environments. On the other hand, the theoretical basis of UBM training has not been studied further; the model is trained simply by collecting data from a large number of different speakers to fill the feature space as much as possible. This thesis focuses on text-independent speaker verification and on the problem of selecting the corpus for the universal background model. The main work and innovations are as follows.

Firstly, speaker verification systems based on the GMM-UBM model and the i-vector/PLDA method are constructed. The feature preprocessing, the UBM training method, the MAP adaptation process, the extraction of the i-vector total variability matrix, and the PLDA-based scoring method are introduced in detail. The effects of the GMM model order and the MFCC feature dimension on system performance are discussed. The experimental results show that the system constructed in this thesis reaches the performance of mainstream open source implementations.

Secondly, an N-support speaker selection algorithm based on GMM supervector clustering is proposed. The core idea of the speaker selection algorithm is to make the speech feature distributions of the selected speakers differ as much as possible, so that together they cover the entire feature space. This thesis therefore proposes to train a separate GMM on the data of each background speaker, to use the GMM supervector to approximate that speaker's feature distribution, and finally to use clustering algorithms to find the set of speakers with the largest distances between each other. Experiments show that the algorithm needs only 8.807%, 8.6%, and 4.3% of the benchmark speaker corpus on the AISHELL, MASC, and TIMIT datasets, respectively, to construct a UBM with baseline performance.

Thirdly, a background speaker corpus selection algorithm based on the GMM Token ratio is implemented. Another approach to UBM data selection is to screen directly at the frame level. The current mainstream algorithm is the IFS (Intelligent Feature Selection) algorithm proposed by Hansen et al., which dynamically estimates the probability distribution of the Euclidean distance between background corpus frames. This thesis takes a different approach: starting from the GMM Token, which can reflect the characteristics of phonemes, it proposes a background corpus selection algorithm based on the Token ratio. Experiments show that the algorithm can build a UBM with baseline performance on the AISHELL, MASC, and TIMIT datasets using only 18.1%, 10.0%, and 9.1% of the benchmark speaker corpus, respectively.

Fourthly, a speaker identification system based on the Mel spectrogram and a convolutional neural network is constructed. At present, mainstream speaker verification methods rely on hand-crafted features such as MFCC. This thesis proposes a speaker identification system based on a convolutional neural network that takes the Mel spectrogram directly as its input. The experimental results show that, as the amount of training data increases, the performance of the system gradually approaches and then exceeds that of the traditional probabilistic model. Specifically, on the MASC corpus, when the ratio of training data to test data is 8:2, the identification rate (IR) of the method reaches 90%; when the ratio reaches 9:1, the identification rate reaches 95.7%, exceeding that of the GMM-UBM system.
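The speaker selection idea in the second contribution can be illustrated with a minimal sketch. It assumes each background speaker has already been reduced to a GMM supervector (for example, the stacked means of that speaker's GMM), supplied here as a plain list of floats. The abstract does not say which clustering algorithm is used, so greedy farthest-point selection stands in for that step; the function names and data layout are hypothetical and are not taken from the thesis.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_speakers(supervectors, n_select):
    """Pick n_select speakers whose supervectors are far apart.

    supervectors: dict mapping speaker id -> list of floats (assumed to be
    a precomputed GMM supervector for that speaker).
    This is greedy farthest-point selection, a stand-in for the clustering
    step mentioned in the abstract, not the thesis's own algorithm.
    """
    ids = sorted(supervectors)
    if not 0 < n_select <= len(ids):
        raise ValueError("n_select must be between 1 and the number of speakers")

    # Seed with the speaker farthest from the mean supervector.
    dim = len(supervectors[ids[0]])
    mean = [sum(supervectors[s][d] for s in ids) / len(ids) for d in range(dim)]
    first = max(ids, key=lambda s: euclidean(supervectors[s], mean))
    selected = [first]

    # min_dist[s] is the distance from s to its nearest selected speaker.
    min_dist = {s: euclidean(supervectors[s], supervectors[first]) for s in ids}

    while len(selected) < n_select:
        candidates = [s for s in ids if s not in selected]
        # Add the speaker that is farthest from everything chosen so far.
        nxt = max(candidates, key=lambda s: min_dist[s])
        selected.append(nxt)
        for s in ids:
            d = euclidean(supervectors[s], supervectors[nxt])
            if d < min_dist[s]:
                min_dist[s] = d
    return selected
```

Under these assumptions, the data of the returned speakers would then be pooled to train the UBM. Near-duplicate speakers are picked last, which matches the stated goal of covering the feature space with as few speakers as possible.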

Keywords/Search Tags:

Speaker Recognition, Universal Background Speaker Data Selection, GMM Token, Mel-spectrogram, ConvNets

Related items

1	Research On Background Model And Score Issues For Speaker Recognition
2	Research On Text-Independent Speaker Verification System
3	Research On Speaker Recognition Based On VPT And GMM
4	Research On Robust Speaker Recognition Technology Based On GMM-UBM
5	Complex Channel Speaker Recognition Technology
6	Studies On Speaker Recognition Based On SVM And GMM
7	Research On Technologies Of Speaker Recognition Based On Sparse Decomposition
8	Research On Speaker Recognition Over Short Utterance And Varying Channels
9	Research On Adaptive Methods For Text-independent Speaker Recognition
10	Research On The Discrimination Issue In Speaker Recognition