Research On Speaker Recognition And Clustering For Convolutional Neural Networks

Posted on: 2019-05-13

Degree: Master

Type: Thesis

Country: China

Candidate: H Z Mu

GTID: 2428330593950433

Subject: Computer Science and Technology

Abstract/Summary:

The speaker recognition problem is to determine who is speaking from the features of their voice. In recent years, most researchers have still used traditional methods to extract speech features, such as Mel-frequency cepstral coefficients (MFCCs). However, because real-world environments are more complex than, and differ greatly from, experimental conditions, the results obtained are not satisfactory. It is therefore necessary to find a new way of extracting speech features that performs better in practice. With the continuing development of artificial intelligence, deep learning methods have quickly been applied to many fields, such as image, text, and speech recognition. In particular, deep learning uses large amounts of data to learn increasingly abstract features, and these features are extracted automatically rather than designed by hand. Among deep learning models, convolutional neural networks have taken the field to a new level, and speaker recognition based on features extracted by convolutional neural networks has received extensive attention.

Among existing speaker recognition methods, the GMM-UBM (Gaussian Mixture Model-Universal Background Model) approach has achieved good results in practical applications, but it still has two main shortcomings: (1) the model is trained iteratively with the EM algorithm, so its structure is complex, training takes a long time, a large amount of memory is required, and its generalization ability is only moderate; (2) the model places strict requirements on the data, so a corresponding method is needed to convert the data into the specified format. To address the speaker recognition problem, this thesis proposes a speaker recognition and clustering model based on convolutional neural networks. The model has two parts. The first part uses the spectrogram of the speech signal as the input to a convolutional neural network, which is studied and optimized for speaker recognition. The second part builds on the speaker recognition model, using it to extract features for clustering unknown speakers.

To improve the speaker recognition model, 512-dimensional voiceprint features are used when generating the spectrograms, and a dynamic threshold is used to handle silent regions during silence detection. Dropout and batch normalization layers were added to the network, and the effect of different numbers of layers on the speaker model was studied. In addition, to verify the robustness of the speaker recognition model, its performance was compared across different numbers of speakers. For the speaker clustering model, principal component analysis and affinity propagation clustering were used for visual analysis. The speaker recognition model achieved 92% test accuracy on the TIMIT dataset, and clustering of unknown speakers also achieved comparable results.
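The preprocessing steps named in the abstract (silence removal with a dynamic threshold, then spectrogram generation) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the threshold rule (minimum frame energy plus a fraction `ratio` of the utterance's energy range), the frame and hop lengths, the Hann window, and reading "512" as the DFT size `n_fft` are all hypothetical choices. A naive DFT keeps the example dependency-free; a real pipeline would use an FFT.

```python
import cmath
import math


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames; trailing samples that do not fill a frame are dropped."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def remove_silence(signal, frame_len=400, hop=160, ratio=0.1):
    """Keep frames whose energy exceeds a per-utterance (dynamic) threshold.

    Hypothetical rule: threshold = min_energy + ratio * (max_energy - min_energy),
    so the threshold adapts to each recording's own loudness range.
    """
    frames = frame_signal(signal, frame_len, hop)
    energies = [sum(x * x for x in f) / frame_len for f in frames]
    if not energies:
        return []
    lo, hi = min(energies), max(energies)
    if hi == lo:  # flat energy profile: nothing to separate
        return frames
    threshold = lo + ratio * (hi - lo)
    return [f for f, e in zip(frames, energies) if e > threshold]


def log_spectrogram(frames, n_fft=512):
    """Log-magnitude spectrogram: Hann window, zero-pad to n_fft, keep n_fft // 2 + 1 bins.

    Uses a naive O(n_fft^2) DFT so the sketch needs only the standard library.
    """
    spec = []
    for f in frames:
        n = len(f)
        if n < 2 or n > n_fft:
            raise ValueError("frame length must be in [2, n_fft]")
        window = [0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]
        x = [s * w for s, w in zip(f, window)] + [0.0] * (n_fft - n)
        row = []
        for k in range(n_fft // 2 + 1):
            coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n_fft) for t in range(n_fft))
            row.append(math.log(abs(coeff) + 1e-10))  # small floor avoids log(0)
        spec.append(row)
    return spec


# Toy usage: 64 silent samples followed by a tone with 4 cycles per 32 samples.
tone = [0.0] * 64 + [math.sin(2 * math.pi * 4 * t / 32) for t in range(64)]
voiced = remove_silence(tone, frame_len=32, hop=16, ratio=0.1)
spec = log_spectrogram(voiced, n_fft=32)
```

The defaults `frame_len=400` and `hop=160` would correspond to 25 ms frames with a 10 ms hop at TIMIT's 16 kHz sampling rate; the toy call uses much smaller values only to keep the naive DFT fast.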
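The two layers the abstract says were added to the network, dropout and batch normalization, can be illustrated by their training-time forward passes. The sketch below is a simplified, hypothetical illustration: it normalizes each feature of a fully connected activation matrix (in a convolutional layer the statistics would instead be taken per channel across the batch and spatial positions), and it uses the common "inverted dropout" formulation. The thesis does not state the dropout rate or where the layers sit in the network, so the values here are arbitrary.

```python
import math
import random


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Training-mode batch normalization of a (batch_size x features) matrix.

    Simplification (assumption): statistics are taken per feature over the batch only.
    """
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n  # biased (population) variance
        for i in range(n):
            x_hat = (batch[i][j] - mean) / math.sqrt(var + eps)
            out[i][j] = gamma[j] * x_hat + beta[j]  # learnable scale and shift
    return out


def dropout(batch, p, rng):
    """Inverted dropout (training only): zero each activation with probability p and
    scale survivors by 1 / (1 - p), so no rescaling is needed at inference time."""
    keep = 1.0 - p
    return [[x / keep if rng.random() < keep else 0.0 for x in row] for row in batch]


# Toy usage: a mini-batch of 4 samples with 2 features, as might leave a hidden layer.
activations = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
normed = batch_norm(activations, gamma=[1.0, 1.0], beta=[0.0, 0.0])
dropped = dropout(normed, p=0.5, rng=random.Random(0))
```

At inference, dropout is simply skipped and batch normalization uses running averages of the mean and variance collected during training (not shown here).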
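The clustering stage can be sketched as follows, assuming each utterance has already been mapped to a fixed-length embedding by the trained network (for example, the activations of a late hidden layer; the abstract does not say which layer, so this is an assumption). The sketch clusters the embeddings with affinity propagation, using the median off-diagonal similarity as the preference (a common default, assumed here), and projects them to two dimensions with PCA for plotting. It is pure Python for self-containment and runs a fixed number of iterations without a convergence check; in practice scikit-learn's `AffinityPropagation` and `PCA` would be the usual choice.

```python
import math
import statistics


def affinity_propagation(points, damping=0.5, iters=200, preference=None):
    """Affinity propagation on negative squared Euclidean distances (needs >= 2 points).

    Returns (labels, exemplar_indices); labels index into exemplar_indices.
    """
    n = len(points)
    S = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[k])) for k in range(n)] for i in range(n)]
    if preference is None:  # assumed default: median off-diagonal similarity
        preference = statistics.median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = preference
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else vals[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # positive support for k from all other points
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:  # not converged to any exemplar: put everything in one group
        return [0] * n, []
    labels = [max(range(len(exemplars)), key=lambda e: S[i][exemplars[e]]) for i in range(n)]
    for e, k in enumerate(exemplars):
        labels[k] = e  # each exemplar labels itself
    return labels, exemplars


def pca_2d(points, iters=100):
    """Project points onto their top two principal components (power iteration + deflation; dim >= 2)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)] for a in range(d)]
    components = []
    for _ in range(2):
        v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-uniform start vector
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        # Deflate so the next power iteration finds the next component.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components] for c in centered]


# Toy embeddings (made up for illustration): two tight groups of four 3-D vectors.
embeddings = [[0, 0, 0], [0.5, 0, 0], [0, 0.5, 0], [0, 0, 0.5],
              [10, 10, 10], [10.5, 10, 10], [10, 10.5, 10], [10, 10, 10.5]]
labels, exemplars = affinity_propagation(embeddings)
coords = pca_2d(embeddings)  # 2-D points for a scatter plot colored by `labels`
```

Affinity propagation fits the "unknown speakers" setting because it does not need the number of clusters in advance; the `preference` value controls how many exemplars emerge.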

Keywords/Search Tags:

Speaker Recognition, Convolutional Neural Network, Speaker Spectrum