
Research On Speaker Recognition And Clustering For Convolutional Neural Networks

Posted on: 2019-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: H Z Mu
Full Text: PDF
GTID: 2428330593950433
Subject: Computer Science and Technology
Abstract/Summary:
Speaker recognition is the problem of determining who is speaking from features of the voice. In recent years, most researchers have continued to extract speech features with traditional methods such as Mel-frequency cepstral coefficients (MFCC). Because real environments are more complex than experimental ones and differ greatly from them, the results obtained in practice are not satisfactory. It is therefore necessary to find a new way of extracting speech features that gives better practical results. With the continuous development of artificial intelligence, deep learning has allowed the technology to be applied quickly in many fields, such as image, text, and speech recognition. In particular, deep learning has gradually established a way of learning abstract features from large amounts of data, with the features extracted automatically. Within this trend, the development of convolutional neural networks has taken deep learning research to a new level, and speaker recognition based on features extracted by convolutional neural networks has received extensive attention.

Among existing speaker recognition methods, the GMM-UBM model has achieved good results in practical applications, but it has two main shortcomings: (1) the model is trained iteratively with the EM algorithm, which makes it structurally complex; training takes a long time and requires a large amount of memory, and the generalization ability is mediocre; (2) the model places strict requirements on the data, so a corresponding method is needed to generate data in the specified format. To address the speaker recognition problem, this thesis proposes a speaker recognition and clustering model based on a convolutional neural network. The model has two parts. The first part takes the spectrogram of the sound as the input to a convolutional neural network, and the network is studied and optimized for speaker recognition. The second part uses the speaker recognition model to extract relevant features for clustering unknown speakers.

To improve the performance of the speaker recognition model, a 512-dimensional voiceprint feature is used when generating the spectrogram, and a dynamic threshold is used to process silent regions during silence detection. Dropout and batch normalization layers were added to the network design, and the effect of different layer configurations on the speaker model was studied. In addition, to verify the robustness of the speaker recognition model, its performance was compared across different numbers of speakers. For the speaker clustering model, principal component analysis and adaptive propagation clustering were also selected for visual analysis. A test accuracy of 92% was obtained on the TIMIT dataset, and unknown-speaker clustering also achieved comparable results.
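The abstract does not say how the dynamic threshold for silence detection is computed. As a minimal sketch only, the Python below assumes one common choice: the threshold is placed a fixed fraction of the way between the lowest and highest frame energy of the utterance, so it adapts to each recording instead of being a fixed constant. The function names, the frame length of 160 samples, and the ratio of 0.1 are all hypothetical and are not taken from the thesis.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (trailing partial frame is dropped)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def dynamic_threshold(energies, ratio=0.1):
    """Assumed rule: threshold sits `ratio` of the way from the minimum to the maximum frame energy."""
    lowest, highest = min(energies), max(energies)
    return lowest + ratio * (highest - lowest)


def remove_silence(samples, frame_len=160, ratio=0.1):
    """Keep only the frames whose energy is strictly above the per-utterance threshold."""
    energies = frame_energies(samples, frame_len)
    if not energies:
        return []
    threshold = dynamic_threshold(energies, ratio)
    voiced = []
    for i, energy in enumerate(energies):
        if energy > threshold:
            voiced.extend(samples[i * frame_len:(i + 1) * frame_len])
    return voiced


if __name__ == "__main__":
    # Synthetic utterance: near-silence, a 440 Hz tone at 16 kHz, near-silence.
    quiet = [0.001 * math.sin(0.3 * n) for n in range(320)]
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]
    trimmed = remove_silence(quiet + tone + quiet)
    print(len(quiet + tone + quiet), "samples in,", len(trimmed), "samples kept")
```

Because the threshold is derived from the recording itself, a quiet recording and a loud one are trimmed with the same relative criterion. One consequence of this particular rule is that a signal whose frames all have identical energy yields no voiced frames, so a practical system would likely add a floor or smoothing step before the spectrogram is computed.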
Keywords/Search Tags: Speaker Recognition, Convolutional Neural Network, Speaker Spectrum