
Research On Speaker Clustering And Identification Based On Deep Convolutional Network

Posted on: 2022-10-03
Degree: Master
Type: Thesis
Country: China
Candidate: W C Wang
GTID: 2518306569978969
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the widespread use of recording devices, the mobile Internet, and cloud storage platforms, the volume of speech data has grown explosively. However, because manual annotation is costly, little of this speech is accurately labeled. How to cluster large amounts of unlabeled speech by speaker, and how to identify speakers from only a small amount of labeled speech, are therefore research hotspots in intelligent speech processing. This thesis investigates speaker clustering and speaker identification based on deep convolutional networks. The main work and contributions are as follows:

(1) We propose a speaker clustering method that jointly optimizes deep representation learning and cluster estimation. First, Mel-frequency cepstral coefficients (MFCCs) are extracted from each speech sample, and an i-vector is then extracted from them. Next, the i-vector is fed into a deep convolutional autoencoder network (DCAN), and the deep representation is taken from the output layer of the DCAN encoder. Agglomerative hierarchical clustering (AHC) is then applied to the deep representations, and its clustering results provide the initial cluster labels. Finally, a Softmax layer is stacked on the output layer of the DCAN encoder to estimate cluster assignments, and the DCAN parameters are fine-tuned with a joint loss function for deep representation learning and cluster estimation. Normalized mutual information (NMI) and clustering accuracy (CA) are used as performance metrics. On the Aishell-2 and VoxCeleb1 speech corpora, the proposed method achieves NMI scores of 92.5% and 72.4% and CA scores of 84.5% and 66.1%, respectively, outperforming the mainstream methods it is compared against.

(2) We propose a few-shot speaker identification method based on a depthwise separable convolutional network with attention. The network is built by stacking depthwise separable convolution (DSC) modules, which are used to mitigate overfitting. In addition, a channel attention mechanism is incorporated so that the network makes full use of the information in each channel, improving its performance. On the Aishell-2, VoxCeleb1, and TORGO speech corpora, the proposed method achieves accuracies of 94.46%, 86.42%, and 89.24% and F-scores of 96.18%, 88.74%, and 90.62%, respectively, outperforming the other few-shot learning methods compared.

In conclusion, this thesis addresses speaker clustering with co-optimization and few-shot speaker identification, and proposes a speaker clustering method and a speaker identification method, both based on deep convolutional networks. Comparative experiments against mainstream methods verify the effectiveness of the proposed methods.
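The two clustering metrics used in (1), NMI and clustering accuracy (CA), can be sketched in plain Python as follows. This is an illustrative sketch, not the thesis's evaluation code: the abstract does not say which NMI normalization was used, so the arithmetic mean of the two entropies is assumed here, and CA is computed by brute-forcing the best one-to-one mapping between predicted clusters and true speakers, which is only practical for a handful of clusters. For hundreds of speakers, the same mapping is normally found with the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`).

```python
import math
from collections import Counter
from itertools import permutations


def nmi(labels_true, labels_pred):
    """Normalized mutual information (assumed: arithmetic-mean normalization)."""
    n = len(labels_true)
    count_t = Counter(labels_true)
    count_p = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    # Mutual information between the true and predicted label distributions.
    mi = sum((n_tp / n) * math.log(n * n_tp / (count_t[t] * count_p[p]))
             for (t, p), n_tp in joint.items())
    h_t = -sum((c / n) * math.log(c / n) for c in count_t.values())
    h_p = -sum((c / n) * math.log(c / n) for c in count_p.values())
    denom = (h_t + h_p) / 2
    return 1.0 if denom == 0 else mi / denom


def clustering_accuracy(labels_true, labels_pred):
    """Fraction of samples correct under the best one-to-one cluster-to-speaker mapping.

    Brute force over permutations: exponential in the number of clusters.
    """
    true_ids = sorted(set(labels_true))
    pred_ids = sorted(set(labels_pred))
    joint = Counter(zip(labels_pred, labels_true))
    # Pad with None so surplus predicted clusters can stay unmatched (0 hits).
    k = max(len(true_ids), len(pred_ids))
    true_pad = true_ids + [None] * (k - len(true_ids))
    best = 0
    for perm in permutations(true_pad, len(pred_ids)):
        hits = sum(joint[(p, t)] for p, t in zip(pred_ids, perm))
        best = max(best, hits)
    return best / len(labels_true)


if __name__ == "__main__":
    # Cluster IDs are arbitrary, so a pure relabeling scores perfectly.
    print(nmi([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))  # 1.0 up to rounding
    # One of six samples lands in the wrong cluster: CA = 5/6.
    print(clustering_accuracy([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]))
```

Both metrics are invariant to how cluster IDs are numbered, which is why CA needs the mapping step while NMI does not.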
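In method (1), AHC on the DCAN encoder outputs supplies the initial cluster labels that the Softmax layer is later trained against. The sketch below shows how such pseudo-labels could be produced with naive agglomerative clustering. The abstract does not specify the linkage or distance, so average linkage with Euclidean distance is an assumption, and `agglomerative_labels` is a hypothetical helper. Library implementations such as scikit-learn's `AgglomerativeClustering` would be the practical choice on real data.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative_labels(points, n_clusters):
    """Naive average-linkage AHC (assumed linkage); returns one integer label per point.

    Repeatedly merges the two clusters with the smallest mean pairwise distance
    until n_clusters remain. Cost grows roughly cubically, fine for toy inputs only.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(euclidean(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for encoder outputs (hypothetical data).
    emb = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]]
    print(agglomerative_labels(emb, n_clusters=3))  # [0, 0, 1, 1, 2]
```

The resulting labels would then serve as initial targets for the cluster-estimation layer; how the thesis combines them with the reconstruction objective in its joint loss is not described in the abstract.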
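One common reason depthwise separable convolution helps against overfitting, as claimed for method (2), is that it needs far fewer weights than a standard convolution. A depthwise k×k filter per input channel followed by a 1×1 pointwise convolution replaces a full k×k×C_in×C_out kernel. The parameter-count arithmetic is shown below; the layer sizes are illustrative and not taken from the thesis.

```python
def conv_params(k, c_in, c_out, bias=False):
    """Weights in a standard k x k convolution mapping c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k, c_in, c_out, bias=False):
    """Depthwise k x k conv (one filter per input channel) + 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer: 3 x 3 kernel, 64 -> 128 channels.
    std = conv_params(3, 64, 128)                 # 73728
    sep = depthwise_separable_params(3, 64, 128)  # 8768
    print(std, sep, round(std / sep, 2))          # roughly 8.4x fewer weights
```

Ignoring biases, the ratio of separable to standard weights is 1/C_out + 1/k², so with 3×3 kernels the saving approaches 9× as the number of output channels grows.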
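The abstract does not say which channel attention design method (2) uses. A widely used form is the squeeze-and-excitation (SE) block, sketched below under that assumption. Each channel is globally average-pooled, passed through a small bottleneck of two fully connected layers ending in a sigmoid, and then rescaled by the resulting weight. The weight matrices `w1` and `w2` are hypothetical hand-picked values, and biases are omitted for brevity.

```python
import math


def channel_attention(feature_map, w1, w2):
    """SE-style channel attention (assumed design) on a C x H x W nested-list tensor.

    w1: (C // r) rows of C weights (squeeze FC layer).
    w2: C rows of (C // r) weights (excitation FC layer).
    Returns (reweighted feature map, per-channel scores).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a score in (0, 1) per channel.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    scores = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
              for row in w2]
    # Scale: multiply every element of each channel by that channel's score.
    out = [[[s * v for v in row] for row in ch] for s, ch in zip(scores, feature_map)]
    return out, scores


if __name__ == "__main__":
    fmap = [[[1.0, 1.0], [1.0, 1.0]],   # channel 0
            [[2.0, 2.0], [2.0, 2.0]]]   # channel 1
    w1 = [[1.0, 0.0]]                   # C = 2, reduction r = 2 (hypothetical)
    w2 = [[2.0], [-2.0]]                # boosts channel 0, suppresses channel 1
    out, scores = channel_attention(fmap, w1, w2)
    print(scores)  # channel 0 weighted above 0.5, channel 1 below
```

Because the scores are learned in a real network, the block lets the model emphasize channels that carry speaker-discriminative information and attenuate the rest, at the cost of only two small fully connected layers.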
Keywords/Search Tags:Deep convolutional network, Few-shot learning, Speaker identification, Speaker clustering