Research On Speaker Recognition Clustering Algorithm

Posted on: 2022-08-03 | Degree: Master | Type: Thesis
Country: China | Candidate: B L Li | Full Text: PDF
GTID: 2518306524990889 | Subject: Master of Engineering
Abstract/Summary:
With the progress of science and technology and the development of artificial intelligence, the internet and information technology are widely used in daily life. Speaker recognition clustering is an emerging research direction in speech signal processing. Its task is to identify the speaker boundaries and speaker identities in audio files in which multiple speakers take turns speaking, so that all audio from the same speaker is assigned to the same class and each class contains only one speaker. Speaker recognition clustering usually groups audio segments by their speaker embedding vectors, such as i-vectors. In recent years, with the rapid growth of deep learning in many domains, embedding vectors based on deep neural networks (d-vectors) have also developed quickly in this field, but they still need improvement. Using the GMM vector obtained from the GMM-UBM model, combined with a clustering algorithm, as the baseline system for comparison, this thesis proposes a speaker recognition clustering algorithm that uses the a-vector as the speaker embedding vector, and studies both the speaker embedding feature extraction method and the speaker clustering algorithm. The main research contents are as follows.

First, MFCC features are over-processed by dimensionality reduction, a known problem in the current speech recognition field. Mel spectrogram features retain more voice information and are therefore better suited as the input features of a CNN.

Second, to address the problem that the speaker feature extraction network ignores correlations across the global set of speech frames, this thesis proposes an a-vector extraction method based on the multi-head attention mechanism. A CNN-based speaker feature extraction network is built and, to obtain better results, modified based on ResNet. A multi-head attention structure is introduced and the cross-entropy loss function of the network is modified, so as to obtain the weight matrix of the different feature maps and make the speaker features in speech more discriminative. On the same data, the recognition rate of the improved ResNet-based model is 3% higher than that of the CNN. The results show that the improved ResNet-based speaker feature extraction network produces higher-quality speaker embeddings.

Third, the clustering quality of traditional clustering algorithms suffers from the choice of parameters, the distribution characteristics of the data points, and large distances between cluster centroids. An improved speaker spectral clustering algorithm based on eigenvalue gaps is proposed. It automatically estimates the number of clusters and achieves higher clustering quality in data spaces of arbitrary distribution. The similarity matrix used in spectral clustering is optimized to obtain the number of clusters and the cluster centroids, so as to better identify the number of speakers and assign segments to them. Experimental results show that the improved clustering algorithm is more effective.

Fourth, the speaker feature extraction and speaker clustering modules are combined to build a speaker recognition clustering system. Different embedding vectors are paired with different clustering algorithms in the experiments. On the same data set, the a-vector combined with the improved speaker spectral clustering algorithm yields a lower error rate than the baseline system.
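The eigenvalue-gap idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm, whose similarity-matrix optimization is not described here. It is the textbook eigengap heuristic on a normalized graph Laplacian, and it assumes a clipped cosine affinity, synthetic embeddings, and a simple k-means step. All function names and parameters (`cosine_affinity`, `estimate_num_speakers`, `max_speakers`, `kmeans_farthest_first`) are hypothetical.

```python
import numpy as np

def cosine_affinity(embeddings):
    """Cosine similarity between rows, with negative values clipped to 0 (an assumption)."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return np.clip(unit @ unit.T, 0.0, None)

def estimate_num_speakers(affinity, max_speakers=8):
    """Eigengap heuristic: pick k at the largest gap among the smallest Laplacian eigenvalues."""
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    normalized = d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    laplacian = np.eye(len(affinity)) - normalized
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    limit = min(max_speakers, len(eigvals) - 1)
    gaps = np.diff(eigvals[: limit + 1])          # gaps[i] = eigvals[i + 1] - eigvals[i]
    k = int(np.argmax(gaps)) + 1
    return k, eigvecs[:, :k]

def kmeans_farthest_first(points, k, iters=20):
    """Plain k-means with deterministic farthest-first initialization."""
    centroids = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[int(np.argmax(dist))])
    centroids = np.array(centroids)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):               # keep the old centroid if a cluster empties
                centroids[j] = points[labels == j].mean(axis=0)
    return labels

def spectral_cluster(embeddings, max_speakers=8):
    """Estimate the speaker count, then cluster the row-normalized spectral embedding."""
    k, vectors = estimate_num_speakers(cosine_affinity(embeddings), max_speakers)
    rows = vectors / (np.linalg.norm(vectors, axis=1, keepdims=True) + 1e-12)
    return k, kmeans_farthest_first(rows, k)

# Synthetic stand-in for segment embeddings: 3 "speakers", 10 segments each.
rng = np.random.default_rng(0)
true_labels = np.repeat(np.arange(3), 10)
embeddings = np.eye(16)[true_labels] + 0.05 * rng.normal(size=(30, 16))
k, labels = spectral_cluster(embeddings)
print("estimated speakers:", k)
print("segment labels:", labels)
```

In practice the embeddings would come from a speaker feature extraction network rather than from synthetic data, and the choice of affinity function strongly affects where the largest eigenvalue gap falls.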
Keywords/Search Tags:speaker embedding, convolutional neural network, attention mechanism, spectral clustering