
The Study Of Speaker Segmentation And Clustering Of Multi-person Conversation

Posted on: 2018-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: W X Zhu
GTID: 2348330512485625
Subject: Information and Communication Engineering
Abstract/Summary:
Speaker segmentation and clustering is a technique that automatically partitions continuous multi-speaker speech to determine "who spoke when". Current speaker segmentation and clustering systems already perform very well on two-speaker telephone conversations. However, many challenges remain in complex scenarios such as meetings and television broadcasts: the number of speakers is generally not fixed and is not known in advance; conversation turns are often short and the amount of speech per speaker varies widely; and the audio contains many kinds of noise. Solving these problems effectively and making the system more robust is an important research direction and the main subject of this thesis, which studies speaker segmentation and clustering in multi-person television talk shows. The main work and contributions are as follows.

First, improving the mainstream algorithm through system fusion. In Chapter 2, building on the typical segmentation and clustering framework, a Deep Neural Network (DNN) replaces the traditional Bayesian Information Criterion (BIC) method for segmenting the speech, which improves the accuracy of change-point detection. For clustering, consensus clustering is used to fuse multiple systems, which improves cluster purity and the robustness of the initial models and thereby reduces the system error rate.

Second, feature denoising in noisy environments. In Chapter 3, a Regression Deep Neural Network (Regression DNN) is trained to fit a mapping from the acoustic features of noisy audio to those of clean audio. This regression network is then used to extract denoised features that suppress the noise, and applying these features to the segmentation and clustering system reduces the error rate. Furthermore, using consensus clustering to fuse the denoised-feature system with the original-feature system significantly improves performance.

Third, a length-robust training algorithm for cluster models. In multi-speaker scenarios, the amount of speech per speaker varies. Chapter 4 addresses the problem that models obtained with the conventional Maximum A Posteriori (MAP) algorithm are affected by cluster length: during MAP adaptation, the relevance factor is adjusted according to the length of the cluster, making the cluster model parameters more robust to length. Experiments show that the length-normalized cluster models improve performance under both the Normalized Cross Likelihood Ratio (NCLR) and the T-Test distance measures.

Fourth, a highly discriminative algorithm for determining the number of speakers. Chapter 5 focuses on estimating how many speakers are present in the speech. Building on the threshold method, the Ts criterion is introduced to determine the number of speakers; this criterion does not require a threshold to be tuned on a development set. Experiments show that combining the Ts criterion with the threshold method improves the accuracy of speaker-number determination. In addition, an adaptive threshold is derived by combining the means of the estimated within-cluster and inter-cluster distance distributions with the threshold obtained on the development set, which further improves this accuracy. Finally, an improved T-Test distance metric is proposed. It makes fuller use of the statistics of the likelihood-ratio score distribution, which makes it more discriminative and therefore more accurate for determining the number of speakers.
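For reference, the traditional BIC change-point test that the DNN detector replaces in Chapter 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: each frame is assumed to be a list of feature values (e.g. MFCCs), each hypothesis is modelled by a single diagonal-covariance Gaussian, and the penalty weight `lam` is a hypothetical tuning parameter.

```python
import math


def diag_log_det(frames):
    """Log-determinant of a diagonal ML covariance: sum of log variances."""
    n, d = len(frames), len(frames[0])
    total = 0.0
    for j in range(d):
        mean = sum(f[j] for f in frames) / n
        var = sum((f[j] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-10))  # floor avoids log(0) on constant input
    return total


def delta_bic(frames, t, lam=1.0):
    """Delta-BIC for a hypothesised speaker change at frame index t.

    frames: list of equal-length feature vectors.
    A positive value favours two Gaussians (a change at t) over one.
    """
    left, right = frames[:t], frames[t:]
    n, d = len(frames), len(frames[0])
    # Log-likelihood gain of modelling each side with its own Gaussian.
    gain = 0.5 * (n * diag_log_det(frames)
                  - len(left) * diag_log_det(left)
                  - len(right) * diag_log_det(right))
    # Complexity penalty for the extra d means and d variances.
    penalty = 0.5 * lam * (2 * d) * math.log(n)
    return gain - penalty
```

In a typical segmenter, `delta_bic` would be evaluated at every candidate index inside a sliding window, and a change point hypothesized where the maximum is positive; `lam` is normally tuned on development data.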
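Consensus clustering can be realized in several ways. One common variant, shown here as an assumption rather than the thesis's exact fusion scheme, builds a co-association matrix recording the fraction of base systems that place each pair of segments in the same cluster, then merges pairs that a majority of systems agree on. The sketch assumes all systems label the same list of segments (a shared segmentation); the function names and the `threshold` parameter are hypothetical.

```python
def co_association(labelings):
    """Fraction of base systems that put segments i and j in the same cluster."""
    n, m = len(labelings[0]), len(labelings)
    return [[sum(lab[i] == lab[j] for lab in labelings) / m for j in range(n)]
            for i in range(n)]


def consensus_clusters(labelings, threshold=0.5):
    """Link segments whose co-association exceeds `threshold` (single linkage
    via union-find) and return consensus labels 0..K-1."""
    n = len(labelings[0])
    ca = co_association(labelings)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if ca[i][j] > threshold:
                parent[find(i)] = find(j)
    # Relabel cluster roots with consecutive ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]
```

Because only same/different decisions are compared, the arbitrary label names used by each base system never need to be aligned with one another.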
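The length-normalized MAP idea of Chapter 4 can be illustrated with standard MAP mean adaptation, in which each component's adaptation coefficient is alpha_k = n_k / (n_k + r) for soft occupancy count n_k and relevance factor r. The abstract does not state the exact length adjustment, so the rule below (scaling r by the cluster length relative to a hypothetical reference length `ref_frames`) is purely an assumption for illustration, as are the 1-D GMM and all function names.

```python
import math


def posteriors(x, weights, means, variances):
    """Component responsibilities of a 1-D GMM for one frame (log domain)."""
    logs = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - 0.5 * (x - m) ** 2 / v
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(lg - top) for lg in logs]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(frames, weights, means, variances, r=16.0, ref_frames=None):
    """MAP adaptation of GMM means from a prior model (e.g. a UBM).

    If ref_frames is set, r is scaled by len(frames) / ref_frames
    (hypothetical length normalisation), so alpha_k depends on the fraction
    of frames a component explains rather than on the raw count.
    """
    if ref_frames is not None:
        r = r * len(frames) / ref_frames
    k = len(means)
    n = [0.0] * k  # soft occupancy counts n_k
    s = [0.0] * k  # first-order statistics: sum_t gamma_k(t) * x_t
    for x in frames:
        for i, g in enumerate(posteriors(x, weights, means, variances)):
            n[i] += g
            s[i] += g * x
    adapted = []
    for i in range(k):
        alpha = n[i] / (n[i] + r)
        data_mean = s[i] / n[i] if n[i] > 0 else means[i]
        adapted.append(alpha * data_mean + (1 - alpha) * means[i])
    return adapted
```

With plain MAP, a longer cluster pulls the means further from the prior (alpha grows with n_k); with the hypothetical scaling, clusters drawn from the same distribution receive the same adapted means regardless of their length.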
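NCLR, one of the two cluster distances used in the Chapter 4 experiments, is commonly defined as NCLR(A, B) = (1/N_A) log[p(X_A|M_A) / p(X_A|M_B)] + (1/N_B) log[p(X_B|M_B) / p(X_B|M_A)], where X_A and X_B are the frames of clusters A and B, N_A and N_B their frame counts, and M_A and M_B their models. The sketch below assumes scalar features and a single maximum-likelihood Gaussian per cluster for brevity; a real system would more likely use MAP-adapted GMMs, and the helper names are hypothetical.

```python
import math


def gauss_loglik(x, mean, var):
    """Log-density of a scalar Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def fit_gaussian(frames):
    """Maximum-likelihood mean and (floored) variance of scalar frames."""
    n = len(frames)
    mean = sum(frames) / n
    var = max(sum((x - mean) ** 2 for x in frames) / n, 1e-6)
    return mean, var


def nclr(xa, xb):
    """Normalized Cross Likelihood Ratio between clusters A and B."""
    ma, mb = fit_gaussian(xa), fit_gaussian(xb)
    # Average log-likelihood ratio of each cluster's own model vs. the other's.
    term_a = sum(gauss_loglik(x, *ma) - gauss_loglik(x, *mb) for x in xa) / len(xa)
    term_b = sum(gauss_loglik(x, *mb) - gauss_loglik(x, *ma) for x in xb) / len(xb)
    return term_a + term_b
```

The measure is symmetric by construction, and larger values indicate that the two clusters are less likely to come from the same speaker.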
Keywords: Speaker segmentation and clustering, Consensus clustering, Regression Deep Neural Network, Length-normalized MAP algorithm, Ts criterion, Improved T-Test distance metric