The Study Of Speaker Segmentation And Clustering Of Multi-person Conversation

Posted on:2018-04-15

Degree:Master

Type:Thesis

Country:China

Candidate:W X Zhu

Full Text:PDF

GTID:2348330512485625

Subject:Information and Communication Engineering

Abstract/Summary:


Speaker segmentation and clustering is a technique that automatically partitions continuous multi-speaker speech to determine "who spoke when". Current systems achieve very good performance on two-person telephone conversations, but complex scenarios such as meetings and television broadcasts remain challenging. The main problems are: the number of speakers is generally not fixed and no prior information about it is available; speaker turns are often short and the amount of speech per speaker varies; and the speech contains many kinds of noise. Solving these problems effectively and improving system robustness is an important research direction and the main subject of this thesis, which studies speaker segmentation and clustering in multi-person television talk shows. The main work and contributions are as follows.

First, system fusion to improve the mainstream algorithm. In Chapter 2, building on the typical segmentation and clustering algorithm, a Deep Neural Network (DNN) replaces the traditional Bayesian Information Criterion (BIC) method for speech segmentation, which improves the accuracy of change-point detection. For clustering, consensus clustering is used to fuse multiple systems, improving cluster purity and the robustness of the initial models and thereby reducing the system error rate.

Second, feature denoising in noisy environments. In Chapter 3, a Regression Deep Neural Network (Regression DNN) is trained to fit the mapping from the acoustic features of noisy audio to those of clean audio. The network is then used to extract denoised features in which the noise information is weakened, and these features are applied to the segmentation and clustering system, reducing the system error rate. Furthermore, using consensus clustering to fuse the systems built on the denoised features and the original features significantly improves performance.

Third, a length-robust training algorithm for cluster models. In multi-speaker scenarios, the amount of speech per speaker varies. Chapter 4 addresses the problem that models obtained with the conventional Maximum A Posteriori (MAP) algorithm are affected by cluster length: during MAP adaptation, the relevance factor is adjusted according to the length of the cluster, which improves the length robustness of the cluster model parameters. The experimental results show that the normalized cluster models improve performance under both the Normalized Cross Likelihood Ratio (NCLR) and the T-Test distance measures.

Fourth, a highly discriminative algorithm for determining the number of speakers. Chapter 5 focuses on determining the number of speakers in the speech. Building on the threshold method, the T_s criterion is used to determine the number of speakers; this criterion does not require a threshold to be set on a development set. The experimental results show that combining the T_s criterion with the threshold method improves the accuracy of speaker number determination. In addition, an adaptive threshold is set by combining the means of the estimated within-cluster and inter-cluster distance distributions with the development-set threshold, which also improves accuracy. Finally, an improved T-Test distance metric is proposed. It makes fuller use of the statistics of the likelihood ratio score distribution, so it is more discriminative and therefore more accurate in determining the number of speakers.
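The length sensitivity that Chapter 4 targets can be seen in the textbook form of MAP mean adaptation, where the adapted mean is an interpolation between the prior mean and the cluster's sample mean, weighted by alpha = n / (n + r). The sketch below is only an illustration of that conventional formula for a single Gaussian mean with hard frame assignment; the function names and the default relevance factor of 16 are assumptions for the example, and it does not reproduce the thesis's length-normalized adjustment, whose exact rule is not given in the abstract.

```python
def adaptation_weight(n_frames, relevance=16.0):
    """Conventional MAP interpolation weight alpha = n / (n + r).

    `relevance` (r) is an illustrative default, not a value from the thesis.
    """
    return n_frames / (n_frames + relevance)


def map_adapt_mean(prior_mean, frames, relevance=16.0):
    """Adapt a single Gaussian mean toward the frames of one cluster.

    prior_mean: list of floats (e.g. a background-model mean).
    frames: list of feature vectors assigned to the cluster.
    """
    n = len(frames)
    if n == 0:
        # No data: the adapted mean stays at the prior.
        return list(prior_mean)
    dim = len(prior_mean)
    sample_mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    alpha = adaptation_weight(n, relevance)
    return [alpha * sample_mean[d] + (1.0 - alpha) * prior_mean[d]
            for d in range(dim)]


# Two clusters with identical content but different lengths end up
# with different adapted means under a fixed relevance factor.
short_cluster = [[1.0]] * 16
long_cluster = [[1.0]] * 48
short_mean = map_adapt_mean([0.0], short_cluster)
long_mean = map_adapt_mean([0.0], long_cluster)
```

With a fixed relevance factor, the short cluster is pulled much more strongly toward the prior than the long one, even though both contain the same data. This is the kind of length dependence that adjusting the relevance factor per cluster is meant to reduce.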

Keywords/Search Tags:

Speaker segmentation and clustering, Consensus clustering, Regression Deep Neural Network, Length normalization MAP algorithm, T_s criterion, Improved T-Test metric distance


Related items

1	Research On Improved Speaker Segmentation And Clustering Algorithm
2	Research On Distance Margin-based Deep Discriminative Clustering Methods
3	Implementation Of Hotspot Clustering Method Based On Improved Tangent Space Distance Metric
4	The Study And Application Of New Clustering Algorithms In Image Processing And Text Clustering
5	Research On Topology Relation-based Distance Metric And Clustering Algorithms
6	Research On Key Techniques Of Speaker Recognition In Network
7	Research On Speaker Recognition Clustering Algorithm
8	Research On Clustering Algorithm Based On Density And Manifold Distance
9	The Research And Application Of Improved Data Competition Clustering Algorithm
10	Speaker Recognition Technology Research