Research On Speaker Segmentation And Clustering

Posted on: 2013-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
GTID: 2248330395980605
Subject: Signal and Information Processing
Abstract/Summary:
Speaker segmentation and clustering determines “who spoke when?” by dividing an audio recording into segments and annotating each segment with a speaker label. It has wide applications in many fields and is a current research focus. This thesis focuses on speech/nonspeech detection and on speaker segmentation and clustering algorithms. The main findings are as follows.

The model-based speech/nonspeech detection method needs a large quantity of training data and is not very robust. To solve these problems, this thesis presents a robust speech/nonspeech detection method with a hierarchical structure. In the first layer, the test data is roughly classified into two classes using the model-based method. In the second layer, first, training data is selected by computing the Short Time Energy (STE) and High Zero Crossing Rate Ratio (HZCRR) features of the roughly classified result, and this data is used to establish initial models for silence and audible nonspeech. Second, the speech model is established using the output of Viterbi resegmentation. Third, three adaptive detection models are trained iteratively to detect speech and nonspeech, and the results are finally corrected using the Bayesian Information Criterion (BIC). Experiments show that the hierarchical method is more accurate and more robust than the traditional model-based method.

The performance of a speaker segmentation and clustering system usually degrades because of a lack of prior information about the speakers. To solve this problem, this thesis gives a novel approach that exploits the complementarity of two algorithms, one based on the Information Bottleneck (IB) principle and one based on HMM/GMM, by combining them at the feature level. The output of the IB algorithm is first passed through a logarithmic transformation and Principal Component Analysis (PCA) to reduce its dimensionality, and is then used to train a speaker GMM. A second speaker GMM is trained on the traditional MFCC features. The ΔBIC scores between different speaker clusters are computed separately under each model and then combined by a linear weighted sum. Lastly, HMM/GMM-based speaker segmentation and clustering is performed with the combined ΔBIC score. Experiments show that the IB features provide more prior information for the system and effectively reduce the speaker match error rate.

During speaker segmentation and clustering, different speaker models are sometimes highly similar, which makes them easy to confuse. To resolve this problem, this thesis proposes a speaker segmentation and clustering method based on Maximum Mutual Information (MMI). The method uses MMI to train the speaker models, starting from models obtained by Maximum Likelihood Estimation (MLE). In this process, the selection of the competitive set is improved by choosing only the speech that is difficult to classify, which reduces the amount of computation. At the same time, the stopping criterion in clustering is modified so that it suits MMI and works better for speaker segmentation and clustering. Experiments show that the proposed method increases the distinction between different speaker models and effectively reduces the speaker match error rate.
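The ΔBIC comparison between two speaker clusters and the linear weighted combination of two ΔBIC scores can be illustrated with a minimal sketch. It is not the thesis's implementation: as a simplifying assumption, each cluster is modelled by a single diagonal-covariance Gaussian rather than a GMM, and the sign convention (positive ΔBIC favours two separate speakers) follows the common single-Gaussian BIC formulation. The function names, `penalty_weight`, and the combination `weight` are hypothetical; the abstract does not state the weights used.

```python
import math
import random


def log_det_diag(frames, floor=1e-10):
    """Log-determinant of the maximum-likelihood diagonal covariance of the frames."""
    n = len(frames)
    total = 0.0
    for column in zip(*frames):  # iterate over feature dimensions
        mean = sum(column) / n
        variance = sum((v - mean) ** 2 for v in column) / n
        total += math.log(max(variance, floor))  # floor avoids log(0)
    return total


def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    """Delta-BIC between two clusters (assumption: one diagonal Gaussian each).

    Positive values favour keeping the clusters separate (different speakers);
    negative values favour merging them.
    """
    merged = seg_a + seg_b
    n_a, n_b, n = len(seg_a), len(seg_b), len(merged)
    n_params = 2 * len(merged[0])  # one mean and one variance per dimension
    gain = 0.5 * (n * log_det_diag(merged)
                  - n_a * log_det_diag(seg_a)
                  - n_b * log_det_diag(seg_b))
    penalty = 0.5 * penalty_weight * n_params * math.log(n)
    return gain - penalty


def combine_scores(score_mfcc, score_ib, weight=0.5):
    """Linear weighted sum of two Delta-BIC scores (weight is a hypothetical value)."""
    return weight * score_mfcc + (1.0 - weight) * score_ib


def make_segment(rng, n_frames, dim, mean):
    """Synthetic stand-in for feature frames; real systems would use MFCC or IB features."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n_frames)]


if __name__ == "__main__":
    rng = random.Random(0)
    speaker_1a = make_segment(rng, 300, 4, 0.0)
    speaker_1b = make_segment(rng, 300, 4, 0.0)  # same distribution as speaker_1a
    speaker_2 = make_segment(rng, 300, 4, 3.0)   # shifted mean: a different "speaker"
    print("same speaker:", delta_bic(speaker_1a, speaker_1b))
    print("different speakers:", delta_bic(speaker_1a, speaker_2))
```

In this sketch, the two ΔBIC scores passed to `combine_scores` would come from the MFCC-trained model and the IB-feature-trained model for the same pair of clusters, and the combined score would drive the merge decision.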
Keywords/Search Tags: speaker segmentation and clustering, HMM/GMM model, speech/nonspeech detection, BIC, information bottleneck principle, discriminative training, MMI