
Speaker Recognition Technology Research

Posted on: 2013-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Hao
Full Text: PDF
GTID: 2248330374485366
Subject: Access to information and detection technology
Abstract/Summary:
Multi-speaker recognition is a technology that automatically identifies the speakers in a body of voice material containing multiple speakers. Compared with traditional speaker recognition, multi-speaker recognition not only recognizes who is speaking but also determines when each person is speaking. It is an extension of speaker recognition technology. This thesis studies multi-speaker recognition technology, covering the following aspects:

First, the traditional speaker segmentation algorithm based on the Bayesian Information Criterion (BIC) estimates the distribution of the voice signal inaccurately, which is a major defect. To address this defect, the thesis proposes an improved BIC distance in which the feature vectors are clustered into several classes, each represented by one Gaussian distribution. A two-level speaker segmentation algorithm is then proposed by combining the improved BIC distance with the generalized likelihood ratio (GLR) and several other algorithms. Experimental results show that, compared with the traditional BIC algorithm, the two-level speaker segmentation algorithm based on the improved BIC distance has a lower false alarm rate and better overall performance.

Second, a new speaker clustering algorithm is proposed by combining the Gaussian Mixture Model (GMM) with BIC. The algorithm exploits the fact that a GMM can approximate a wide range of distribution functions and models short voice segments (shorter than 5 seconds) better than a single Gaussian distribution. It thereby addresses the low correct rate of BIC-based speaker clustering on short voice segments. Experimental results show that, compared with the BIC speaker clustering algorithm, the algorithm combining GMM and BIC has a higher correct rate on voice segments shorter than 5 seconds.

Third, speaker recognition methods based on GMM and the Support Vector Machine (SVM) are studied. Experiments are carried out to study how recognition is affected by factors such as the feature parameters (for example, linear predictive cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC)), the GMM order, and the kernel function. Experimental results show that the kernel function greatly affects recognition performance, and that a larger GMM order and longer training and test utterances improve recognition performance.

Fourth, several multi-speaker recognition systems using different speaker segmentation, clustering, and recognition algorithms are built. Experiments are carried out to compare the performance of these systems. Results show that the system using the improved BIC segmentation algorithm and the combined GMM and BIC clustering algorithm proposed in this thesis performs best. Compared with the system using BIC segmentation and BIC clustering, the relative improvement in frame correct rate is 3.5%.
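For readers unfamiliar with the baseline mentioned in the abstract, the following is a minimal sketch of the traditional single-Gaussian BIC change-point test, not the improved BIC distance or the two-level algorithm proposed in the thesis (the abstract does not give their details). The function names `log_det_cov`, `delta_bic`, and `find_change_point`, the penalty weight `lam`, the `margin` parameter, and the small ridge term added to the covariance are all assumptions made for illustration.

```python
import numpy as np


def log_det_cov(x):
    """Log-determinant of the ML covariance of frames x with shape (n_frames, dim)."""
    # A small ridge (an assumption of this sketch) keeps the matrix well conditioned.
    cov = np.cov(x, rowvar=False, bias=True) + 1e-6 * np.eye(x.shape[1])
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(x, t, lam=1.0):
    """Delta-BIC for splitting x at frame t; positive values favor a speaker change."""
    n, d = x.shape
    left, right = x[:t], x[t:]
    # Likelihood-ratio term: one Gaussian for all frames vs. one Gaussian per side.
    gain = 0.5 * (n * log_det_cov(x)
                  - len(left) * log_det_cov(left)
                  - len(right) * log_det_cov(right))
    # Penalty for the extra mean vector and full covariance matrix.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty


def find_change_point(x, margin=20, lam=1.0):
    """Return (frame index, score) of the best split, or (None, score) if none passes."""
    candidates = range(margin, len(x) - margin)
    scores = [delta_bic(x, t, lam) for t in candidates]
    best = int(np.argmax(scores))
    if scores[best] > 0:
        return best + margin, scores[best]
    return None, scores[best]
```

In practice `x` would hold per-frame features such as MFCCs, and the test would be applied inside a sliding window. Modeling each side with a single Gaussian is the simplification that the abstract identifies as the source of inaccurate distribution estimates.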
Keywords/Search Tags:multi-speaker recognition, speaker segmentation and clustering, Bayesian Information Criterion, Mel cepstral coefficients, Gaussian Mixture Model