Research On Speaker Recognition In Conversational Speech

Posted on: 2008-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: D P Liu
Full Text: PDF
GTID: 2178360215490253
Subject: Computer software and theory
Abstract/Summary:
Speaker Recognition (SR), also called Voiceprint Recognition, is a technology that identifies a speaker by his or her voice. SR can be widely used in speaker identity cards, security, telephone shopping, etc. Conversational speech is speech that contains more than one speaker, such as conference recordings, telephone dialogs, and broadcast news. Speaker recognition in conversational speech decides who is talking when. It is a difficult problem in speech recognition that relies on segmentation and clustering techniques, and it can be used in information indexing, speaker tracking, content extraction, etc.

In this dissertation, the development and applications of speaker recognition are introduced first. Feature extraction is then discussed, including endpoint detection, spectral analysis, and phoneme duration analysis. Next, the pattern matching techniques are discussed, including the Gaussian Mixture Model (GMM), Hidden Markov Model (HMM), Vector Quantization (VQ), and Artificial Neural Network (ANN). Finally, MAP adaptation is applied. The main work is as follows:

① A phoneme duration model was built to verify the usefulness of phoneme duration for speaker recognition. Two methods were proposed to address the data sparsity problem that arises when only a small amount of training speech is available.

② A method was proposed that divides the speech into variable-length segments of about one and a half seconds. Each test segment is formed by merging syllables found by endpoint detection. Because it keeps syllables intact and gives test segments a suitable length, the method improved the speaker identification rate.

③ Based on the observation that most speaker turns take place during pauses in the speech, a method was proposed that identifies the head of each semantic segment and then computes the similarity of the other segments. This method reduces the running time, and it is an effective way to run the system in poor environments at the cost of a small loss in recognition rate.

④ The MAP method was used to adapt the GMM in order to improve the robustness of the system. A probabilistic adaptation of the recognition score was also adopted, which reports not only the recognized speaker but also the probability that the result is correct. This fuzzy result carries more precise information, and the recognition rate can be improved further when a confidence limit is applied.
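The probabilistic score and confidence limit in item ④ can be illustrated with a minimal sketch. The abstract does not give the actual formula, so this example makes several assumptions: each enrolled speaker's model yields a log-likelihood for the test segment, all speakers have equal prior probability, and the score is converted to a posterior by normalizing the likelihoods. The function names, the speaker labels, and the 0.7 limit are hypothetical and chosen only for illustration.

```python
import math


def speaker_posteriors(log_likelihoods):
    """Convert per-speaker log-likelihoods into posterior probabilities.

    Assumes equal priors for all enrolled speakers (an assumption of this
    sketch, not a detail stated in the abstract).
    """
    peak = max(log_likelihoods.values())
    # Subtract the largest score before exponentiating to avoid underflow.
    weights = {name: math.exp(score - peak)
               for name, score in log_likelihoods.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def decide(log_likelihoods, confidence_limit=0.7):
    """Return (speaker, confidence), or (None, confidence) when the best
    posterior falls below the hypothetical confidence limit."""
    posteriors = speaker_posteriors(log_likelihoods)
    best = max(posteriors, key=posteriors.get)
    confidence = posteriors[best]
    if confidence < confidence_limit:
        return None, confidence  # too uncertain: reject the segment
    return best, confidence


if __name__ == "__main__":
    # Made-up log-likelihood scores for one test segment.
    scores = {"speaker_a": -1520.3, "speaker_b": -1524.9, "speaker_c": -1521.0}
    print(decide(scores))
```

Rejecting low-confidence segments trades coverage for accuracy: fewer segments receive a label, but the labeled ones are more likely to be correct, which is one plausible reading of how a confidence limit could raise the recognition rate.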
Keywords/Search Tags: Speaker Recognition, Conversational Speech, Endpoint Detection, Speaker Clustering