
Model formation and classification techniques for conversations-based speaker discrimination

Posted on: 2008-10-05
Degree: Ph.D
Type: Dissertation
University: Temple University
Candidate: Ofoegbu, Uchechukwu
Full Text: PDF
GTID: 1448390005962650
Subject: Engineering
Abstract/Summary:
Speaker discrimination is the process of recognizing speakers from their voices. A general requirement for conventional speaker discrimination systems, such as speaker identification and verification systems, is that all participating speakers are known a priori and the system is trained with information from the voices of these speakers. In addition, individual utterances are obtained from the participating speakers; therefore, a large amount of data is available for training the system, as well as for evaluating its performance.

A more recent application of speaker recognition involves differentiating between speakers in conversational data. This process is unsupervised (performed without a priori data about the participants), with only short speaker utterances available. Current conversations-based speaker recognition systems operate by partitioning the speech data into segments of equal length and using features from each segment to represent the speakers. The problems with this method are as follows: (1) speaker change points are unknown; (2) not all classes of speech are useful in characterizing speakers, and, since only short homogeneous speaker utterances are available, an information-rich segment may be compared with an information-deficient segment, resulting in misclassifications; (3) some portions of conversational data consist of overlapped speech from two different speakers and cannot be effectively used to represent a single speaker. As a result of these deficiencies, state-of-the-art conversations-based recognition systems report errors ranging from 11% to 40%.

This research addresses the above problems by (1) selectively creating data models using a combination of features from enhanced portions of short speaker utterances, (2) determining and implementing the set of distance measures that yields the minimum same-speaker and maximum different-speaker separation for conversations, and (3) developing a conversations-based speaker differentiation technique that takes into account the problems of short utterance lengths, co-channel speech, and lack of a priori information. A comprehensive conversations-based system, which can effectively differentiate between up to four speakers in a conversation, was developed and tested on two standard speech databases. An accuracy rate of over 90% was obtained with the system.
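
The abstract does not specify the features, segment models, or distance measures used; purely as a rough illustration of the general segment-and-compare approach it describes, the sketch below models each equal-length segment as a diagonal-covariance Gaussian over generic feature vectors, scores segment pairs with a symmetric Kullback-Leibler distance, and greedily merges segments into at most four speaker groups. Every choice here (the Gaussian segment model, the symmetric KL distance, the agglomerative merging, and the function names) is an assumption for illustration only, not the method developed in the dissertation.

"""
Illustrative sketch (not the dissertation's method): unsupervised grouping of
equal-length conversation segments by speaker, under the assumptions stated
in the paragraph above.
"""
import numpy as np


def gaussian_model(features):
    """Model one short segment as a diagonal-covariance Gaussian (assumed model)."""
    mean = features.mean(axis=0)
    var = features.var(axis=0) + 1e-6  # variance floor to avoid division by zero
    return mean, var


def symmetric_kl(model_a, model_b):
    """Symmetric KL divergence between two diagonal Gaussians (assumed distance)."""
    (ma, va), (mb, vb) = model_a, model_b
    kl_ab = 0.5 * np.sum(np.log(vb / va) + (va + (ma - mb) ** 2) / vb - 1.0)
    kl_ba = 0.5 * np.sum(np.log(va / vb) + (vb + (mb - ma) ** 2) / va - 1.0)
    return kl_ab + kl_ba


def cluster_segments(segment_features, max_speakers=4):
    """Greedily merge the closest segment clusters until at most max_speakers remain."""
    models = [gaussian_model(f) for f in segment_features]
    clusters = [[i] for i in range(len(models))]
    while len(clusters) > max_speakers:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single-linkage distance between the two clusters' segment models
                d = min(symmetric_kl(models[a], models[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # synthetic stand-in for feature vectors from eight equal-length segments
    # produced by two underlying "speakers" (hypothetical data, not a corpus)
    speaker_means = [np.zeros(12), np.full(12, 3.0)]
    segments = [rng.normal(speaker_means[k % 2], 1.0, size=(200, 12))
                for k in range(8)]
    print(cluster_segments(segments, max_speakers=2))

Run as a script, this prints two groups of segment indices, with the even-numbered and odd-numbered segments separated; the real system additionally handles unknown change points, co-channel speech, and selective use of information-rich speech, none of which this toy sketch attempts.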
Keywords/Search Tags:Speaker, System, Data, Speech