Extraction Of Speaker Individual Information By Suppressing Phoneme Effects Based On Frequency Characteristics

Posted on: 2015-03-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C J Xuan
Full Text: PDF
GTID: 1228330452460047
Subject: Computer application technology
Abstract/Summary:
Speech contains linguistic information and speaker information; the former indicates general characteristics and the latter indicates the individual characteristics of speakers. Speaker identification needs to preserve individual information and attenuate linguistic information at the same time. However, a speaker's individual information and linguistic information are difficult to separate from each other in an utterance. To solve this problem, this study proposes the phoneme effect suppression (PES) method, which reduces the influence of inter-phoneme differences on speaker recognition. The method is a modification of the traditional F-ratio method that further emphasizes individual differences between speakers.

First, this study investigated the individual characteristics of specific physiological vocal organs based on the phoneme F-ratio contribution (PFC) of each frequency sub-band. Three languages (English, Chinese, and Korean) were used to investigate the acoustic expression of speaker individual information in each language. Examining the phoneme-specific contribution to speaker individuality showed that voiced and voiceless phonemes contribute differently to speaker information in certain frequency regions. These results show that the speaker information carried by each phoneme is different, which makes it possible to study speaker features associated with specific physiological organs using statistical methods.

Second, this study proposed the phoneme effect suppressed speaker information distribution (PES-SID) in the frequency domain. PES-SID reduces the influence of inter-phoneme differences on speaker recognition by taking into account the articulation-dependent factors of speakers and reducing the intra-speaker variance caused by different phonemes.

Finally, this study proposed a new method for speaker-specific feature extraction that uses a non-uniform frequency scale based on PES-SID. The proposed feature was implemented in GMM speaker models and used in speaker identification experiments. The experiments confirmed that the proposed feature outperformed the baseline features of Mel Frequency Cepstrum Coefficients (MFCC) and the traditional F-ratio. Compared with the MFCC features, recognition errors were reduced by about 61.1% for English, 32.9% for Chinese, and 68.0% for Korean.
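
The abstract builds on the traditional F-ratio without giving its formula, so the following is only a minimal sketch of one common form of that baseline: for each frequency sub-band, the variance of the per-speaker means divided by the average within-speaker variance. It does not implement PES, PFC, or PES-SID, whose details are not given here. The function name `f_ratio_per_band`, the array layout, and the `eps` guard are assumptions made for illustration.

```python
import numpy as np


def f_ratio_per_band(band_energies, speaker_ids, eps=1e-12):
    """Traditional F-ratio for each frequency sub-band (illustrative sketch).

    band_energies: array of shape (n_frames, n_bands), e.g. log sub-band
                   energies (assumed layout, not taken from the dissertation).
    speaker_ids:   array of shape (n_frames,), one speaker label per frame.
    eps:           small constant that avoids division by zero (assumption).
    """
    band_energies = np.asarray(band_energies, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    speakers = np.unique(speaker_ids)

    # Per-speaker mean and variance in every sub-band.
    means = np.stack(
        [band_energies[speaker_ids == s].mean(axis=0) for s in speakers]
    )
    variances = np.stack(
        [band_energies[speaker_ids == s].var(axis=0) for s in speakers]
    )

    # Between-speaker variance: spread of the speaker means.
    between = means.var(axis=0)
    # Within-speaker variance: average spread inside each speaker.
    within = variances.mean(axis=0)

    return between / (within + eps)


# Toy data: band 0 separates the two speakers, band 1 does not.
frames = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 0.0], [12.0, 2.0]])
labels = np.array(["A", "A", "B", "B"])
ratios = f_ratio_per_band(frames, labels)
```

A larger value means the sub-band discriminates speakers better relative to the variation within each speaker. In this form, phoneme differences inside a speaker's frames inflate the within-speaker variance, which is the effect the abstract says PES is designed to suppress.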
Keywords/Search Tags: Speaker identification, Frequency warping, PES, Speech production