Extraction Of Speaker Individual Information By Suppressing Phoneme Effects Based On Frequency Characteristics

Posted on:2015-03-03

Degree:Doctor

Type:Dissertation

Country:China

Candidate:C J Xuan

Full Text:PDF

GTID:1228330452460047

Subject:Computer application technology

Abstract/Summary:

Speech contains both linguistic information and speaker information; the former reflects general characteristics, and the latter reflects the individual characteristics of speakers. Speaker identification needs to preserve individual information while attenuating linguistic information. However, a speaker's individual information and the linguistic information are difficult to separate within an utterance. To address this problem, this study proposes the phoneme effect suppression (PES) method, which reduces the influence of inter-phoneme differences on speaker recognition. PES is a modification of the traditional F-ratio method that further emphasizes differences between individual speakers.

First, this study investigated the individual characteristics of specific physiological vocal organs based on the phoneme F-ratio contribution (PFC) of each frequency sub-band. Three languages (English, Chinese, and Korean) were used to investigate the acoustic expression of speaker individual information in each language. Examining phoneme-specific contributions to speaker individuality showed that voiced and voiceless phonemes contribute differently to speaker information in certain frequency regions. These results show that the speaker information carried by each phoneme is different, which makes it possible to study speaker features associated with specific physiological organs using statistical methods.

Second, this study proposed the phoneme effect suppressed speaker information distribution (PES-SID) in the frequency domain. To reduce the influence of inter-phoneme differences on speaker recognition, PES-SID takes into account the articulation-dependent factor of speakers and reduces the intra-speaker variance caused by different phonemes.

Finally, this study proposed a new method for speaker-specific feature extraction that uses a non-uniform frequency scale based on PES-SID. The proposed feature was implemented in GMM speaker models and used in speaker identification experiments. The proposed feature outperformed the baseline features of Mel Frequency Cepstrum Coefficient (MFCC) and the traditional F-ratio. Compared with the MFCC features, recognition errors were reduced by about 61.1% for English, 32.9% for Chinese, and 68.0% for Korean.
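
The abstract refers to the traditional F-ratio without defining it. A common formulation, assumed here, is the ratio of the variance of the speaker means to the average within-speaker variance, computed separately for each frequency sub-band. The sketch below illustrates only that baseline measure; it does not implement the PES modification, whose details are not given in the abstract. The function name `f_ratio` and the toy band energies are hypothetical.

```python
from statistics import fmean, pvariance


def f_ratio(frames, speaker_ids):
    """Return one F-ratio per frequency band (assumed textbook definition).

    frames: list of per-frame band energies, each a list of equal length.
    speaker_ids: speaker label for each frame, in the same order.
    """
    # Group the frames by speaker label.
    by_speaker = {}
    for frame, spk in zip(frames, speaker_ids):
        by_speaker.setdefault(spk, []).append(frame)

    ratios = []
    for b in range(len(frames[0])):
        # Values of band b, one list per speaker.
        groups = [[f[b] for f in fs] for fs in by_speaker.values()]
        # Spread of the speaker means (between-speaker variance).
        between = pvariance([fmean(g) for g in groups])
        # Average spread inside each speaker (within-speaker variance).
        within = fmean([pvariance(g) for g in groups])
        ratios.append(between / within if within > 0 else float("inf"))
    return ratios


# Toy data: band 0 separates the two speakers, band 1 does not.
toy_frames = [[1.0, 5.0], [3.0, 7.0], [11.0, 5.0], [13.0, 7.0]]
toy_ids = ["A", "A", "B", "B"]
print(f_ratio(toy_frames, toy_ids))  # [25.0, 0.0]
```

Under this definition, a band with a high F-ratio varies more between speakers than within a speaker. One plausible use, consistent with the non-uniform frequency scale described in the abstract, is to allocate finer frequency resolution to such bands.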

Keywords/Search Tags:

Speaker identification, Frequency warping, PES, Speech production

Related items

1	Frequency warping by linear transformation, and vocal tract inversion for speaker normalization in automatic speech recognition
2	Acoustic-feature-based frequency warping for speaker normalization
3	Research On Feature Extraction And Robust Technology For Speaker Identification
4	The Research Of Small Vocabulary Speaker-Independent Isolated Word Speech Recognition System
5	The Research Of Small Vocabulary Speaker-independent Isolated Word Speech Recognition System
6	Speaker Identification Of Whispered Speech Based On Joint Factor Analysis
7	A Research On Speaker Recognition Algorithm And Speaker Identification System Implementation
8	Research And Implementation Of Speaker Recognition Method For Anti-playback Fake Speech
9	Study On Speaker Verification Technology Related To Text And Applications
10	Neural dynamics of speech perception and production: From speaker normalization to apraxia of speech