
Whispered Speaker Identification Research Based on Instantaneous Frequency Estimation

Posted on: 2011-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: M Wang
Full Text: PDF
GTID: 2178360305976274
Subject: Signal and Information Processing
Abstract/Summary:
Whispered speech is a special phonation mode that differs from normal speech both phonetically and physiologically, and it has long been part of everyday life. With continued economic and technological progress, whispered speech has come to play an increasingly important role and is widely used in settings such as financial services, public security, and identity verification. In practical speaker identification, whispered speech can serve as a supplement to normal speech to improve system performance. However, whispered speech is vulnerable to communication-channel interference, and its own characteristics lead to low recognition accuracy, so traditional speech parameters are less robust when applied to it. It is therefore necessary to develop an effective feature representation of whispered speech for speaker identification. In addition, the performance of a speaker identification system trained mainly on normal speech declines sharply when it is tested with whispered speech. How to improve speaker identification accuracy when whispered speech data are sparse is therefore a valuable problem. The contributions of this thesis to whispered speaker identification are as follows.

1. Based on the nonlinear phenomena in speech and the formant demodulation theory of speech production, this thesis introduces the AM-FM model of speech production in detail. The Teager energy operator and the discrete energy separation algorithm (DESA) are introduced for speech applications. The energy separation algorithm is also compared with other algorithms that serve a similar function.

2. Following multiband demodulation analysis (MDA), as used for detecting multicomponent signals, the instantaneous amplitude and frequency of the speech signal are extracted with DESA. A speech parameter called instantaneous frequency estimation (IFE) is then derived by a weighted estimate over both amplitude and frequency, so that it represents the frequency structure of speech accurately. The proposed parameters are applied to whispered speaker identification and compared with conventional MFCC. The experimental results show that, as the number of test subjects increases, the IFE parameters perform as well as MFCC, or slightly better. When the test channel is changed, IFE improves system robustness compared with MFCC.

3. The performance of a speaker identification system trained mainly on neutral speech declines sharply when it is tested with whispered speech. To address this, for the case where whispered speech and normal speech come from different channels, feature mapping is used to reduce channel effects before training and testing a speaker identification system based on the universal background model (UBM). The experimental results show that feature mapping improves system accuracy, and that IFE provides better robustness and accuracy than MFCC.
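To make the Teager energy operator and energy separation step mentioned in the abstract more concrete, the following is a minimal Python sketch and not the thesis's implementation. It assumes the DESA-2 variant of the discrete energy separation algorithm (the abstract does not say which variant was used) and a single band-limited input, as would come out of one filter of a multiband analysis. The function names `teager_energy`, `desa2`, and `weighted_mean_frequency` are hypothetical, and the squared-amplitude weighting in the last function is only one common choice; the abstract does not specify the exact weighting behind IFE.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy: Psi[x](n) = x(n)^2 - x(n-1) * x(n+1).

    Returns N-2 values, corresponding to n = 1 .. N-2 of the input.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def desa2(x, eps=1e-12):
    """DESA-2 estimate of instantaneous amplitude and frequency.

    Assumption: x is a single AM-FM component (one band of a filterbank).
    Returns (amp, omega), omega in radians per sample, for n = 2 .. N-3.
    """
    x = np.asarray(x, dtype=float)
    y = x[2:] - x[:-2]               # symmetric difference y(n) = x(n+1) - x(n-1)
    psi_y = teager_energy(y)         # defined for n = 2 .. N-3
    psi_x = teager_energy(x)[1:-1]   # trimmed to the same n range
    # Guard against division by zero and keep the arccos argument in [-1, 1].
    ratio = psi_y / (2.0 * np.maximum(psi_x, eps))
    omega = 0.5 * np.arccos(np.clip(1.0 - ratio, -1.0, 1.0))
    amp = 2.0 * psi_x / np.sqrt(np.maximum(psi_y, eps))
    return amp, omega

def weighted_mean_frequency(amp, omega, fs):
    """Amplitude-weighted mean frequency of one frame, in Hz.

    Assumption: squared-amplitude weights; the thesis's weighting may differ.
    """
    freq_hz = omega * fs / (2.0 * np.pi)
    weights = amp ** 2
    return float(np.sum(weights * freq_hz) / np.sum(weights))

# Usage on a synthetic tone (illustrative values, not data from the thesis).
fs = 8000.0
n = np.arange(400)
tone = 0.7 * np.cos(2.0 * np.pi * 1000.0 / fs * n + 0.3)
amp, omega = desa2(tone)
print(amp.mean(), weighted_mean_frequency(amp, omega, fs))
```

DESA-2 resolves frequencies only up to a quarter of the sampling rate, and the Teager energy of noisy or multicomponent signals can turn negative, which is one reason such estimators are usually applied per band after bandpass filtering.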
Keywords/Search Tags: AM-FM model, multiband demodulation analysis, energy separation algorithm, instantaneous frequency, feature mapping