Fractional Fourier Transform And Its Application In Whispered Speaker Identification

Posted on: 2013-05-07  Degree: Master  Type: Thesis
Country: China  Candidate: X H Qian  Full Text: PDF
GTID: 2248330371493479  Subject: Communication and Information System
Abstract/Summary:
Whispered speech, a complement to and substitute for normal speech, is a widely used means of communication in daily life. With social and economic development, it plays an increasingly important role in mobile communication, financial services, public security, and other fields. In practical whispered speaker identification, most feature parameters are modified versions of those designed for normal speech; they lack robustness and are vulnerable to channel interference. Finding effective whispered-speech features for speaker identification is therefore an urgent problem. In addition, because whispered data are difficult to collect in practice, it is worth considering how to improve the performance of a whispered speaker identification system when training data are limited. The contributions of this thesis to whispered speaker identification are as follows.

1. Because the speech signal is time-varying and non-stationary, the fractional Fourier transform (FRFT) is introduced as the analysis tool for speech. The FRFT is suited to non-stationary signals and provides an additional free parameter, the transform order, and it achieves good results in speech signal processing.
2. To account for the non-acoustic phenomena in speech production, the AM-FM model is introduced to describe speech production from the viewpoint of formant modulation. The Teager operator, the energy separation algorithm, and the multiband demodulation analysis theory based on this model are described in detail.
3. A key issue is how to determine the optimal FRFT order for extracting whispered-speech features. A feature based on piecewise linear fitting of the instantaneous frequency, the adaptive fractional Fourier transform cepstral coefficients (A-FRCC), is proposed. In GMM-based whispered speaker identification experiments, the new features capture finer speech structure and more speaker-specific characteristics, and they improve recognition rate and robustness compared with step-search fractional Fourier transform cepstral coefficients (S-FRCC) and instantaneous frequency estimation (IFE).
4. For the case of insufficient training data, a universal background model (UBM), which is speaker-independent and channel-independent, is introduced to train the speaker models, giving a GMM-UBM system. Experimental comparison shows that this model also improves the recognition rate when training data are limited, and that the new A-FRCC features perform best.
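The Teager operator and energy separation algorithm named in point 2 can be illustrated with a minimal sketch. It assumes the standard discrete Teager energy operator and the DESA-1 variant of energy separation; the abstract does not say which variant the thesis uses, so the function names `teager` and `desa1` and the test signal are illustrative only.

```python
import math

def teager(s, n):
    # Discrete Teager energy operator: psi[s](n) = s(n)^2 - s(n-1) * s(n+1)
    return s[n] * s[n] - s[n - 1] * s[n + 1]

def desa1(x):
    """DESA-1 estimates of instantaneous frequency (rad/sample) and amplitude.

    Assumed variant: the abstract only names "energy separation algorithm".
    Estimates are returned for n = 2 .. len(x) - 3.
    """
    # Backward difference y(n) = x(n) - x(n-1); y[0] is a placeholder, never read.
    y = [0.0] + [x[n] - x[n - 1] for n in range(1, len(x))]
    freq, amp = [], []
    for n in range(2, len(x) - 2):
        px = teager(x, n)
        if px <= 0.0:
            # Energy is not positive here, so the estimate is undefined.
            freq.append(float("nan"))
            amp.append(float("nan"))
            continue
        g = 1.0 - (teager(y, n) + teager(y, n + 1)) / (4.0 * px)
        g = max(-1.0, min(1.0, g))  # keep acos in its valid domain
        freq.append(math.acos(g))
        denom = 1.0 - g * g
        amp.append(math.sqrt(px / denom) if denom > 0.0 else float("nan"))
    return freq, amp

# Hypothetical test tone: amplitude 0.8, frequency 0.3 rad/sample.
x = [0.8 * math.cos(0.3 * n + 0.1) for n in range(64)]
freq, amp = desa1(x)
```

For a pure sinusoid these estimates are exact up to rounding, which makes a constant tone a convenient sanity check before applying the operator to band-pass filtered speech, as multiband demodulation analysis does.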
Keywords/Search Tags: fractional Fourier transform, AM-FM model, energy separation, instantaneous frequency, adaptive