The Study Of Feature Extraction Method For Speech Recognition Based On The Hilbert-Huang Transform

Posted on: 2013-10-14  Degree: Master  Type: Thesis
Country: China  Candidate: H Y Wang  Full Text: PDF
GTID: 2248330374974872  Subject: Computational Mathematics
Abstract/Summary:
Speech is a natural, basic, and highly important carrier of information in communication. With the rapid development of science and technology, our ability to understand the speech signal keeps improving. In the information society, speech signal processing and its applications have become an active research topic. Speech recognition is an extremely important branch of speech signal processing; it comprises feature extraction, model training techniques, and pattern matching criteria. A large number of studies have shown that feature extraction and selection directly affect the accuracy and speed of speech recognition.

The feature extraction methods commonly used in speech recognition, e.g. LPCC, MFCC, or wavelet-transform-based feature extraction, are built on time-frequency analysis of the signal. However, traditional time-frequency analysis techniques assume that the signal is linear or stationary, which means that they may be deficient when dealing with non-linear or non-stationary signals. The Hilbert-Huang transform (HHT), proposed by N. E. Huang in 1998, is a method for analyzing and processing non-linear and non-stationary signals. Using empirical mode decomposition (EMD) and the definition of the intrinsic mode function (IMF), HHT adaptively decomposes a complex signal into a set of IMFs ordered from high frequency to low frequency. These IMFs have physically meaningful instantaneous frequencies, and their frequency or amplitude may be modulated. The Hilbert transform of the IMFs yields a three-dimensional time-frequency spectrum, which can fully reveal the time-varying characteristics of the signal. HHT does not require any prior knowledge: its basis functions depend adaptively on the signal itself, and the decomposition has real physical meaning. Moreover, unlike the Fourier transform and the wavelet transform, HHT is no longer subject to the constraints of the Heisenberg uncertainty principle.
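To make the decomposition step more concrete, the following is a minimal sketch of a single EMD sifting pass: locate the local extrema, build upper and lower envelopes, and subtract their mean. It is an illustration under stated assumptions, not the method of the thesis. In particular, the envelopes here are piecewise linear to keep the code dependency-free, whereas standard EMD uses cubic splines and the thesis works with B-splines; the function names `local_extrema`, `envelope`, and `sift_once` are hypothetical, and the full algorithm would repeat this pass until a stopping criterion is met.

```python
def local_extrema(x):
    """Return (maxima, minima): indices of strict interior local extrema."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(idx, x, n):
    """Piecewise-linear envelope through the points (i, x[i]) for i in idx.

    Assumption for this sketch: linear interpolation instead of splines,
    and the envelope is held constant before the first and after the last
    extremum.
    """
    env = []
    j = 0
    for t in range(n):
        if t <= idx[0]:
            env.append(float(x[idx[0]]))
        elif t >= idx[-1]:
            env.append(float(x[idx[-1]]))
        else:
            # Advance to the pair of extrema that brackets t.
            while idx[j + 1] < t:
                j += 1
            t0, t1 = idx[j], idx[j + 1]
            env.append(x[t0] + (x[t1] - x[t0]) * (t - t0) / (t1 - t0))
    return env


def sift_once(x):
    """One sifting pass: subtract the mean of the upper and lower envelopes."""
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        raise ValueError("too few extrema: the signal cannot be sifted further")
    n = len(x)
    upper = envelope(maxima, x, n)
    lower = envelope(minima, x, n)
    return [x[t] - 0.5 * (upper[t] + lower[t]) for t in range(n)]


if __name__ == "__main__":
    # A triangle oscillation riding on a constant offset of 3.
    oscillation = [0, 1, 0, -1, 0, 1, 0, -1, 0]
    shifted = [v + 3 for v in oscillation]
    print(sift_once(shifted))
```

In this toy input the envelope mean equals the offset, so one pass recovers the zero-mean oscillation; on real speech many passes and careful boundary handling are needed.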
A vast number of studies have shown that HHT performs very well in the analysis and processing of non-linear and non-stationary signals. Speech is a typical non-linear and non-stationary signal. In this thesis, we use HHT instead of traditional time-frequency analysis methods to extract features for speech recognition. By studying the physical quantities that HHT produces from a signal (such as the IMFs, the Hilbert spectrum (HS), and the Hilbert marginal spectrum (HMS)), we extract two effective features: HMS-MFCC and EWCF. HMS-MFCC is the set of Mel-frequency cepstral coefficients computed from the Hilbert marginal spectrum. EWCF is the instantaneous-energy-weighted center frequency of an IMF. In addition, an improved decomposition method, Sliding-fastBSpline-EMD, is proposed. It relies on two techniques, a sliding window and fast computation of the B-spline basis functions, and it combines the ideas of BSpline-EMD and EEMD. Finally, the time-frequency analysis performance of HHT, the decomposition capability of Sliding-fastBSpline-EMD, and the characterization capability of the extracted features are verified by simulation experiments on test functions and by speech recognition experiments on a speech database.
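As an illustration of the EWCF feature, the sketch below computes an energy-weighted center frequency for a single IMF. The abstract does not give the exact formula, so this assumes one plausible reading: the instantaneous frequency f(t), obtained from the phase of the analytic signal, is averaged with the instantaneous energy a(t)^2 as the weight, i.e. sum(a^2 * f) / sum(a^2). The function names `analytic_signal` and `ewcf` are hypothetical. The analytic signal is built with a naive O(n^2) DFT only to keep the example self-contained; in practice a library routine such as `scipy.signal.hilbert` would be the usual choice.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a naive DFT (O(n^2))."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    half = (n + 1) // 2
    for k in range(1, n):
        if k < half:
            spectrum[k] *= 2          # double the positive frequencies
        elif not (n % 2 == 0 and k == n // 2):
            spectrum[k] = 0           # zero the negative frequencies
        # DC (k = 0) and the Nyquist bin (even n) are left unchanged.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def ewcf(imf, fs):
    """Energy-weighted center frequency of one IMF, in Hz.

    Assumed definition for this sketch: sum(a^2 * f) / sum(a^2), where a is
    the instantaneous amplitude and f the instantaneous frequency.
    """
    z = analytic_signal(imf)
    num = 0.0
    den = 0.0
    for t in range(len(z) - 1):
        # Phase increment between neighbouring samples, wrapped to (-pi, pi].
        dphi = cmath.phase(z[t + 1] * z[t].conjugate())
        freq = dphi * fs / (2 * math.pi)
        energy = abs(z[t]) ** 2
        num += energy * freq
        den += energy
    return num / den


if __name__ == "__main__":
    fs = 1000  # sampling rate in Hz (illustrative value)
    tone = [math.cos(2 * math.pi * 50 * t / fs) for t in range(200)]
    print(ewcf(tone, fs))
```

For a pure tone the weighted average reduces to the tone frequency, and scaling the amplitude leaves the result unchanged because the energy weights are normalized.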
Keywords/Search Tags: HHT, EMD, IMF, Speech Recognition, Feature Extraction