
Analysis Of Dynamic Characteristics Of Chinese: An Approach Based On Hilbert-Huang Transform

Posted on: 2011-06-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Hong
GTID: 1118330332974366
Subject: Signal and Information Processing
Abstract/Summary:
With the development of modern science and technology, digital speech signal processing, a multidisciplinary and comprehensive subject, has become an active research area in modern signal processing. It is now generally recognized that human speech is non-stationary and perhaps nonlinear. These characteristics greatly limit the ability of traditional speech processing to advance further and to give rise to new speech technology. Fortunately, recently developed theories for processing nonlinear and non-stationary signals, such as empirical mode decomposition and the Hilbert-Huang transform, may open a new avenue for the development of speech processing.
Chinese is one of the major languages of the world, with about 1.4 billion speakers. Besides mainland China, the Hong Kong Special Administrative Region, the Macao Special Administrative Region and Taiwan, Chinese is also spoken in Southeast Asian nations such as Singapore and Malaysia, and it is widely used in many developed Western countries. Unlike most languages in the world, Chinese is a tonal language. In standard Chinese there are four distinct tones. Of these, the first, second and fourth tones vary monotonically, whereas the third tone varies non-monotonically, typically following a "V"-shaped contour. This non-monotonicity reflects the temporal variation of properties such as the fundamental frequency and the formants. However, such time variation cannot be captured accurately by traditional signal processing techniques. The Hilbert-Huang transform (HHT), a newly developed time-frequency analysis method based on empirical mode decomposition (EMD), has been regarded as a scientific breakthrough beyond Fourier analysis in that it can process non-stationary and nonlinear signals adaptively.
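
As a rough illustration of the Hilbert step of the HHT, the sketch below recovers a "V"-shaped frequency contour from a single oscillation by taking the phase derivative of its analytic signal. It is a minimal sketch under stated assumptions, not the dissertation's method: the signal is synthetic, the EMD step is skipped (the input is treated as if it were already one well-behaved mode), and all names (`fft`, `analytic_signal`, `instantaneous_frequency`, `FS`, `N`) and the 150 Hz to 100 Hz to 150 Hz contour are hypothetical choices made for the example.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(X):
    """Inverse FFT via the conjugation identity."""
    n = len(X)
    return [v.conjugate() / n for v in fft([c.conjugate() for c in X])]

def analytic_signal(x):
    """Return x + j*Hilbert(x) by zeroing the negative-frequency bins."""
    n = len(x)
    X = fft(x)
    for k in range(1, n // 2):
        X[k] *= 2          # double the positive frequencies
    for k in range(n // 2 + 1, n):
        X[k] = 0j          # discard the negative frequencies
    return ifft(X)

def instantaneous_frequency(x, fs):
    """Phase difference of the analytic signal, converted to Hz."""
    phase = [cmath.phase(v) for v in analytic_signal(x)]
    freqs = []
    for a, b in zip(phase, phase[1:]):
        d = (b - a + math.pi) % (2 * math.pi) - math.pi  # unwrap one step
        freqs.append(d * fs / (2 * math.pi))
    return freqs

# Assumed synthetic test signal: frequency falls from 150 Hz to 100 Hz
# and rises back to 150 Hz, loosely mimicking a third-tone contour.
FS, N = 8000, 1024
true_f0 = [100 + 50 * abs(2 * n / N - 1) for n in range(N)]
phase, acc = [], 0.0
for f in true_f0:
    phase.append(acc)
    acc += 2 * math.pi * f / FS
signal = [math.cos(p) for p in phase]

est_f0 = instantaneous_frequency(signal, FS)
max_err = max(abs(e - t) for e, t in zip(est_f0, true_f0))
print(f"largest deviation from the true contour: {max_err:.3f} Hz")
```

The contour is chosen so that the frame holds a whole number of cycles, which keeps the FFT-based Hilbert transform free of edge effects; on real speech, the ends of each frame would need more care.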
By adopting the EMD-based method, we are able to decompose speech into a series of oscillation modes, each describing speech dynamics on a different characteristic time scale. Some of these modes may correspond to, or contain, subcomponents of the speech signal, such as pitch and formants. It then becomes easier to apply various signal processing techniques to extract effective characteristics from these components. In this way, we can break through the limitations imposed by traditional linear speech processing techniques and obtain the dynamic characteristics of the Chinese speech signal.
The main contributions of the present work are as follows:
1. To address the problem of mode mixing in EMD, two novel sifting methods are developed, based on the concepts of the local integral mean and the centroid of a signal. By comparing the performance of the original EMD with that of the two proposed approaches, we demonstrate the stability and noise immunity of the proposed approaches.
2. An approach is proposed specifically for capturing the fine dynamic structure of a speech fundamental frequency that may vary in a "V"-shaped way, as it does in the third tone of Chinese speech. Because each mode produced by the EMD possesses a unique and physically meaningful instantaneous frequency at any time, we can calculate the instantaneous frequency of each mode by the Hilbert transform and observe its distribution, from which effective characteristics can be extracted. Our approach first estimates the rough trend of the fundamental frequency contour by means of the cepstrum technique, and then uses this trend as a reference to track the variation and calculate the detailed contour from a few intrinsic mode functions produced by the ensemble empirical mode decomposition. Intensive evaluation and direct comparisons with existing methods are conducted on the standard Chinese Mandarin database, showing the effectiveness of the proposed method in acquiring accurate and reliable fundamental frequencies from speech signals, even those heavily contaminated with noise.
3. To avoid the problems of spurious peaks and formant merging that traditional detectors usually encounter, we propose a new approach for detecting formants. In this method, we apply empirical mode decomposition to separate the formants into different intrinsic mode functions, and then estimate each formant frequency from these functions by linear predictive coding (LPC). The results show the superiority of the proposed approach.
The results of the present work in detecting the dynamic fundamental frequency and formants not only help in better understanding the underlying speech dynamics and the rhythm of Han Chinese, but also provide an alternative basis for developing new techniques in speech processing.
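
The rough fundamental-frequency estimate that contribution 2 obtains "by means of the cepstrum technique" can be sketched as follows. This is only an illustration of the general cepstral pitch idea under stated assumptions, not the dissertation's implementation: the frame is a synthetic harmonic signal, the 80 Hz to 400 Hz search range and the Hamming window are assumed choices, and the names `real_cepstrum` and `cepstral_f0` are hypothetical. The later steps of the approach (EEMD and tracking the detailed contour against this reference) are not shown.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def real_cepstrum(frame):
    """Inverse transform of the log magnitude spectrum of a windowed frame."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]          # Hamming window
    log_mag = [math.log(abs(v) + 1e-12) for v in fft(windowed)]
    # log_mag is real and even, so its inverse FFT equals fft(log_mag) / n.
    return [v.real / n for v in fft(log_mag)]

def cepstral_f0(frame, fs, f_lo=80.0, f_hi=400.0):
    """Pick the strongest cepstral peak inside the assumed pitch range."""
    ceps = real_cepstrum(frame)
    lag_min, lag_max = int(fs / f_hi), int(fs / f_lo)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda q: ceps[q])
    return fs / best_lag

# Assumed synthetic voiced frame: 19 harmonics of 200 Hz with 1/k amplitudes.
FS, N, F0 = 8000, 512, 200.0
frame = [sum(math.sin(2 * math.pi * F0 * k * n / FS) / k for k in range(1, 20))
         for n in range(N)]

rough_f0 = cepstral_f0(frame, FS)
print(f"rough F0 estimate: {rough_f0:.1f} Hz")
```

Because the estimate is quantized to whole-sample lags, its resolution is coarse, which is consistent with using it only as a reference trend rather than as the final contour.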
Keywords/Search Tags: Hilbert-Huang Transform (HHT), Empirical Mode Decomposition (EMD), Intrinsic Mode Function, Mode Mixing, Chinese, Fundamental Frequency, Formant