
Independent-Component-Based Spectral Analysis And Comparison Of Mandarin Phonemes, And Research On Speech Synthesis

Posted on: 2012-11-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Wei
Full Text: PDF
GTID: 1118330338956058
Subject: Communication and Information System
Abstract/Summary:
Speech synthesis is one of the most important technologies for the man-machine interface, and its ultimate aim is to give computers the human ability to speak. At present, one of the dominant problems is how to adjust the segmental and suprasegmental acoustic parameters flexibly while still ensuring high naturalness of the synthesized speech. Independent component analysis (ICA) differs from traditional approaches such as the DFT and the wavelet transform. This dissertation uses ICA to extract the independent components of Mandarin phonemes and to analyze their acoustic characteristics. Based on the results of that analysis, exploratory research on speech synthesis is carried out.

Using the ICA technique, the discriminating acoustic characteristics of the independent components of Mandarin phonemes are discussed in the time and frequency domains, and the meaning of each independent component is identified with reference to the acoustic mechanism of Mandarin phonemes. The traditional FFT spectral envelope, the LPC vocal tract spectral envelope, and the higher-order Wigner-Ville spectral envelope of the phoneme independent components are compared and analyzed, and the effects of the three spectral envelopes in phoneme synthesis experiments are presented. The timbre of the synthesized phoneme is controlled by adjusting the fundamental frequency curve, by windowing the spectral envelope of an independent component at the formant positions, and by adjusting the mixing weights among the independent components.

The main research of the dissertation is as follows:

1. Using the ICA approach, every independent component of the phoneme in the time domain is extracted. The correlation, the fundamental frequency (F0) curve, the acoustic position in F1-F2 and F2-F3 space, and the formants of the independent components are compared and analyzed.
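The time-domain extraction step can be illustrated with a toy example. The abstract does not name a specific ICA algorithm, so the sketch below is only an assumption-laden stand-in: it takes two mixed channels, whitens them, and searches for the rotation that maximizes the sum of squared excess kurtosis, a common non-Gaussianity contrast. All names (`whiten`, `ica_two_channels`, `correlation`) and the synthetic sources are hypothetical and are not taken from the dissertation.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def kurtosis(xs):
    """Excess kurtosis, assuming xs is zero-mean with unit variance."""
    return mean([x ** 4 for x in xs]) - 3.0

def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    num = sum(p * q for p, q in zip(da, db))
    return num / math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))

def whiten(x1, x2):
    """Center two channels, decorrelate them, and scale to unit variance."""
    m1, m2 = mean(x1), mean(x2)
    x1 = [v - m1 for v in x1]
    x2 = [v - m2 for v in x2]
    c11 = mean([a * a for a in x1])
    c22 = mean([b * b for b in x2])
    c12 = mean([a * b for a, b in zip(x1, x2)])
    # Closed-form rotation that diagonalizes the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * c12, c11 - c22)
    c, s = math.cos(theta), math.sin(theta)
    p1 = [c * a + s * b for a, b in zip(x1, x2)]
    p2 = [-s * a + c * b for a, b in zip(x1, x2)]
    sd1 = math.sqrt(mean([v * v for v in p1]))
    sd2 = math.sqrt(mean([v * v for v in p2]))
    return [v / sd1 for v in p1], [v / sd2 for v in p2]

def ica_two_channels(x1, x2, steps=90):
    """Grid-search the rotation of the whitened data with maximal non-Gaussianity."""
    z1, z2 = whiten(x1, x2)
    best = None
    for k in range(steps):
        phi = (math.pi / 2.0) * k / steps  # a quarter turn covers all solutions
        c, s = math.cos(phi), math.sin(phi)
        y1 = [c * a + s * b for a, b in zip(z1, z2)]
        y2 = [-s * a + c * b for a, b in zip(z1, z2)]
        score = kurtosis(y1) ** 2 + kurtosis(y2) ** 2
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]

# Synthetic, non-Gaussian sources standing in for phoneme components (assumption).
rng = random.Random(0)
n = 2000
s1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
s2 = [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(n)]

# Two observed mixtures with an arbitrary, invertible mixing matrix (assumption).
x1 = [0.7 * a + 0.3 * b for a, b in zip(s1, s2)]
x2 = [0.4 * a + 0.6 * b for a, b in zip(s1, s2)]

y1, y2 = ica_two_channels(x1, x2)
match_s1 = max(abs(correlation(y1, s1)), abs(correlation(y2, s1)))
match_s2 = max(abs(correlation(y1, s2)), abs(correlation(y2, s2)))
```

As with any ICA method, the recovered components are determined only up to order, sign, and scale, which is why the check compares absolute correlations against both outputs.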
Further, the distinguishing characteristics among the temporal independent components of the phoneme are found. Drawing on the acoustic mechanism of Mandarin phonemes, and considering the relation between the fundamental frequency and the vibration frequency of the vocal cords, the relation between the first formant F1 and tongue height (high or low) in vowel utterance, and the relation between the second formant F2 and tongue position (front or back) in vowel utterance, every temporal independent component of the phoneme is identified and assigned a meaning, such as high fundamental frequency component, high tongue position component, or front tongue position component.

In the frequency-domain independent component analysis, every spectral independent component of the phoneme is extracted. The acoustic position in F1-F2 and F2-F3 space and the formant traits of the spectral independent components are compared and analyzed. Further, the distinguishing characteristics among the spectral independent components of the phoneme are found. Every spectral independent component of the phoneme is distinguished and assigned a meaning, such as high tongue position spectral component or front tongue position spectral component.

2. In the time-domain independent component analysis, the traditional FFT spectral envelope, the LPC vocal tract spectral envelope, and the higher-order Wigner-Ville spectral envelope are extracted from one and the same phoneme independent component. The formants and harmonic structure contained in the three spectral envelopes are compared, and the discriminating acoustic characteristics among the three spectra are found. The effects of the FFT spectrum, the LPC spectrum, and the Wigner-Ville spectrum in the phoneme synthesis experiments are presented.
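Of the three envelopes compared above, the LPC envelope is the simplest to sketch. The example below is a minimal illustration, not the dissertation's implementation: it uses the textbook autocorrelation method with the Levinson-Durbin recursion and evaluates the resulting all-pole model on a frequency grid. The test signal (the impulse response of a single two-pole resonance at 1000 Hz, sampled at 8000 Hz), the model order, and all function names are assumptions chosen for illustration.

```python
import cmath
import math

def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag (no windowing applied)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns ([1, a1, ..., ap], error energy)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def lpc_envelope(a, gain, freqs_hz, fs):
    """Magnitude of the all-pole model gain / A(e^{jw}) at the given frequencies."""
    env = []
    for f in freqs_hz:
        w = 2.0 * math.pi * f / fs
        denom = sum(a[k] * cmath.exp(-1j * w * k) for k in range(len(a)))
        env.append(gain / abs(denom))
    return env

# Hypothetical test signal: one formant-like resonance (assumed values).
fs = 8000.0
resonance_hz = 1000.0
radius = 0.95
theta = 2.0 * math.pi * resonance_hz / fs
x = [radius ** n * math.sin((n + 1) * theta) / math.sin(theta) for n in range(400)]

order = 2
a, err = levinson_durbin(autocorrelation(x, order), order)

freqs = [10.0 * k for k in range(401)]  # 0 Hz to 4000 Hz in 10 Hz steps
env = lpc_envelope(a, math.sqrt(err), freqs, fs)
peak_hz = freqs[env.index(max(env))]
```

For real speech frames a higher order and a tapering window would normally be used; the order-2 model here only shows that the envelope peak lands on the resonance, which is the smooth formant structure the LPC envelope is known for.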
In the experiments, the STRAIGHT algorithm is applied. Based on the fundamental frequency and the three different spectral envelopes of every phoneme independent component, phoneme synthesis experiments are carried out with single temporal independent components and with mixed temporal independent components. Based on the spectral independent components obtained from the three different spectral envelopes of every phoneme, phoneme synthesis experiments are carried out with single spectral independent components and with mixed spectral independent components.

The experimental results show that the three spectral envelopes have clearly different acoustic characteristics. The LPC spectral envelope of the phoneme reveals the smooth transfer characteristics of the vocal tract and a blunt formant structure. The Wigner-Ville (WV) spectrum has richer harmonic components, sharper formants, and higher frequency resolution. Some rapidly time-varying characteristics are visible in the WV spectral envelope. In terms of synthesis quality, the articulation and intelligibility of phonemes synthesized with the WV spectral envelope are better than with the FFT spectrum, and the LPC spectrum ranks third.

3. The spectral envelope of every temporal independent component of the phoneme is windowed at the first and second formants to obtain different timbres. The different independent components are combined with different weights to generate a timbre-controlled synthesized phoneme. The pitch and emotional traits of the synthesized phoneme are adjusted through the fundamental frequency curve. In the dissertation, rule 1, rule 2, and rule 3 for adjusting timbre are summarized to control the pitch and the formants in the spectral envelope of the synthesized phoneme.

The experimental results show that the timbre adjusted by windowing the spectral envelope stays within a satisfactory range, with no cases in which the articulation and intelligibility of the synthesized phoneme decline sharply.
A phoneme synthesized by weighting the independent components is more expressive than one synthesized from any single pure independent component. By adjusting the independent components, a finer timbre effect is obtained in the synthesized phoneme. The articulation and intelligibility of the synthesized phonemes are evaluated by mean opinion score. The mean score for phoneme synthesis with temporal independent components is 4.5, the mean score with windowed temporal independent component spectra is 4.53, and the mean score with mixed temporal independent components is 4.8. The mean score for phoneme synthesis with spectral independent components is 4.45, and the mean score with mixed spectral independent components is 4.6.
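The two timbre controls described in item 3, windowing an envelope at a formant and mixing components by weight, can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's rules 1 to 3: the abstract does not give the window shape or the mixing formula, so a Gaussian gain bump and a normalized weighted sum of envelopes are assumed here, and the function names, the flat toy envelopes, and the 700 Hz formant position are all hypothetical.

```python
import math

def formant_window(freqs_hz, envelope, center_hz, width_hz, gain):
    """Scale an envelope by a Gaussian bump centered on one formant (assumed shape)."""
    out = []
    for f, e in zip(freqs_hz, envelope):
        bump = math.exp(-0.5 * ((f - center_hz) / width_hz) ** 2)
        out.append(e * (1.0 + (gain - 1.0) * bump))
    return out

def mix_components(envelopes, weights):
    """Weighted sum of component envelopes; weights are normalized to sum to 1."""
    total = sum(weights)
    norm = [w / total for w in weights]
    n = len(envelopes[0])
    return [sum(w * env[i] for w, env in zip(norm, envelopes)) for i in range(n)]

# Toy data: two flat component envelopes on a 0..4000 Hz grid (assumption).
freqs = [100.0 * k for k in range(41)]
component_a = [1.0] * len(freqs)
component_b = [2.0] * len(freqs)

# Emphasize a hypothetical first formant at 700 Hz in component A by a factor of 2.
boosted_a = formant_window(freqs, component_a, center_hz=700.0, width_hz=100.0, gain=2.0)

# Mix the two components with weights 3:1.
mixed = mix_components([boosted_a, component_b], [3.0, 1.0])
```

With these toy values the gain is exactly 2 at 700 Hz and fades to 1 far from the formant, so the mixed envelope is 2.0 at 700 Hz and 1.25 at 4000 Hz. Changing `gain`, `center_hz`, or the weights is the kind of adjustment the timbre rules would govern.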
Keywords/Search Tags: phoneme independent components, LPC spectral envelope, Wigner-Ville spectral envelope, formant windowing, phoneme synthesis