
Inverse solution of speech production based on perturbation theory and its application to articulatory speech synthesis

Posted on: 1999-10-06 | Degree: Ph.D. | Type: Thesis
University: Chinese University of Hong Kong (People's Republic of China) | Candidate: Yu, Zhenli | Full Text: PDF
GTID: 2468390014971123 | Subject: Engineering
Abstract/Summary:
The inverse solution of speech production is studied for the formant targets of vowels and of vowel-to-vowel transitions. The vocal-tract shape is modeled by a band-limited Fourier cosine expansion of the vocal-tract area function or of its logarithm. The inverse solution is based on the perturbation theory of speech production, combined with a fast method for calculating the vocal-tract system. An interpolation method is proposed that dynamically constrains the unobservable zeros and the vocal-tract length along the transition between the two endpoints of a vowel-to-vowel transition. The zeros and vocal-tract length at each endpoint are matched with an acoustic-to-geometry codebook that provides a unique mapping; the codebook is designed under geometrical and acoustical constraints. Computer simulations of the inverse solution give reasonable results with respect to the naturalness of the transition behavior of the vocal-tract area function.

An articulatory synthesizer based on a reflection-type line analog model, driven by the vocal-tract area function, is implemented. The inverse solution is evaluated by synthesis, both for vowel-to-vowel transitions and for isolated vowels. Visual inspection of the spectrograms and perceptual listening to the synthetic sounds are both satisfactory, and a quantitative comparison of formant traces shows fairly good agreement between the formants of the synthetic sounds and those of the original sounds.

As an application of the inverse solution, a novel formant-targeted articulatory synthesis is proposed. The entire system consists of an inverse module and a reflection-type line analog model. The synthesizer needs only the first three formant trajectories, the pitch contour, and the amplitude as input parameters. Two modes are implemented: formant mimic synthesis, in which the input parameters are specified artificially, and formant copy synthesis, in which they are estimated from real speech.
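The inverse step can be illustrated with a small numerical sketch. It is not the thesis's algorithm, and every name in it (`area_from_cosine`, `chain_d`, `formants`, `invert`, `ODD_ORDERS`) is hypothetical. It assumes a lossless tube closed at the glottis with zero radiation load at the lips, a fixed length of 17.5 cm split into 20 sections, a sound speed of 35000 cm/s, and a cosine expansion of the log area. A Newton iteration with a finite-difference Jacobian stands in for the perturbation-theory sensitivity formulas. Only cosine orders 1, 3 and 5 are adjusted, because to first order the even terms leave the formants of a uniform closed-open tube unchanged; they are simply held at zero here, whereas the thesis constrains the unobservable zeros and the tract length with its codebook and interpolation.

```python
import math

SOUND_SPEED = 35000.0  # cm/s, assumed speed of sound in warm humid air
ODD_ORDERS = (1, 3, 5)  # hypothetical choice of cosine orders adjusted by the inverse step

def area_from_cosine(coeffs, n_sections=20):
    """Sample ln A(x) = c0 + sum_n c_n cos(n*pi*x) at section centres (x=0 glottis, x=1 lips)."""
    areas = []
    for i in range(n_sections):
        x = (i + 0.5) / n_sections
        y = coeffs[0] + sum(c * math.cos(n * math.pi * x)
                            for n, c in enumerate(coeffs[1:], start=1))
        areas.append(math.exp(y))
    return areas

def chain_d(areas, length_cm, freq):
    """Entry D of the lossless glottis-to-lips chain matrix [[A, jB], [jC, D]].
    With zero radiation load, U_lips / U_glottis = 1 / D, so formants are zeros of D.
    Only the bottom row is needed; a common rho*c factor on all impedances cancels in D."""
    kl = 2.0 * math.pi * freq / SOUND_SPEED * (length_cm / len(areas))
    cs, sn = math.cos(kl), math.sin(kl)
    c, d = 0.0, 1.0  # bottom row [jC, D] of the running matrix product
    for area in areas:
        z = 1.0 / area  # section impedance, up to the constant rho*c
        c, d = c * cs + d * sn / z, d * cs - c * z * sn
    return d

def formants(areas, length_cm, n_formants=3, f_max=5000.0, step=10.0):
    """First zeros of D(f), found by a sign-change scan refined with bisection."""
    found = []
    f_prev, d_prev = step, chain_d(areas, length_cm, step)
    for i in range(2, int(f_max / step) + 1):
        f = i * step
        d_cur = chain_d(areas, length_cm, f)
        if d_prev != 0.0 and d_prev * d_cur <= 0.0:
            lo, hi, d_lo = f_prev, f, d_prev
            for _ in range(40):
                mid = 0.5 * (lo + hi)
                d_mid = chain_d(areas, length_cm, mid)
                if d_lo * d_mid <= 0.0:
                    hi = mid
                else:
                    lo, d_lo = mid, d_mid
            found.append(0.5 * (lo + hi))
            if len(found) == n_formants:
                break
        f_prev, d_prev = f, d_cur
    return found

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, v):
    """Cramer's rule for the 3x3 Newton system m @ x = v."""
    d = det3(m)
    return [det3([row[:j] + [v[i]] + row[j + 1:] for i, row in enumerate(m)]) / d
            for j in range(3)]

def residual(params, targets, length_cm):
    coeffs = [0.0] * 6  # c0 and even orders stay at zero in this sketch
    for order, p in zip(ODD_ORDERS, params):
        coeffs[order] = p
    found = formants(area_from_cosine(coeffs), length_cm)
    return [f - t for f, t in zip(found, targets)]

def invert(targets, length_cm=17.5, iters=20, h=1e-3):
    """Newton iteration on the odd cosine terms; a finite-difference Jacobian
    stands in for analytic formant sensitivity functions."""
    params = [0.0, 0.0, 0.0]
    r = residual(params, targets, length_cm)
    for _ in range(iters):
        if max(abs(x) for x in r) < 0.01:
            break
        jac = [[0.0] * 3 for _ in range(3)]
        for j in range(3):
            bumped = params[:]
            bumped[j] += h
            rj = residual(bumped, targets, length_cm)
            for i in range(3):
                jac[i][j] = (rj[i] - r[i]) / h
        step = solve3(jac, [-x for x in r])
        scale = 1.0
        while scale > 1e-3:  # backtrack so the squared residual never grows
            trial = [p + scale * s for p, s in zip(params, step)]
            r_trial = residual(trial, targets, length_cm)
            if sum(x * x for x in r_trial) < sum(x * x for x in r):
                params, r = trial, r_trial
                break
            scale *= 0.5
    return params, r
```

As a sanity check, a uniform area function such as `area_from_cosine([math.log(5.0)])` should give formants close to 500, 1500 and 2500 Hz, the quarter-wavelength resonances (2n - 1)c/(4L) of a 17.5 cm closed-open tube. Feeding `invert` formants generated from known odd-order coefficients should drive the residual down to a small fraction of a hertz in this setup.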
Because the formant traces and the pitch contour are separate inputs, either one can be modified independently to vary the timbre of the speech sounds, that is, to realize voice conversion. A quantitative comparison between the formant traces of the synthetic sounds and the targets gives promising results. Informal perceptual listening tests judge the quality of the synthetic sounds to be reasonably good, and spectrograms confirm the performance visually.
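The two operations just described, comparing formant traces quantitatively and modifying the formant trace or pitch contour independently, can be sketched as below. The per-formant RMS error is an assumed metric (the abstract does not name one), the frame layout is hypothetical, and uniform scaling is only one simple way to modify the inputs; the modified frames would then be passed to the inverse module and line analog model, which are not shown.

```python
import math


def formant_rms(synth, target):
    """Per-formant RMS error (Hz) between two frame-aligned traces of [F1, F2, F3]."""
    if not synth or len(synth) != len(target):
        raise ValueError("traces must be non-empty and frame-aligned")
    n_frames = len(synth)
    return [math.sqrt(sum((s[k] - t[k]) ** 2 for s, t in zip(synth, target)) / n_frames)
            for k in range(len(synth[0]))]


def convert_voice(frames, formant_scale=1.0, pitch_scale=1.0):
    """Scale the formant trace and the pitch contour independently.

    Each frame is a hypothetical dict {'formants': [F1, F2, F3], 'f0': Hz, 'amp': float};
    the amplitude contour is passed through unchanged."""
    return [{"formants": [f * formant_scale for f in fr["formants"]],
             "f0": fr["f0"] * pitch_scale,
             "amp": fr["amp"]}
            for fr in frames]
```

For example, `formant_rms([[500, 1500, 2500], [520, 1480, 2500]], [[510, 1490, 2500]] * 2)` returns `[10.0, 10.0, 0.0]`. Keeping the formant and pitch scales as separate parameters mirrors the point that the two inputs can be changed independently.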
Keywords/Search Tags: Inverse solution, Speech production, Synthetic sounds, Formant, Synthesis, Articulatory