
Speech synthesis algorithms for voice conversion

Posted on: 1997-12-02
Degree: Ph.D
Type: Thesis
University: University of Florida
Candidate: Hsiao, Yung-Sheng
Full Text: PDF
GTID: 2468390014482354
Subject: Engineering
Abstract/Summary:
The first goal of this research was to create a software-based voice conversion system to independently and automatically modify the characteristics of the human voice. The system was intended to generate high-quality test tokens for speech science and psychoacoustic studies. The second goal was to develop algorithms to convert the voice of one speaker to that of another. The results of this study will be of interest to researchers in speech analysis, speech synthesis, and speaker identification.

The key ideas of our voice conversion system are based on the source-tract production model, a highly parametric representation for speech analysis and synthesis. The software system consists of three subsystems, a speech analyzer, a parameter modifier, and a speech synthesizer, which extract, modify, and synthesize five types of acoustic features, respectively. The features are the formant frequencies and bandwidths, the shape of the glottal pulse, the voice-type classification, the pitch contour, and the gain contour. The first two types of parameters are frame-based and represent the characteristics of the speaker's vocal tract and glottal folds, respectively. The last three are the controlling parameters of the system. A major feature of our acoustic model is that the controlling parameters are independent of the other parameters, so they govern how the frame-based information is concatenated, for example to change the speaking rate or increase the voice volume. This makes it possible to mimic the characteristics of another speaker's voice, including the prosodic features.

The voice conversion algorithms are based on a speaker adaptation model that treats speaker differences as arising from a parametric transformation. The voice conversion task is then realized as a mapping between two sets of parameters. Several experiments were conducted to test the performance of our voice conversion algorithms. The affine transformation method proved effective for converting single-syllable words, but less so for sentences, probably because a sentence contains more local dynamic variation than a single linear mapping can capture. One possible improvement is to include a phoneme detector in the system and estimate piecewise mapping functions instead of one linear function for the entire utterance.
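The abstract's affine mapping between two sets of speaker parameters can be sketched roughly as follows. This is a minimal illustration under assumed data shapes and names (not the thesis software): it supposes time-aligned, frame-based parameter vectors have already been extracted for a source and a target speaker, fits an affine transform by least squares, and applies it to convert new source frames.

```python
# Minimal sketch of an affine parameter mapping for voice conversion.
# Array shapes, feature counts, and function names are illustrative assumptions.
import numpy as np

def fit_affine_mapping(src, tgt):
    """Estimate A, b such that tgt ~= src @ A.T + b via linear least squares.

    src, tgt: (n_frames, n_features) arrays of time-aligned parameter vectors.
    """
    n = src.shape[0]
    X = np.hstack([src, np.ones((n, 1))])        # append a bias column
    W, *_ = np.linalg.lstsq(X, tgt, rcond=None)  # solve X @ W ~= tgt
    A, b = W[:-1].T, W[-1]
    return A, b

def apply_affine_mapping(src, A, b):
    """Map source-speaker parameters toward the target speaker."""
    return src @ A.T + b

# Toy usage with random stand-in data (5 frame-based features per frame).
rng = np.random.default_rng(0)
source_params = rng.normal(size=(200, 5))
target_params = source_params @ rng.normal(size=(5, 5)) + 0.1  # synthetic target
A, b = fit_affine_mapping(source_params, target_params)
converted = apply_affine_mapping(source_params, A, b)
```

The piecewise extension mentioned in the abstract would amount to fitting one such (A, b) pair per phoneme class rather than a single global transform.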
Keywords/Search Tags: Voice conversion, Speech, System, Algorithms, Synthesis