
Speech synthesis algorithms for voice conversion

Posted on: 1997-12-02
Degree: Ph.D
Type: Thesis
University: University of Florida
Candidate: Hsiao, Yung-Sheng
Full Text: PDF
GTID: 2468390014482354
Subject: Engineering
Abstract/Summary:
The first goal of this research was to create a software-based voice conversion system to independently and automatically modify the characteristics of the human voice. The system was intended to generate high-quality test tokens for speech science and psychoacoustic studies. The second goal was to develop algorithms to convert the voice of one speaker to that of another. The results of this study will be of interest to researchers in speech analysis, speech synthesis, and speaker identification.

The key ideas of our voice conversion system are based on the source-tract production model, a highly parametric representation for speech analysis and synthesis. The software system consists of three subsystems, a speech analyzer, a parameter modifier, and a speech synthesizer, which extract, modify, and synthesize five types of acoustic features, respectively. The features are the formant frequencies and bandwidths, the shape of the glottal pulse, the voice-type classification, the pitch contour, and the gain contour. The first two types of parameters are frame-based and represent the characteristics of the speaker's vocal tract and glottal folds, respectively. The last three are the controlling parameters of the system. A major feature of our acoustic model is that the controlling parameters are independent of the other parameters, so they govern how the frame-based information is concatenated, for example to change the speaking rate or increase the voice volume. This makes it possible to mimic the characteristics of another speaker's voice, including the prosodic features.

The voice conversion algorithms are based on a speaker adaptation model that treats speaker differences as arising from a parametric transformation. The voice conversion task is then realized as a mapping between two sets of parameters. Several experiments were conducted to test the performance of our voice conversion algorithms. The affine transformation method proved effective for converting single-syllable words, but less so for sentences, probably because a sentence contains more local dynamic variation than a single linear mapping can capture. One possible improvement is to include a phoneme detector in the system and estimate piecewise mapping functions instead of one linear function for the entire utterance.
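The abstract's affine mapping between two sets of speaker parameters can be sketched roughly as follows. This is a minimal illustration under assumed data shapes and names (not the thesis software): it supposes time-aligned, frame-based parameter vectors have already been extracted for a source and a target speaker, fits an affine transform by least squares, and applies it to convert new source frames.

```python
# Minimal sketch of an affine parameter mapping for voice conversion.
# Array shapes, feature counts, and function names are illustrative assumptions.
import numpy as np

def fit_affine_mapping(src, tgt):
    """Estimate A, b such that tgt ~= src @ A.T + b via linear least squares.

    src, tgt: (n_frames, n_features) arrays of time-aligned parameter vectors.
    """
    n = src.shape[0]
    X = np.hstack([src, np.ones((n, 1))])        # append a bias column
    W, *_ = np.linalg.lstsq(X, tgt, rcond=None)  # solve X @ W ~= tgt
    A, b = W[:-1].T, W[-1]
    return A, b

def apply_affine_mapping(src, A, b):
    """Map source-speaker parameters toward the target speaker."""
    return src @ A.T + b

# Toy usage with random stand-in data (5 frame-based features per frame).
rng = np.random.default_rng(0)
source_params = rng.normal(size=(200, 5))
target_params = source_params @ rng.normal(size=(5, 5)) + 0.1  # synthetic target
A, b = fit_affine_mapping(source_params, target_params)
converted = apply_affine_mapping(source_params, A, b)
```

The piecewise extension mentioned in the abstract would amount to fitting one such (A, b) pair per phoneme class rather than a single global transform.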
Keywords/Search Tags: Voice conversion, Speech, System, Algorithms, Synthesis