Key Algorithm In High Quality Voice Conversion System

Posted on: 2013-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
GTID: 2218330371957701
Subject: Signal and Information Processing
Abstract/Summary:
Voice conversion (VC) is a technique for transforming the personal characteristics of one speaker's voice (the source speaker) into those of another (the target speaker). Speech carries a great deal of information, the most important being the semantic content, followed by the speaker's individuality. The goal of a VC system is to modify the speaker's individuality while preserving the original semantic content, so that speech uttered by one speaker sounds as if it had been spoken by another. This thesis studies the key techniques of a high quality VC system. The main work and contributions are as follows:
1. A VC system aims to transform voices; in a high quality VC system, the synthesized speech should moreover be natural and intelligible. Starting from the speech production model, the models and parameters used for speech signal analysis are studied. The thesis focuses on conversion methods, in particular the algorithm based on Gaussian mixture models (GMM). The system is simulated and evaluated by both objective and subjective tests.
2. Traditional VC systems often produce unnatural converted speech. This thesis improves on them by changing the time scale of the speech, which is done by inserting converted parameters before and after each word. Listening tests report better naturalness and intelligibility of the converted voice than before.
3. In the VC system based on the improved algorithm above, Mel-frequency cepstral coefficients (MFCC) are extracted as the features, since they better reflect sound perception. 3-D MFCC plots and waveforms of the speech before and after conversion are given. The test results confirm that the converted speech not only approximates the characteristics of the target speaker but is also more natural and intelligible.
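To make the GMM-based conversion in item 1 concrete, the following is a minimal sketch only. The abstract does not say which GMM formulation the thesis uses, so this assumes the widely used joint-density form: a GMM is fitted to stacked source/target feature vectors [x; y], and each source frame x is mapped to the posterior-weighted conditional mean of y. The function names (`gaussian_logpdf`, `convert_frame`) and the array layout are illustrative assumptions, not taken from the thesis.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate Gaussian N(x; mean, cov)."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def convert_frame(x, weights, means, covs):
    """Map one source frame x to a target-like frame (assumed joint-density GMM).

    weights: (M,)        mixture weights
    means:   (M, dx+dy)  joint means [mu_x, mu_y] per component
    covs:    (M, dx+dy, dx+dy) joint covariances per component
    """
    dx = x.shape[0]
    # Posterior p(m | x), computed from the source-side marginal of each component.
    log_post = np.array([
        np.log(w) + gaussian_logpdf(x, mu[:dx], S[:dx, :dx])
        for w, mu, S in zip(weights, means, covs)
    ])
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    # Posterior-weighted sum of per-component conditional means E[y | x, m].
    y = np.zeros(means.shape[1] - dx)
    for p, mu, S in zip(post, means, covs):
        cond_mean = mu[dx:] + S[dx:, :dx] @ np.linalg.solve(S[:dx, :dx], x - mu[:dx])
        y += p * cond_mean
    return y
```

In a full system the mixture parameters would be estimated from time-aligned source and target frames (for example MFCC vectors), and the function would be applied frame by frame before resynthesis; those steps are outside this sketch.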
Keywords/Search Tags: Voice Conversion, time-scale, Gaussian Mixture Model, Mel-Frequency Cepstrum Coefficient