Non-parallel Corpora Voice Conversion Based On Structured Gaussian Mixture Model Under Constraint Conditions

Posted on:2016-05-09

Degree:Master

Type:Thesis

Country:China

Candidate:Y X Ju

GTID:2308330464952922

Subject:Information and Communication Engineering

Abstract/Summary:

Voice conversion is a technique that transforms the characteristics of speech uttered by speaker A so that it sounds as if it were uttered by speaker B, while keeping the linguistic content unchanged. Most conventional voice conversion methods train on parallel corpora, in which the features of the source speaker are aligned with those of the target speaker to derive the conversion function. In practice, large parallel corpora are very difficult, and sometimes impossible, to obtain. In addition, joint training is computationally expensive. These factors limit the development of parallel-corpus voice conversion. This paper proposes a non-parallel corpora voice conversion method using a constraint-based Structured Gaussian Mixture Model. First, constraint conditions are extracted from a small number of identical syllables found in large non-parallel source and target corpora. The constraint conditions, which capture semantic information and the correspondence between the acoustic features of the source and target corpora, are then applied throughout the training of the Structured Gaussian Mixture Model (SGMM): they restrict the cluster centers of the K-means clustering algorithm and modify the posterior probability that a speech frame belongs to a given Gaussian distribution. This yields a Structured Gaussian Mixture Model based on constraint conditions (C-SGMM), which contains the correspondence between identical phonetic components. Next, the same voice components in the source and target voices are matched by aligning all Gaussian distributions in the source and target C-SGMMs using a fast model alignment algorithm based on the Acoustic Universal Structure principle. Finally, a short-time spectrum conversion function is derived from the alignment, so that test sentences can be converted to target sentences by applying the conversion function.
Subjective and objective evaluation of the converted speech shows that the proposed method outperforms the traditional structured method in target tendency, voice quality and cepstral distortion, and that its performance comes closer to that of the traditional parallel-corpus method based on the Gaussian Mixture Model (GMM). This research focuses on the following work:
1. Implementation of the traditional GMM-based parallel-corpus voice conversion system and analysis of the existing problems in that method.
2. Study of the Acoustic Universal Structure and its principle, detailed analysis of the existing problems in the conventional structural conversion method, and proposal of a non-parallel corpora voice conversion method using a constraint-based Structured Gaussian Mixture Model.
3. Implementation of a structural non-parallel corpora voice conversion system based on a small number of constraint conditions.
4. Objective and subjective evaluation of the speech obtained by the GMM, SGMM and C-SGMM methods, followed by analysis of the evaluation results to demonstrate the usefulness and superiority of the proposed work.
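
The abstract does not give the form of the short-time spectrum conversion function, so the following is only a minimal sketch of one common way to convert a frame once source and target Gaussians have been paired. It assumes diagonal covariances and a posterior-weighted mean and variance mapping between aligned components. The function names, the dictionary layout of the models and the `alignment` list are all hypothetical and are not taken from the thesis.

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def posteriors(x, weights, means, variances):
    """Posterior probability of each source Gaussian given frame x."""
    log_p = np.array([
        np.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ])
    log_p -= log_p.max()  # stabilise before exponentiating
    p = np.exp(log_p)
    return p / p.sum()

def convert_frame(x, src, tgt, alignment):
    """Map one source frame into the target speaker's feature space.

    src, tgt: dicts with 'weights', 'means', 'variances' (assumed layout).
    alignment[m]: index of the target Gaussian paired with source Gaussian m
    (assumed to come from a separate model alignment step).
    """
    post = posteriors(x, src["weights"], src["means"], src["variances"])
    y = np.zeros_like(x, dtype=float)
    for m, p in enumerate(post):
        n = alignment[m]
        # Assumed mapping: shift to the target mean, rescale by the std ratio.
        scale = np.sqrt(tgt["variances"][n] / src["variances"][m])
        y += p * (tgt["means"][n] + scale * (x - src["means"][m]))
    return y

# Toy two-component models with two-dimensional features.
src = {
    "weights": np.array([0.5, 0.5]),
    "means": np.array([[0.0, 0.0], [10.0, 10.0]]),
    "variances": np.ones((2, 2)),
}
tgt = {
    "weights": np.array([0.5, 0.5]),
    "means": np.array([[5.0, 5.0], [-5.0, -5.0]]),
    "variances": np.full((2, 2), 4.0),
}
alignment = [1, 0]  # source Gaussian 0 pairs with target Gaussian 1, and vice versa
converted = convert_frame(np.array([0.0, 0.0]), src, tgt, alignment)
```

In this toy case the frame sits on the mean of source Gaussian 0, so almost all posterior mass goes to that component and the converted frame lands near the mean of its paired target Gaussian. A real system would estimate the mixtures from speech features and obtain `alignment` from the model alignment stage rather than setting it by hand.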

Keywords/Search Tags:

Voice conversion, SGMM, Non-parallel corpora, Constraint conditions, Fast alignment

Related items

1	Voice Conversion Using Structured Gaussian Mixture Model In Eigen Space
2	Research On Many To Many Voice Conversion Based On I-vector And Improved Variational Autoencoder For Non-parallel Corpora
3	Research On Many-to-Many Voice Conversion Based On I-vector,Variational Auto-encoder And Generative Adversarial Networks For Non-parallel Corpora
4	Non-parallel Many-to-Many Voice Conversion Based On SE-ResNet Combining Speaker Embedding
5	The Research On Voice Conversion Algorithm Based On Improved Bilinear Frequency Warping For Parallel Or Nonparallel Corpora
6	Voice Conversion Based On Isolated Speaker Model
7	Non-parallel Voice Conversion Using ACGAN And Variational Autoencoders Conditioned By Sentence Embedding
8	Research On Many-to-Many Voice Conversion Based On Multi-Scale StarGAN By Share-Learning For Non-parallel Corpora
9	Emotional Voice Analysis And Conversion Based On Parallel Corpus
10	The Experimental Study And Realization Of Mongolian-Chinese Alignment Corpora