
Non-parallel Corpora Voice Conversion Based On Structured Gaussian Mixture Model Under Constraint Conditions

Posted on: 2016-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Ju
GTID: 2308330464952922
Subject: Information and Communication Engineering
Abstract/Summary:
Voice conversion is a technique that transforms the characteristics of speech uttered by speaker A so that it sounds as if it were uttered by speaker B, while keeping the spoken content unchanged. Most conventional voice conversion methods are trained on parallel corpora, in which the features of the source speaker are aligned with those of the target speaker to derive the conversion function. In practice, large parallel corpora are very difficult, and sometimes impossible, to obtain. In addition, joint training is computationally expensive. These factors limit the development of parallel-corpora voice conversion. This thesis proposes a non-parallel corpora voice conversion method using a constraint-based Structured Gaussian Mixture Model. First, constraint conditions are extracted from a small number of identical syllables found in the large non-parallel source and target corpora. These constraint conditions, which capture semantic information and the correspondence between the acoustic features of the source and target corpora, are then applied to restrict the cluster centers of the K-means clustering algorithm and to modify the posterior probability that a speech frame belongs to a given Gaussian distribution throughout the training of the Structured Gaussian Mixture Model (SGMM). This yields a Structured Gaussian Mixture Model based on constraint conditions (C-SGMM), which encodes the correspondence between identical phonetic components. Next, the same voice components in the source and target voices are matched by aligning all Gaussian distributions in the source and target C-SGMMs with a fast model alignment algorithm based on the Acoustic Universal Structure principle. Finally, a short-time spectrum conversion function is derived from the alignment, so that test sentences can be converted to the target speaker's voice by applying this function.
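The abstract does not spell out how the constraint conditions restrict the K-means cluster centers. As one possible reading, the sketch below seeds each center from frames with a known label and keeps those frames fixed in their cluster during every assignment step. The function name `constrained_kmeans`, the `constraints` mapping (frame index to cluster id), and the toy two-dimensional frames are all assumptions made for illustration; they are not taken from the thesis.

```python
def _mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def _sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def constrained_kmeans(frames, constraints, k, iters=20):
    """K-means in which constrained frames never leave their assigned cluster.

    frames      -- list of feature vectors (lists of floats)
    constraints -- dict: frame index -> cluster id (hypothetical format)
    k           -- number of clusters
    """
    # Seed every center from its constrained frames (assumed: at least one each).
    centers = []
    for c in range(k):
        seeds = [frames[i] for i, lab in constraints.items() if lab == c]
        if not seeds:
            raise ValueError("cluster %d has no constrained frame" % c)
        centers.append(_mean(seeds))

    labels = None
    for _ in range(iters):
        # Assignment step: constrained frames keep their label,
        # all other frames go to the nearest center.
        new_labels = [
            constraints[i] if i in constraints
            else min(range(k), key=lambda c: _sqdist(x, centers[c]))
            for i, x in enumerate(frames)
        ]
        # Update step: recompute each center from its current members.
        for c in range(k):
            members = [x for x, lab in zip(frames, new_labels) if lab == c]
            if members:
                centers[c] = _mean(members)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return centers, labels


# Toy data: frames 0 and 2 carry constraint labels 0 and 1.
frames = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [0.1, 0.3], [4.8, 5.2]]
centers, labels = constrained_kmeans(frames, {0: 0, 2: 1}, k=2)
print(labels)
```

If the same syllable-to-cluster mapping is used for both the source and the target corpus, cluster `c` in one model and cluster `c` in the other start from the same phonetic content, which is the kind of correspondence the abstract attributes to the C-SGMM.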
Subjective and objective evaluation shows that speech converted by the proposed method outperforms the results of the traditional structured method in terms of tendency toward the target speaker, voice quality, and cepstral distortion, and that its performance approaches that of the traditional parallel-corpora Gaussian Mixture Model (GMM) based method.
This research focuses on the following work:
1. Implementation of the traditional GMM-based parallel-corpora voice conversion system and analysis of the existing problems in that method.
2. Study of the Acoustic Universal Structure and its principle, and detailed analysis of the existing problems in the conventional structural conversion method, leading to the proposal of a non-parallel corpora voice conversion method using a constraint-based Structured Gaussian Mixture Model.
3. Implementation of a structural non-parallel corpora voice conversion system based on a small number of constraint conditions.
4. Objective and subjective evaluation of the speech obtained by the GMM, SGMM, and C-SGMM methods, followed by analysis of the evaluation results to demonstrate the usefulness and advantages of this work.
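The abstract names cepstral distortion as its objective measure but does not give the formula. A widely used form is mel-cepstral distortion (MCD) in decibels, averaged over frames; the sketch below assumes that form. It also assumes the converted and target frames are already time-aligned and that the caller has dropped the energy (0th) coefficient. The function name `mel_cepstral_distortion` is illustrative only.

```python
import math


def mel_cepstral_distortion(converted, target):
    """Mean mel-cepstral distortion in dB between two aligned frame sequences.

    converted, target -- lists of cepstral vectors of equal length and dimension.
    Per frame: (10 / ln 10) * sqrt(2 * sum_d (c_d - t_d)^2).
    """
    if not converted or len(converted) != len(target):
        raise ValueError("sequences must be non-empty and equally long")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for c, t in zip(converted, target):
        squared_error = sum((a - b) ** 2 for a, b in zip(c, t))
        total += scale * math.sqrt(2.0 * squared_error)
    return total / len(converted)


# Identical sequences give zero distortion; lower values mean closer spectra.
same = mel_cepstral_distortion([[1.0, 2.0]], [[1.0, 2.0]])
unit = mel_cepstral_distortion([[1.0]], [[0.0]])
print(same, unit)
```

Under this measure a lower value indicates that the converted spectrum is closer to the target speaker's spectrum.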
Keywords/Search Tags: Voice conversion, SGMM, Non-parallel corpora, Constraint conditions, Fast alignment