Research On Technology Of Voice Conversion

Posted on:2017-02-06

Degree:Master

Type:Thesis

Country:China

Candidate:B Lu

GTID:2308330485486146

Subject:Signal and Information Processing

Abstract/Summary:

Voice conversion (VC) is a technique that automatically modifies the timbre and/or prosody of one speaker's (the source's) voice so that it sounds as if it were spoken by another speaker (the target), while keeping the linguistic content unchanged. After reviewing the fundamentals of VC, this thesis proposes a new voice conversion technique based on sparse-representation nonnegative matrix factorization (SNMF). The technique is compared with a state-of-the-art baseline VC method based on the maximum-likelihood Gaussian mixture model (ML-GMM) on the CMU ARCTIC parallel corpus, and the results show that the proposed method matches ML-GMM in subjective listening tests. Moreover, when training data is limited, the speaker identification rate of the proposed SNMF VC technique exceeds 72%, whereas that of ML-GMM is below 28%. In the accompanying subjective mean opinion score (MOS) test, SNMF scores 2.6, compared with 1.8 for ML-GMM. The comparison shows that SNMF offers better subjective listening performance and robustness.

This thesis then proposes two improvements to SNMF VC to raise its performance and further reduce spectral distortion. First, considering the complexity and variability of speech signals, it introduces the kmeans clustering algorithm into the SNMF VC system to strengthen the ability of NMF to uncover the latent features of speech. This method, called kmeansSNMF, first applies kmeans clustering to all of the training data to partition it into k clusters and then performs SNMF VC within each cluster separately.
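
The dictionary-based conversion step behind SNMF VC can be sketched as follows. The abstract does not give the exact formulation, so this is only a minimal sketch under stated assumptions: paired source and target dictionaries whose columns are time-aligned exemplars, a Euclidean cost with an L1 sparsity penalty, and multiplicative updates. The function names `sparse_activations` and `convert` and the parameter `sparsity` are hypothetical and do not come from the thesis.

```python
import numpy as np


def sparse_activations(X, A, sparsity=0.1, n_iter=200, eps=1e-12):
    """Estimate H >= 0 such that X ~= A @ H, keeping the dictionary A fixed.

    Assumed objective: 0.5 * ||X - A @ H||_F^2 + sparsity * sum(H).
    X: (n_features, n_frames) nonnegative source spectra.
    A: (n_features, n_atoms) nonnegative source dictionary.
    """
    rng = np.random.default_rng(0)
    H = rng.random((A.shape[1], X.shape[1])) + eps
    AtX = A.T @ X
    AtA = A.T @ A
    for _ in range(n_iter):
        # Multiplicative update: entries stay nonnegative, and the L1 term
        # in the denominator shrinks activations toward zero.
        H *= AtX / (AtA @ H + sparsity + eps)
    return H


def convert(X_src, A_src, B_tgt, **kwargs):
    """Convert source frames by reusing their activations on the target dictionary.

    Assumes column j of A_src and column j of B_tgt are aligned exemplars
    taken from a parallel corpus.
    """
    H = sparse_activations(X_src, A_src, **kwargs)
    return B_tgt @ H
```

Under the same assumptions, a kmeansSNMF-style variant would keep one dictionary pair per cluster and call `convert` with the pair belonging to the cluster each frame is assigned to.
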
Experimental results indicate that kmeansSNMF greatly reduces the spectral distortion of SNMF and lets the SNMF VC technique make more effective use of large amounts of training data.

Second, given the importance of inter-frame information, this thesis introduces combined frames, which join three or more consecutive frames into one large frame and thereby bring inter-frame information into kmeansSNMF. The new method lowers spectral distortion, improves perceived naturalness, and achieves better subjective listening performance than the classical ML-GMM method, with a MOS of 3.78 against 3.70 for ML-GMM.

Finally, inspired by the SNMF voice conversion technique, this thesis applies a method called joint nonnegative matrix factorization, which factorizes two or more training data matrices simultaneously with a single fixed activation matrix. Based on this method, it proposes a cross-voice conversion system that performs one-to-many and many-to-one voice conversion rather than the conventional one-to-one (source-to-target) conversion.
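
The joint factorization described above can be sketched as follows. This is an illustrative sketch rather than the thesis's implementation: it reads "a single fixed activation matrix" as one activation matrix H shared by every speaker's factorization (an assumption), and it uses a Euclidean cost with standard multiplicative updates. The function name `joint_nmf` and its parameters are hypothetical.

```python
import numpy as np


def joint_nmf(Xs, rank, n_iter=300, eps=1e-12, seed=0):
    """Factorize each X_k ~= W_k @ H with one activation matrix H shared by all k.

    Xs: list of nonnegative arrays of shape (n_features_k, n_frames), assumed
        to be time-aligned so that column t means the same frame in every X_k.
    Returns the list of per-speaker dictionaries W_k and the shared H.
    """
    rng = np.random.default_rng(seed)
    n_frames = Xs[0].shape[1]
    Ws = [rng.random((X.shape[0], rank)) + eps for X in Xs]
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Shared activations: numerator and denominator sum over all speakers.
        num = sum(W.T @ X for W, X in zip(Ws, Xs))
        den = sum(W.T @ W for W in Ws) @ H + eps
        H *= num / den
        # Each dictionary is updated against the same shared H.
        HHt = H @ H.T
        for k, X in enumerate(Xs):
            Ws[k] *= (X @ H.T) / (Ws[k] @ HHt + eps)
    return Ws, H
```

One plausible way to use the result for one-to-many conversion, again as an assumption: estimate activations for a new utterance against the source speaker's dictionary with that dictionary held fixed, then multiply each target speaker's dictionary by those activations.
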

Keywords/Search Tags:

voice conversion, GMM, NMF, kmeans, cross-voice conversion

Related items

1	Research On Methods For Voice Conversion
2	Age-Voice Conversion System Driven By Multi-Parameter
3	Research On Any-to-many Voice Conversion Based On Non-parallel Data
4	Research On HMM-based Voice Conversion
5	Emotional Voice Analysis And Conversion Based On Parallel Corpus
6	The Research And Implementation Of Voice Conversion Technology
7	Studies On Key Techniques For Voice Conversion
8	Research On The Voice Conversion System
9	The Research On Feature Parameters And Transformation Methods In Voice Conversion
10	Cross-lingual Voice Conversion Based On Mutual Information And SE Attention Mechanism