Voice Conversion Based On Unified Dictionary With Clustered Features Using NMF

Posted on:2019-05-03

Degree:Master

Type:Thesis

Country:China

Candidate:H Jin

GTID:2428330545471738

Subject:Information and Communication Engineering

Abstract/Summary:

Voice conversion (VC) is a technique that modifies the speech of a source speaker so that it carries the personality characteristics of a target speaker while preserving the linguistic content of the utterance. It draws on signal processing, acoustics, and related disciplines, and research on it can drive advances in areas such as speech coding and speaker recognition. Voice conversion is also widely used in speech synthesis systems, multimedia entertainment, language translation systems, speech enhancement systems in the medical field, and communication in which the speaker's identity is disguised. Most existing algorithms are based on statistical models, of which the Gaussian mixture model (GMM) is the mainstream. Many of them require parallel corpora, which brings several limitations and problems: both speakers must utter the same training content, the trained model applies only to one specific speaker pair, the corpora available for the two speakers are often inadequate, and frames may be mismatched during alignment.

Building on conventional non-negative matrix factorization (NMF) voice conversion, this thesis proposes a novel many-to-many voice conversion algorithm that uses NMF with a unified dictionary built from clustered spectral features. The algorithm decomposes the spectral features of a speaker's speech into a speaker-personality part and a semantics-related part. First, the fundamental frequency and short-time spectral parameters are extracted with STRAIGHT, and linear prediction cepstrum coefficients (LPCC) are extracted from the short-time spectrum. Then a moderate amount of parallel speech from N speakers is aligned using dynamic time warping, and a dictionary for each speaker is constructed through high-dimensional mean clustering. The non-negative spectral parameter matrix is approximated by the product of the respective dictionary and the activation matrix, that is, by a linear combination of dictionary entries. The converted spectral parameters are obtained by combining the target speaker's unified dictionary with the source speech's activation matrix. The algorithm can thus realize many-to-many voice conversion even when the source and target speakers have insufficient corpora.

Experimental results on the VCC2016 corpus show that the converted speech obtained with this method has lower cepstrum distortion and better speech quality than conventional NMF-based conversion. The average cepstrum distortion is about 4.3% lower than that of the conventional NMF-based algorithm.
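The conversion step summarized in the abstract (source activations combined with a target dictionary) can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes non-negative spectral feature matrices with one frame per column, dictionaries whose columns are already paired between source and target, and a KL-divergence multiplicative update for the activations, since the abstract does not state the cost function. The function names, the iteration count, and the random data are hypothetical.

```python
import numpy as np


def estimate_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate non-negative activations H with V ≈ W @ H, keeping W fixed.

    V: (n_features, n_frames) non-negative spectral features.
    W: (n_features, n_atoms) non-negative dictionary (assumed given).
    Uses the multiplicative update for the KL divergence (an assumption).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    col_sums = W.sum(axis=0)[:, None] + eps  # W^T 1, constant because W is fixed
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / col_sums
    return H


def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence between V and its approximation V_hat."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))


def convert(V_src, W_src, W_tgt, n_iter=200):
    """Convert source features by reusing source activations with the target dictionary."""
    H = estimate_activations(V_src, W_src, n_iter)
    return W_tgt @ H


if __name__ == "__main__":
    # Synthetic stand-ins for real dictionaries and features (illustration only).
    rng = np.random.default_rng(42)
    W_src = rng.random((24, 10))
    W_tgt = rng.random((24, 10))
    V_src = W_src @ rng.random((10, 30))
    V_conv = convert(V_src, W_src, W_tgt)
    print(V_conv.shape)
```

Keeping the dictionary fixed and updating only the activations reflects the idea that the activations carry the content shared between speakers while the dictionary carries speaker identity. How the thesis builds its unified dictionary through clustering is not reproduced here.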

Keywords/Search Tags:

Voice conversion, Clustered features, Non-negative matrix factorization, Unified dictionary

Related items

1	Non-negative Matrix Factorization Algorithm And Its Application In Voice Conversion
2	Research On Non-negative Matrix Factorization Algorithm
3	An Algorithm For Voice Conversion With Noise Robustness
4	Research On Face Recognition Based On Non-negative Matrix Factorization
5	EEG Feature Extraction Based On Non-negative Matrix Factorization
6	Study On Deep Non-negative Matrix Factorization Algorithm
7	Research On Face Recognition Algorithms Based On Multi-layer Non-negative Matrix Factorization Architecture
8	Study On The Characteristic Of Pattern Expression Non-Negative Matrix Factorization
9	Information Retrieving Based On Non-Negative Matrix Factorization
10	Study On Improved Discriminant Non-negative Matrix Factorization