
Voice Conversion Based On Unified Dictionary With Clustered Features Using NMF

Posted on: 2019-05-03  Degree: Master  Type: Thesis
Country: China  Candidate: H Jin  Full Text: PDF
GTID: 2428330545471738  Subject: Information and Communication Engineering
Abstract/Summary:
Voice conversion (VC) is a technique that modifies the speech of a source speaker so that it carries the personality characteristics of a target speaker while preserving the linguistic information in the utterance. Voice conversion draws on signal processing, acoustics and other disciplines, and research on it can facilitate advances in areas such as speech coding and speaker recognition. It is also widely used in speech synthesis systems, multimedia entertainment, language translation systems, speech enhancement systems in the medical field, and identity-disguised communication. The majority of existing algorithms are based on statistical models, of which the Gaussian mixture model (GMM) is the mainstream. Many of them require parallel corpora, which brings many limitations and problems. For example, the training utterances must be the same for both speakers, the trained model can only be applied to one specific speaker pair, the corpora of both speakers may be inadequate, and frames may be mismatched during alignment.

Building on conventional non-negative matrix factorization (NMF) voice conversion, this paper proposes a novel algorithm for many-to-many voice conversion that uses NMF with a unified dictionary of clustered spectral features. The algorithm decomposes the spectral features of a speaker's speech into a speaker-personality part and a semantics-related part. First, the fundamental frequency and short-time spectral parameters are extracted by STRAIGHT, and linear prediction cepstrum coefficients (LPCC) are extracted from the short-time spectrum. Then a moderate amount of parallel corpora from N speakers is aligned using dynamic time warping, and each speaker's own dictionary is constructed through high-dimensional mean clustering. The non-negative matrix of spectral parameters is approximated by the product of the respective dictionary and the activation matrix, that is, by a linear combination of dictionary entries. The converted spectral parameters are obtained by combining the target speaker's unified dictionary with the source speech's activation matrix. The algorithm can realize many-to-many voice conversion even when the source and target speakers have insufficient corpora.

Experimental results on the VCC2016 corpora show that the converted speech obtained by this method is better than that of traditional non-negative matrix factorization in both cepstrum distortion and speech quality. The average cepstrum distortion is about 4.3% lower than that of the conventional NMF-based algorithm.
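The dictionary-and-activation step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes non-negative spectral features stored as a (feature dimension x frames) matrix, aligned source and target dictionaries with the same number of entries, and the standard multiplicative update for the Euclidean NMF cost with the dictionary held fixed. The function names `estimate_activations` and `convert`, the matrix sizes, and the random dictionaries are hypothetical stand-ins; in the described method the dictionaries would come from clustering aligned training features.

```python
import numpy as np


def estimate_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate a non-negative H with V ~= W @ H while W stays fixed.

    Uses the multiplicative update for the Euclidean cost
    ||V - W H||^2; this choice of cost is an assumption of the sketch.
    """
    rng = np.random.default_rng(seed)
    # Random positive start; multiplicative updates keep H non-negative.
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)
    return H


def convert(V_src, W_src, W_tgt, n_iter=200):
    """Convert source features by reusing their activations on W_tgt."""
    # Activations are computed against the source speaker's dictionary...
    H = estimate_activations(V_src, W_src, n_iter=n_iter)
    # ...and combined with the target speaker's dictionary.
    return W_tgt @ H, H


# Toy usage with synthetic data (all sizes are illustrative only).
rng = np.random.default_rng(42)
n_dims, n_atoms, n_frames = 24, 8, 50
W_src = rng.random((n_dims, n_atoms))      # stand-in source dictionary
W_tgt = rng.random((n_dims, n_atoms))      # stand-in target dictionary
V_src = W_src @ rng.random((n_atoms, n_frames))  # synthetic source features

V_conv, H_src = convert(V_src, W_src, W_tgt)
print("converted feature matrix shape:", V_conv.shape)
```

The sketch relies on the two dictionaries being aligned entry by entry, so that an activation weight estimated on a source entry is meaningful for the corresponding target entry. Features that can be negative, such as raw cepstral coefficients, would need a non-negative representation before this update applies.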
Keywords/Search Tags:Voice conversion, Clustered features, Non-negative matrix factorization, Unified dictionary