Neural Network Based Voice Conversion

Posted on:2015-04-07

Degree:Master

Type:Thesis

Country:China

Candidate:F L Xie

Full Text:PDF

GTID:2298330422990920

Subject:Computer Science and Technology

Abstract/Summary:

Neural network (NN) based voice conversion, which employs a nonlinear function to map features from a source speaker to a target speaker, has been shown to outperform the GMM-based voice conversion method. However, there are still limitations to be overcome in NN-based voice conversion. For example, the NN is trained on a Frame Error (FE) minimization criterion, and the corresponding weights are adjusted to minimize the squared error over the whole source-target stereo training data set. In this paper, we borrow the idea of sentence-level optimization from minimum generation error (MGE) training in HMM-based TTS synthesis, and change FE minimization to Sequence Error (SE) minimization in NN training for voice conversion. The conversion error over a training sentence from a source speaker to a target speaker is minimized via a gradient descent-based back propagation (BP) procedure. Experimental results show that speech converted by an NN that is first trained with frame error minimization and then refined with sequence error minimization sounds subjectively better than speech converted by an NN trained with frame error minimization only. Scores on both naturalness and similarity to the target speaker are improved.

In the voice conversion task, prosody conversion, especially pitch conversion, is also a very challenging research topic because of the discontinuity of pitch. Conventionally, pitch conversion is achieved by adjusting the mean and variance of the source pitch distribution to match the target pitch distribution. This method removes most of the detailed information of the speaker's prosody and only maintains the F0 contour. In this paper, we propose a neural network based pitch conversion system which converts F0 and spectral features together, frame by frame. Experimental results show that neural network based pitch conversion can significantly reduce the Unvoiced/Voiced error and the RMSE of F0 between converted pitch and target pitch compared with the conventional Gaussian normalized transformation method. Wavelet decomposition of F0 can further improve the conversion performance.
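As a rough illustration of the conventional baseline and the two pitch metrics named in the abstract, the following is a minimal sketch and not the thesis implementation. It assumes that the mean and variance adjustment is applied in the log-F0 domain, that unvoiced frames are marked by F0 = 0, and that RMSE is computed only over frames voiced in both contours. The abstract does not state these details, and the function names are hypothetical.

```python
import numpy as np


def gaussian_normalized_f0(src_f0, src_stats, tgt_stats):
    """Shift and scale voiced source F0 to the target log-F0 mean and std.

    src_stats and tgt_stats are (mean, std) of log-F0 for each speaker.
    Assumption: unvoiced frames are coded as 0 and are left unchanged.
    """
    src_mean, src_std = src_stats
    tgt_mean, tgt_std = tgt_stats
    src_f0 = np.asarray(src_f0, dtype=float)
    converted = np.zeros_like(src_f0)
    voiced = src_f0 > 0
    log_f0 = np.log(src_f0[voiced])
    # Normalize with source statistics, then rescale with target statistics.
    converted[voiced] = np.exp((log_f0 - src_mean) / src_std * tgt_std + tgt_mean)
    return converted


def uv_error_rate(converted_f0, target_f0):
    """Fraction of frames whose voiced/unvoiced decision differs."""
    conv_voiced = np.asarray(converted_f0, dtype=float) > 0
    tgt_voiced = np.asarray(target_f0, dtype=float) > 0
    return float(np.mean(conv_voiced != tgt_voiced))


def f0_rmse(converted_f0, target_f0):
    """RMSE in Hz over frames that are voiced in both contours (assumption)."""
    converted_f0 = np.asarray(converted_f0, dtype=float)
    target_f0 = np.asarray(target_f0, dtype=float)
    both = (converted_f0 > 0) & (target_f0 > 0)
    if not both.any():
        return float("nan")
    diff = converted_f0[both] - target_f0[both]
    return float(np.sqrt(np.mean(diff ** 2)))
```

Because this baseline only shifts and scales the contour globally and copies the source voicing decisions, it cannot change the local shape of the contour or correct voicing mismatches, which is the limitation the abstract attributes to it.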

Keywords/Search Tags:

voice conversion, neural network, pre-training, sequence error minimization, pitch conversion, wavelet decomposition

Related items

1	Voice Conversion Based On ANN
2	Studies On Key Techniques For Voice Conversion
3	Research On The Voice Conversion System
4	The Research On Vocal Tract Spectrum And Pitch Frequency Transformation In Voice Conversion
5	The Research Of Voice Conversion Based On The Spectral Parameters Of Vocal Tract
6	Study On The Neural Network Modelling Method For Voice Conversion
7	Research On Modelling And Conversion Of Segmental Feature
8	Voice Conversion Based On GMM And Codebook Mapping
9	Research And Implementation Of Voice Conversion Techniques Based On ARM9
10	Voice Conversion Using Spectrum With Super-Segment Prosody Features