Voice Conversion Using Deep Belief Network In Super-frame Feature Space

Posted on:2017-04-11

Degree:Master

Type:Thesis

Country:China

Candidate:W Ye

GTID:2308330488962079

Subject:Information and Communication Engineering

Abstract/Summary:

Voice conversion (VC) is a technique that modifies the characteristics of a source speaker’s voice so that it sounds like a target speaker’s voice while keeping the spoken content unchanged. Spectral conversion based on the Gaussian mixture model (GMM) is widely used because it performs comparatively well. However, the traditional GMM transformation treats every feature vector independently of its previous and next frames, so the converted speech shows a certain discontinuity. In addition, GMM-based VC systems suffer from over-smoothing, which is caused by the limited ability of traditional GMM-based conversion functions to capture the source-target correspondence for a given parametric representation of speech.

This thesis presents a voice conversion technique that uses a deep belief network (DBN) to map the spectral envelopes of a source speaker to those of a target speaker. Short-time spectral envelopes are represented by linear prediction cepstral coefficients (LPCC) derived with STRAIGHT, and dynamic time warping is then used to align the parameters of the source speaker with those of the target speaker. Neighboring frames of the source speaker are gathered into super-frames that serve as the input data of the network, and the corresponding frame of the target speaker is used as the output data of the network. The spectral conversion function is obtained by training the neural network. ABX and MOS evaluations indicate that, under the parallel-corpus condition, the presented method converts better than the traditional GMM method.
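The data preparation described in the abstract (dynamic time warping followed by super-frame construction) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `dtw_path`, `make_super_frames` and `training_pairs`, the Euclidean frame distance, the context width of one frame on each side, and the repetition of edge frames as padding are all assumptions made for illustration. The abstract does not state these details, nor whether neighbors are gathered before or after alignment.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def dtw_path(src: Sequence[Frame], tgt: Sequence[Frame]) -> List[Tuple[int, int]]:
    """Return a DTW alignment as (source index, target index) pairs."""
    def dist(a: Frame, b: Frame) -> float:
        # Euclidean distance between two feature vectors (an assumption).
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j] = minimal accumulated distance aligning src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = dist(src[i - 1], tgt[j - 1]) + best_prev

    # Backtrack from the end of both sequences to the start.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def make_super_frames(frames: Sequence[Frame], context: int = 1) -> List[List[float]]:
    """Concatenate each frame with `context` neighbors on each side.

    Frames at the edges are padded by repeating the first or last frame
    (an assumption; the abstract does not specify the padding).
    """
    n = len(frames)
    supers = []
    for t in range(n):
        stacked: List[float] = []
        for k in range(t - context, t + context + 1):
            k = min(max(k, 0), n - 1)  # clamp the index into the valid range
            stacked.extend(frames[k])
        supers.append(stacked)
    return supers


def training_pairs(src: Sequence[Frame], tgt: Sequence[Frame], context: int = 1):
    """Pair each source super-frame with its DTW-aligned target frame."""
    supers = make_super_frames(src, context)
    return [(supers[i], list(tgt[j])) for i, j in dtw_path(src, tgt)]


# Toy one-dimensional "LPCC" sequences, purely illustrative.
source = [[0.0], [1.0], [2.0]]
target = [[0.0], [2.0]]
pairs = training_pairs(source, target, context=1)
```

With a context of c frames and D coefficients per frame, each network input in this sketch has (2c + 1) × D values, while each output keeps the D values of a single target frame.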

Keywords/Search Tags:

Voice conversion, DBN, Short-time spectrum deep feature, Super-frames

Related items

1	Voice Conversion Using Spectrum With Super-Segment Prosody Features
2	Voice Conversion Based On Improved GMM And Short-Time Spectrum With Prosody
3	The Research On Vocal Tract Spectrum And Transition Methods In Voice Conversion
4	Research On Singing Voice Conversion
5	Voice Conversion Research Based On Spectral Envelope And Super-segmental Prosody
6	Studies On Key Techniques For Voice Conversion
7	The Research On Feature Parameters And Transformation Methods In Voice Conversion
8	G-super Frames
9	Investigation On Deep Learning Based Voice Conversion
10	Study On Voice Spoofing Detection Based On Deep Learning