Voice Conversion Based On VQ Model And BP Network

Posted on:2010-01-06

Degree:Master

Type:Thesis

Country:China

Candidate:Y E Ding

Full Text:PDF

GTID:2178360275459037

Subject:Signal and Information Processing

Abstract/Summary:

With the development of speech processing technology and the continuing pursuit of artificial intelligence (AI), voice conversion has become a popular research topic. Voice conversion is a technology that modifies the speech signal uttered by a source speaker so that it sounds as if a target speaker had spoken it. It has many applications, such as text-to-speech (TTS) systems, dubbing systems, and communication systems. The main contributions of this thesis are:

(1) Vector quantization (VQ) is used to transform the spectral envelope. The spectral envelope is represented by line spectrum frequencies (LSFs), which are derived from the LPC coefficients; the LPC order is 20. Compared with the LPC parameters, the LSF parameters have better interpolation and quantization properties. Training yields 128 vectors for the source speaker and 128 vectors for the target speaker. Training also yields a mapping codebook, which represents the correspondence between the source speaker's vectors and the target speaker's vectors. The mapping codebook provides the weights of a linear combination of the target speaker's vectors. The transformed LSF coefficients are similar to the target speaker's LSF coefficients.

(2) VQ is also used to transform the residual. This transformation is divided into three parts, including a linear transformation of the residual's energy and a VQ-based transformation of the residual's waveform. For the waveform transformation, a circular cross-correlation function is defined, and the inverse of its maximum is used as the distance between two waveforms. The transformed residual retains the target speaker's individuality information.

(3) The super-segmental features of Chinese speech are converted with a back-propagation (BP) network. The relative fundamental frequency curves of the source speaker and the target speaker are extracted, and the mapping weights are trained using a 3-layer BP network. The transformed fundamental frequency curve is obtained by adding the mean of the target speaker's fundamental frequency to the transformed relative fundamental frequency curve. The resulting curve is closer to the target speaker's fundamental frequency curve.

This algorithm is capable of transforming Chinese speech and producing natural-sounding voice.
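The waveform distance mentioned in item (2) can be illustrated with a minimal sketch. The abstract does not give the thesis's exact definition, so the normalization by signal energy, the equal-length requirement, and the function names `circular_cross_correlation` and `waveform_distance` are assumptions made here for illustration only.

```python
import math


def circular_cross_correlation(x, y):
    """Normalized circular cross-correlation of two equal-length waveforms.

    Assumption: the correlation is normalized by the product of the two
    signal norms, so a perfect match at some circular lag gives 1.0.
    """
    if len(x) != len(y) or not x:
        raise ValueError("waveforms must be non-empty and of equal length")
    n = len(x)
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if norm == 0.0:
        raise ValueError("waveforms must not be all zeros")
    # One correlation value per circular lag k = 0 .. n-1.
    return [
        sum(x[i] * y[(i + k) % n] for i in range(n)) / norm
        for k in range(n)
    ]


def waveform_distance(x, y):
    """Inverse of the maximum circular cross-correlation.

    Assumption: a non-positive peak (no positive correlation at any lag)
    is treated as an infinite distance.
    """
    peak = max(circular_cross_correlation(x, y))
    return 1.0 / peak if peak > 0.0 else math.inf


if __name__ == "__main__":
    source = [0.0, 1.0, 0.0, -1.0]
    shifted = [1.0, 0.0, -1.0, 0.0]   # same shape, circularly shifted
    other = [1.0, 1.0, -1.0, -1.0]    # different shape
    print(waveform_distance(source, shifted))
    print(waveform_distance(source, other))
```

Under these assumptions the distance ignores circular shifts: two waveforms with the same shape but different alignment get the smallest possible distance, while less similar shapes get larger values. Residual segments of different lengths would need to be resampled or padded to a common length first, a step the abstract does not describe.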

Keywords/Search Tags:

speech conversion, vector quantization, super-segmental feature, back-propagation network

Related items

1	Small Vocabulary Chinese Isolated Word Speech Recognition Theory And Technology Research
2	Voice Conversion Research Based On Spectral Envelope And Super-segmental Prosody
3	Research On Ultra Low Bit Rate Speech Coding
4	Research On Speech Conversion Algorithms Based On Deep Convolutional Auto Encoder
5	The role of segmental sandhi in the parsing of speech: Evidence from Greek
6	The Application Of SOFM And Direct Vector Quantization To LD-CELP Speech Coding Algorithm
7	Research On Modelling And Conversion Of Segmental Feature
8	Speech Recognition Technology Based On Hybrid Model Of HMM And DNN
9	Research On Quantization Of Learned Digital Back-Propagation Nonlinear Compensation Algorithm In Optical Fiber Communication System
10	The Research Of The Speech Encoding & Decoding In The Digital Processing Of Speech Signals