
Voice Conversion Based On VQ Model And BP Network

Posted on: 2010-01-06  Degree: Master  Type: Thesis
Country: China  Candidate: Y E Ding  Full Text: PDF
GTID: 2178360275459037  Subject: Signal and Information Processing
Abstract/Summary:
With the development of speech processing technology and the continuing pursuit of artificial intelligence (AI), voice conversion has become a popular research topic. Voice conversion is a technology that modifies the speech signal uttered by a source speaker so that it sounds as if a target speaker had spoken it. It has many applications, such as text-to-speech (TTS) systems, dubbing systems, and communication systems. The main contributions of this thesis are:

(1) Vector quantization (VQ) is used to transform the spectral envelope. The spectral envelope is represented by line spectrum frequencies (LSFs), which are derived from the LPC coefficients; the LPC order is 20. Compared with LPC parameters, LSF parameters have better interpolation and quantization properties. Training yields 128 code vectors for the source speaker and 128 code vectors for the target speaker. Training also yields a mapping codebook, which represents the correspondence between the source speaker's vectors and the target speaker's vectors and serves as the weighting function for a linear combination of the target speaker's vectors. The transformed LSF coefficients are close to the target speaker's LSF coefficients.

(2) VQ is also used to transform the residual. The transformation of the excitation impulse is divided into three parts, which include a linear transformation of the residual's energy and a VQ-based transformation of the residual's waveform. For the waveform transformation, a circular cross-correlation function is defined, and the inverse of its maximum is used as the distance between two waveforms. The transformed residual retains the target speaker's individuality information.

(3) The suprasegmental (super-segmental) features of Chinese speech are adjusted with a back-propagation (BP) network. The relative fundamental frequency curves of the source speaker and the target speaker are extracted, and the mapping weights are trained with a 3-layer BP network. The transformed fundamental frequency curve is obtained by adding the mean of the target speaker's fundamental frequency to the transformed relative fundamental frequency curve, and it more closely resembles the target speaker's fundamental frequency curve.

The algorithm is capable of transforming Chinese speech and producing spontaneous-sounding speech.
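
The waveform distance described in contribution (2) can be illustrated with a minimal sketch. The abstract only states that a circular cross-correlation function is defined and that the inverse of its maximum is used as the distance; the exact definition in the thesis is not given here. The normalization by signal energy, the requirement that both waveforms have equal length, and the function names `circular_cross_correlation` and `waveform_distance` are all assumptions made for illustration.

```python
import math


def circular_cross_correlation(x, y):
    """Normalized circular cross-correlation of two equal-length sequences.

    r[k] = sum_n x[n] * y[(n + k) mod N] / (||x|| * ||y||)

    The energy normalization is an assumption of this sketch, not
    necessarily the definition used in the thesis.
    """
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0.0:
        raise ValueError("sequences must have non-zero energy")
    return [
        sum(x[i] * y[(i + k) % n] for i in range(n)) / norm
        for k in range(n)
    ]


def waveform_distance(x, y):
    """Distance between two waveforms: inverse of the correlation maximum.

    A non-positive maximum is mapped to infinity (an assumption, so that
    uncorrelated waveforms are treated as maximally distant).
    """
    peak = max(circular_cross_correlation(x, y))
    return math.inf if peak <= 0.0 else 1.0 / peak


if __name__ == "__main__":
    pulse = [1.0, 0.0, 0.0, 0.0]
    shifted = [0.0, 0.0, 1.0, 0.0]   # same shape, circularly shifted
    flat = [1.0, 1.0, 1.0, 1.0]      # different shape
    print(waveform_distance(pulse, shifted))
    print(waveform_distance(pulse, flat))
```

Because the correlation is circular, a waveform and a circularly shifted copy of it have the smallest possible distance under this normalization, which is why such a measure suits pitch-synchronous residual segments whose starting phase is arbitrary. In practice the segments would first need to be brought to a common length, for example by resampling; that step is outside this sketch.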
Keywords/Search Tags: speech conversion, vector quantization, super-segmental feature, back-propagation network