The Research Of Voice Conversion Based On Neural Network

Posted on: 2018-07-10
Degree: Master
Type: Thesis
Country: China
Candidate: X F Yang
Full Text: PDF
GTID: 2348330533968238
Subject: Signal and Information Processing
Abstract/Summary:
Voice conversion is a technique that converts the voice of a source speaker into the voice of a target speaker. As a highly interdisciplinary subject, voice conversion has been applied to language conversion, medical assistance, and secure communication, and is widely used in other fields as well. Studying voice conversion not only advances the theory of signal processing but also promotes progress in the related interdisciplinary fields, so the research is of considerable significance.

The most commonly used models for voice conversion are the Gaussian Mixture Model (GMM) and Artificial Neural Networks (ANN). Because the GMM suffers from over-smoothing and over-fitting, this thesis selects ANN models for voice conversion. Among ANNs, the Radial Basis Function (RBF) network is simple and can approximate any nonlinear function. The Generalized Regression Neural Network (GRNN), a special form of the RBF network, has strong nonlinear mapping capability, a simple network structure, and high robustness. Because the GRNN has only one model parameter, this thesis uses Particle Swarm Optimization (PSO) to optimize that parameter, which yields the PSO-GRNN model. The new model reduces the influence of manual parameter selection on the conversion model and improves the learning ability of the network. The ANN models used in this thesis are therefore the RBF, GRNN, and PSO-GRNN models. The experimental results show that voice converted with the PSO-GRNN model is closer to the target voice than voice converted with the RBF or GRNN model.

Linear Prediction Coding (LPC) describes nasal and plosive sounds with low accuracy, whereas the STRAIGHT model decomposes the voice signal into independent spectral parameters and fundamental frequency parameters and reconstructs speech from these parameters. This thesis therefore replaces the LPC model with the STRAIGHT model for analysis and synthesis of the voice signal and repeats the same experiment. The similarity evaluation shows that voice converted with STRAIGHT and PSO-GRNN is closer to the target voice than voice converted with LPC and PSO-GRNN.
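
The PSO-GRNN idea summarized above can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes that the single GRNN parameter is the Gaussian kernel width (called `sigma` here), that the PSO fitness is the mean squared error on a held-out validation set, and that random toy vectors stand in for the spectral features of the source and target speakers. The function names, search bounds, and PSO coefficients are all hypothetical choices made for illustration.

```python
import numpy as np


def grnn_predict(X_train, Y_train, X_query, sigma):
    """GRNN output: Gaussian-kernel weighted average of the training targets."""
    # Squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Shifting each row by its minimum rescales all weights in that row by the
    # same factor, so the ratio below is unchanged but cannot underflow to 0/0.
    d2 = d2 - d2.min(axis=1, keepdims=True)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    return (weights @ Y_train) / weights.sum(axis=1, keepdims=True)


def pso_tune_sigma(X_tr, Y_tr, X_val, Y_val, bounds=(0.01, 2.0),
                   n_particles=12, n_iter=30,
                   inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for sigma with a basic global-best PSO (assumed settings)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds

    def cost(s):
        # Assumed fitness: validation mean squared error.
        pred = grnn_predict(X_tr, Y_tr, X_val, s)
        return float(np.mean((pred - Y_val) ** 2))

    pos = rng.uniform(lo, hi, n_particles)   # one candidate sigma per particle
    vel = np.zeros(n_particles)
    pbest = pos.copy()                       # best position seen by each particle
    pbest_cost = np.array([cost(s) for s in pos])
    g = int(pbest_cost.argmin())
    gbest, gbest_cost = pbest[g], pbest_cost[g]

    for _ in range(n_iter):
        r1 = rng.random(n_particles)
        r2 = rng.random(n_particles)
        # Standard velocity update: inertia + cognitive pull + social pull.
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)     # keep sigma inside the search range
        costs = np.array([cost(s) for s in pos])
        improved = costs < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = costs[improved]
        g = int(pbest_cost.argmin())
        gbest, gbest_cost = pbest[g], pbest_cost[g]

    return float(gbest), float(gbest_cost)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy stand-ins for aligned source and target feature frames.
    X = rng.uniform(-1.0, 1.0, (80, 2))
    Y = np.sin(3.0 * X) + 0.05 * rng.standard_normal(X.shape)
    sigma, mse = pso_tune_sigma(X[:60], Y[:60], X[60:], Y[60:])
    print(f"sigma={sigma:.3f}, validation MSE={mse:.4f}")
```

A very small `sigma` makes the network reproduce the nearest training target, while a very large one averages all targets. That sensitivity is why the choice of this one parameter matters and why an automatic search can stand in for manual tuning.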
Keywords/Search Tags: Voice Conversion, Generalized Regression Neural Network, Particle Swarm Optimization, Linear Prediction Coding, STRAIGHT