Voice conversion is a technique that changes a speaker's individual voice characteristics so that the transformed speech sounds like that of another speaker. This thesis focuses on feature parameter extraction and mapping rules, with the aim of bringing the transformed speech closer to the target speech and improving its naturalness. The main work of this thesis consists of the following parts:
1. The thesis reviews the fundamentals of voice conversion (VC), including the theory of the speech production process and its mathematical model, the key technologies of voice conversion, and the performance evaluation methods for VC. It then analyzes in detail the extraction of several speech feature parameters and a variety of classical feature parameter conversion methods.
2. A comparison of speech feature parameters leads to the conclusion that MFCC parameters reflect the characteristics of human hearing and offer high spectral resolution in the low-frequency range. In addition, the STRAIGHT model allows speech feature parameters to be extracted accurately and modified extensively. The thesis therefore studies MFCC extraction based on the STRAIGHT model and introduces a GMM to transform the feature parameters. The experimental results show that the converted speech is close to the target speech.
3. An analysis and synthesis algorithm based on the harmonic plus noise model (HNM) is proposed. The improved algorithm decomposes the speech signal into a harmonic part and a noise part, extracts the feature parameters of the harmonic part, and then performs linear prediction analysis on the harmonic part, using inverse filtering to obtain the corresponding harmonic residual signal. The transformed speech is then synthesized. Finally, the stochastic noise component obtained from the HNM decomposition is added to the synthesized speech as noise compensation. The system is simulated, and the objective and subjective tests show that the quality of the synthesized speech is improved.
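The linear prediction and inverse filtering step mentioned in part 3 can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes the autocorrelation method with the Levinson-Durbin recursion, a prediction order of 4, and a toy three-harmonic signal standing in for the harmonic part. All function names are hypothetical and chosen for illustration only.

```python
import math


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal (assumed analysis frame)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns a = [1, a1, ..., ap] for A(z) = 1 + sum_k a_k z^-k,
    together with the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def inverse_filter(x, a):
    """Residual e[n] = sum_k a[k] * x[n-k], i.e. x filtered by A(z)."""
    return [
        sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
        for n in range(len(x))
    ]


def synthesis_filter(e, a):
    """All-pole filter 1/A(z): rebuilds the signal from its residual."""
    y = []
    for n in range(len(e)):
        past = sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(e[n] - past)
    return y


if __name__ == "__main__":
    # Toy "harmonic part": three harmonics of an assumed 100 Hz pitch at 8 kHz.
    fs, f0 = 8000, 100
    x = [
        sum(math.sin(2 * math.pi * h * f0 * n / fs) / h for h in (1, 2, 3))
        for n in range(400)
    ]
    order = 4  # assumed prediction order, kept low for the toy signal
    a, err = levinson_durbin(autocorrelation(x, order), order)
    residual = inverse_filter(x, a)
    rebuilt = synthesis_filter(residual, a)
    print("LP coefficients:", a)
    print("max reconstruction error:", max(abs(p - q) for p, q in zip(x, rebuilt)))
```

In this sketch the residual is passed back through the all-pole filter unchanged, so the original signal is recovered. In a conversion system the spectral envelope would be replaced by the converted one before resynthesis, which this sketch does not attempt.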