
Research On The Chinese Voice Conversion System Based On GMM

Posted on: 2016-08-13 | Degree: Master | Type: Thesis
Country: China | Candidate: J Li | Full Text: PDF
GTID: 2308330470953817 | Subject: Electronic and communication engineering
Abstract/Summary:
The goal of voice conversion is to change the personal characteristics of a source speaker's voice so that it takes on the voice personality characteristics of another (target) speaker, while keeping the semantic content unchanged. The Speech Signal Processing Toolkit (SPTK) is a suite of speech signal processing tools for UNIX environments, written in C, that provides, e.g., LPC analysis, PARCOR analysis, LSP analysis, PARCOR and LSP synthesis filters, vector quantization, and extended versions of these techniques. To develop a Chinese voice conversion system, this thesis uses SPTK as the experimental platform to study Chinese voice conversion based on the GMM (Gaussian Mixture Model). The main work is as follows:
(1) The framework of the voice conversion system and the preparation of the experimental data are described. A recording corpus of 900 sentences was selected to cover consonants, vowels, and syllables, and four speakers were invited to record a speech database. Of the 900 recorded sentences, 700 are used to train the model and the remaining 200 serve as the test corpus.
(2) With SPTK as the experimental platform, the speech of the source and target speakers is divided into frames with a frame length of 400 ms and a frame shift of 80 ms, each frame is weighted with a Blackman window, and 24 Mel-frequency cepstral coefficients (MFCCs) are extracted using SPTK commands under Linux. The source and target feature parameters are then time-aligned using DTW (Dynamic Time Warping) and combined into joint feature parameters.
The EM (Expectation-Maximization) algorithm is then used to obtain the optimal GMM.
(3) The test speech is framed and windowed, and its MFCCs are extracted; the extracted features are then converted using the GMM and the transformation rules obtained in the training stage. Finally, the converted feature parameters are used to synthesize the target speech.
(4) The complete Chinese voice conversion system is then built, and the converted speech is evaluated using the MOS (Mean Opinion Score) method.
Experimental results show that, with the GMM as the conversion model and SPTK as the experimental platform, the implemented voice conversion system produces converted speech whose intelligibility and naturalness meet the requirements of GMM-based voice conversion. In addition, the synthesized speech achieves an average MOS of 4.3 on the closed set and 3.8 on the open set, indicating that the voice conversion system performs relatively well.
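The framing and windowing step of item (2) can be sketched in Python with NumPy as below. This is a minimal illustration, not SPTK's own code: `frame_signal` and `blackman_window_frames` are hypothetical helper names, trailing samples that do not fill a whole frame are simply dropped, and SPTK's exact padding and window-normalization conventions are not reproduced. Frame length and shift are counted in samples (SPTK's `frame` command likewise takes its `-l` and `-p` arguments in samples), so durations in milliseconds must first be multiplied by the sampling rate.

```python
import numpy as np


def frame_signal(x, frame_len, frame_shift):
    """Split a 1-D signal into overlapping frames, one frame per row.

    frame_len and frame_shift are in samples; trailing samples that do not
    fill a whole frame are dropped (no padding).
    """
    if len(x) < frame_len:
        return np.empty((0, frame_len), dtype=x.dtype)
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    # Row i gathers samples i*frame_shift ... i*frame_shift + frame_len - 1
    idx = np.arange(frame_len)[None, :] + frame_shift * np.arange(n_frames)[:, None]
    return x[idx]


def blackman_window_frames(frames):
    """Weight every frame with a Blackman window of the same length."""
    return frames * np.blackman(frames.shape[1])
```

Each windowed row would then be passed to an MFCC analysis (24 coefficients per frame in the thesis); that step is not shown here.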
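The DTW alignment of source and target features described in item (2) can be sketched as follows. The abstract does not state the local distance or path constraints, so this sketch assumes a Euclidean frame distance and the three basic steps (vertical, horizontal, diagonal) without slope constraints; `dtw_align` and `joint_vectors` are hypothetical names. Concatenating each aligned pair into a joint vector [x; y] is one common way to build GMM training data.

```python
import numpy as np


def dtw_align(src, tgt):
    """Minimum-cost DTW path between feature sequences src (N, D) and tgt (M, D).

    Local cost: Euclidean distance between frames.
    Allowed steps: (1, 0), (0, 1) and (1, 1), with no slope constraint.
    Returns a list of (source_index, target_index) pairs.
    """
    n, m = len(src), len(tgt)
    cost = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=-1)  # (N, M)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost, padded with inf
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda c: acc[c])
    return path[::-1]


def joint_vectors(src, tgt, path):
    """Concatenate DTW-aligned frames into joint vectors [x_t; y_t]."""
    return np.array([np.concatenate([src[i], tgt[j]]) for i, j in path])
```

The double loop costs O(NM) time and memory, which is acceptable for sentence-length utterances.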
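The EM training of the GMM mentioned at the end of item (2) can be illustrated with a compact full-covariance fitter. This is a sketch under stated assumptions, not the thesis's or SPTK's implementation: `fit_gmm_em` is a hypothetical name, and the deterministic initialization (means taken from data points spread along the first feature dimension, covariances from per-dimension variances), the fixed iteration count, and the small ridge `reg` added to each covariance are illustrative choices.

```python
import numpy as np


def log_gauss(X, mean, cov):
    """Row-wise log density of N(mean, cov) evaluated at X of shape (T, D)."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)  # whitened residuals, shape (D, T)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0))


def fit_gmm_em(X, n_comp, n_iter=50, reg=1e-6):
    """Fit a full-covariance GMM to X (T, D) with EM (illustrative sketch).

    Returns (weights, means, covs, log_likelihood_history).
    """
    T, D = X.shape
    # Deterministic init (assumption): means are data points spread along
    # the first feature dimension; covariances start from per-dimension variances.
    order = np.argsort(X[:, 0])
    means = X[order[np.linspace(0, T - 1, n_comp).astype(int)]].copy()
    covs = np.array([np.diag(X.var(axis=0)) + reg * np.eye(D)] * n_comp)
    weights = np.full(n_comp, 1.0 / n_comp)
    history = []
    for _ in range(n_iter):
        # E-step: log responsibilities, normalised with log-sum-exp
        log_p = np.stack([np.log(weights[k]) + log_gauss(X, means[k], covs[k])
                          for k in range(n_comp)], axis=1)  # (T, K)
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        history.append(float(log_norm.sum()))  # log-likelihood before this update
        resp = np.exp(log_p - log_norm[:, None])
        # M-step: re-estimate weights, means and covariances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / T
        means = (resp.T @ X) / nk[:, None]
        for k in range(n_comp):
            diff = X - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + reg * np.eye(D)
    return weights, means, covs, history
```

If X holds joint source-target vectors, each fitted covariance also contains the source-target cross-covariance, which is what a GMM-based conversion function typically relies on.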
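For the conversion step in item (3), the abstract does not spell out the transformation rule. One common choice when the GMM is trained on joint source-target vectors is the minimum mean-square-error mapping sketched below, in which each component k contributes the linear regression mu_k^y + S_k^yx (S_k^xx)^-1 (x - mu_k^x), weighted by its posterior probability P(k | x). The sketch assumes the first `dx` dimensions of each mean and covariance belong to the source features; `convert` is a hypothetical name, and this may differ from the thesis's actual rule.

```python
import numpy as np


def log_gauss(X, mean, cov):
    """Row-wise log density of N(mean, cov) evaluated at X of shape (T, D)."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)  # whitened residuals, shape (D, T)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0))


def convert(X, weights, means, covs, dx):
    """Map source frames X (T, dx) to target frames with a joint-density GMM.

    means: (K, dx + dy) and covs: (K, dx + dy, dx + dy) from a GMM fitted
    to joint vectors [x; y].
    """
    K = len(weights)
    # Posterior P(k | x) from the source-only marginal of each component
    log_p = np.stack([np.log(weights[k]) + log_gauss(X, means[k, :dx], covs[k, :dx, :dx])
                      for k in range(K)], axis=1)
    post = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
    Y = np.zeros((X.shape[0], means.shape[1] - dx))
    for k in range(K):
        mu_x, mu_y = means[k, :dx], means[k, dx:]
        s_xx, s_yx = covs[k, :dx, :dx], covs[k, dx:, :dx]
        # Per-component regression: mu_y + S_yx S_xx^-1 (x - mu_x)
        y_k = mu_y + np.linalg.solve(s_xx, (X - mu_x).T).T @ s_yx.T
        Y += post[:, k, None] * y_k
    return Y
```

With a single component the mapping reduces to ordinary linear regression from source to target features; more components let the regression vary across regions of the source feature space.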
Keywords/Search Tags: Gaussian mixture model, Voice conversion, SPTK