
Research on Chinese Voice Conversion Based on Speech Recognition and Speech Synthesis

Posted on: 2014-02-16  Degree: Master  Type: Thesis
Country: China  Candidate: B He
GTID: 2268330401953153  Subject: Signal and Information Processing
Abstract/Summary:
Voice conversion is a relatively new technology in the field of speech signal processing. Its goal is to change one speaker's voice so that it sounds like another speaker's voice. The technology combines a variety of techniques from speech signal processing, such as speech signal analysis, speech recognition, speech synthesis, and speech enhancement. In this paper, with the aim of developing a Chinese voice conversion system, we use HMM-based speech recognition and speech synthesis methods to study Chinese voice conversion. Given the characteristics of Chinese, we choose initials and finals as the basic units for both speech recognition and speech synthesis. A complete voice conversion system is composed of three parts: speech recognition, parameter conversion, and speech synthesis.

The main work of this paper is as follows:
1. It describes the framework of the voice conversion system and the preparation of the experimental data. This includes selecting a 1000-utterance recording corpus with consonant, vowel, and syllable coverage in mind, inviting four speakers to record a speech database, converting the recording format, proofreading the recordings, running speech recognition on the speech database, and extracting the timing information of consonants and vowels from the recognition results.
2. After manually proofreading and adjusting the speech recognition results, it produces prosodic marks based on duration statistics of the initials, generates monophone and triphone training label files, designs the context attributes and question sets for HMM synthesizer training, and trains the HMM synthesizer on the HTS-2.0 platform.
3. Using the above method, it builds HMM models for two speakers, generates the acoustic parameters of the sentences to be converted from their label files with both models, and interpolates between them to produce the parameters of a third, "virtual" speaker.
4. It generates speech waveforms from the "virtual" parameters with the STRAIGHT synthesizer, and evaluates both the conventionally synthesized sentences and the parameter-converted sentences with MOS and ABX tests.

The naturalness of the speech synthesizer and the parameter conversion algorithm together determine the quality of the conversion. Experimental results show that:
(1) The synthesizer in this paper scores an average of 4.2 on the closed set and 3.9 on the open set, so the naturalness of the synthesized speech has basically reached an acceptable level.
(2) With acoustic parameter interpolation, the ABX subjective evaluation shows that the system achieves voice conversion: the converted voice can be steered toward either of the two source speakers, and it can combine the personal characteristics of both.
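The parameter interpolation step can be illustrated with a minimal Python sketch. The abstract does not give the interpolation formula, the parameter domain, or how the two speakers' trajectories are aligned, so everything here is an assumption: a single linear weight, frame sequences of equal length (as if both models generated from the same label file with shared durations), F0 interpolated in the log domain, and 0 Hz used to mark unvoiced frames. The function names `interpolate_frames` and `interpolate_f0` are hypothetical and are not taken from the thesis, HTS, or STRAIGHT.

```python
import math


def interpolate_frames(params_a, params_b, weight_a):
    """Linearly interpolate two equal-length sequences of parameter frames.

    params_a, params_b: lists of frames, each frame a list of floats
    (for example spectral coefficients). weight_a is speaker A's share;
    speaker B receives 1 - weight_a.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    if len(params_a) != len(params_b):
        raise ValueError("both speakers must have the same number of frames")
    weight_b = 1.0 - weight_a
    return [
        [weight_a * a + weight_b * b for a, b in zip(frame_a, frame_b)]
        for frame_a, frame_b in zip(params_a, params_b)
    ]


def interpolate_f0(f0_a, f0_b, weight_a):
    """Interpolate F0 contours (Hz) in the log domain.

    Assumption: a value of 0 marks an unvoiced frame, and a frame that is
    unvoiced for either speaker stays unvoiced in the virtual speaker.
    """
    if len(f0_a) != len(f0_b):
        raise ValueError("both F0 contours must have the same length")
    weight_b = 1.0 - weight_a
    virtual = []
    for a, b in zip(f0_a, f0_b):
        if a > 0 and b > 0:
            virtual.append(math.exp(weight_a * math.log(a) + weight_b * math.log(b)))
        else:
            virtual.append(0.0)
    return virtual


# Example: a virtual speaker halfway between the two source speakers.
spectrum = interpolate_frames([[0.0, 2.0]], [[1.0, 4.0]], 0.5)
f0 = interpolate_f0([100.0, 0.0], [400.0, 200.0], 0.5)
```

Moving `weight_a` toward 1 or 0 corresponds to the abstract's observation that the converted voice can be made to lean toward one of the two source speakers.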
Keywords/Search Tags: voice conversion, speech recognition, speech synthesis, HMM, parameter interpolation