Mandarin Speech Synthesis System And Rhythm Adjustment

Posted on: 2008-06-19; Degree: Master; Type: Thesis
Country: China; Candidate: F Li; Full Text: PDF
GTID: 2178360278453454; Subject: Software engineering
Abstract/Summary:
With the rapid growth of practical applications such as spoken dialogue systems, voice call centers, voice-enabled websites, and voice e-mail services, demand for text-to-speech (TTS) technology has reached a new high, and this demand is pushing TTS research and development into a new stage. Commercial deployment at scale is still concentrated in narrowly restricted domains, such as flight schedule, stock price, and weather queries. TTS has not yet been deployed on a large scale in unrestricted domains, mainly because the quality of synthesized speech does not meet practical needs. The shortfall shows in two respects. The first is speech quality. In general, TTS extracts speech features and regenerates speech from them through a suitable transformation. Speech that has been converted to parameters and then back to a waveform suffers an obvious loss of quality, including noise, echo, and a mechanical-sounding voice. The second is prosody. TTS systems usually generate only a limited set of intonation patterns, so the synthesized speech sounds flat. They also handle rhythm, stress, and pauses poorly, so the speech sounds unnatural. This thesis aims to improve both aspects: the improvement in voice quality focuses on the extraction of speech feature parameters, while the rhythm adjustment centers on the duration of the synthesis units.
The comparative experiments and the corresponding improvements are based on the HMM-based speech synthesis system developed by the HTS Group. First, according to the speech characteristics of Chinese, the training sentences selected from the speech corpus were annotated with rhythm tags, and context features and questions were designed for clustering the fundamental frequency, spectral parameters, and duration. Then spectral parameters were extracted, and speech was resynthesized from them, using three currently popular parametric synthesis methods in comparative experiments. After comparing the speech synthesized by the three methods, this thesis adopted the one that is closest to the original speaker's voice quality and the least time-consuming in both the extraction and synthesis stages. Next, to address the weak rhythm, a phoneme and rhyme duration prediction model was added to the original duration model. However, noise appeared in the synthesized speech, which may be related to the parameter extraction process; resolving it is left for future work.
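As a rough illustration of how a predicted phone-level duration could be combined with an HMM state duration model, the sketch below spreads a target duration over the states of one phone in proportion to each state's variance, following the usual HTS-style rule d_k = m_k + rho * var_k. The function name, the example values, and the assumption that the added prediction model supplies a target frame count per phone are illustrative and are not taken from the thesis.

```python
def distribute_duration(means, variances, target_frames):
    """Split a target phone duration (in frames) across HMM states.

    means, variances: per-state Gaussian duration statistics (hypothetical values).
    target_frames: total duration assumed to come from a separate prediction model.
    """
    total_var = sum(variances)
    if total_var <= 0:
        raise ValueError("state duration variances must sum to a positive value")

    # rho shifts every state so that the state durations add up to the target.
    rho = (target_frames - sum(means)) / total_var

    # States with larger variance absorb more of the adjustment.
    # Each state is floored at one frame; if the floor is hit, the sum can
    # exceed the target, so a real system would renormalise afterwards.
    return [max(1.0, m + rho * v) for m, v in zip(means, variances)]


if __name__ == "__main__":
    # Three emitting states with mean durations 3, 5 and 4 frames.
    print(distribute_duration([3, 5, 4], [1, 2, 1], target_frames=16))
```

With rho = 0 the states keep their mean durations; a positive rho lengthens the phone and a negative rho shortens it.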
Keywords/Search Tags: Speech Synthesis, Hidden Markov Model, Parametric Extraction, Duration Prediction, Speech Naturalness