
Research On Text Normalization And Prosody Structure Prediction In Mandarin Text-to-Speech System

Posted on: 2011-10-18
Degree: Master
Type: Thesis
Country: China
Candidate: T Zhou
GTID: 2178360308462597
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of computer science and related disciplines, speech synthesis has made great progress, driven by many new theories and techniques. Text-to-speech (TTS) is an important technique for generating artificial speech from text. It has been widely applied in fields such as telecommunication services, embedded mobile applications, and entertainment.

The output speech quality of a TTS system is evaluated mainly by intelligibility and naturalness. So far, intelligibility, which depends mostly on the text normalization module of the front end, is quite good except in the processing of non-standard words, especially numbers and symbols. Naturalness, on the other hand, depends mainly on the prosody structure prediction module and is still far from the human level. The essential problem is how to model the prosody of real speech effectively. Research on prosody processing focuses mainly on the following aspects: prosody prediction, prosody rules, prosody description, and prosody modeling. This thesis covers the text normalization module and the prosody structure prediction module, both of which belong to the front end of a TTS system.

Because the text input to a TTS system is unknown and unbounded, the speech to be produced cannot be defined in advance. To improve intelligibility and naturalness, more information about the text and its prosody must be extracted from the input. Research shows that correctly identifying non-standard words greatly helps intelligibility, just as a well-determined prosodic structure helps naturalness.

The thesis builds on a study of the characteristics of Mandarin, in particular dates, phone numbers, facility names, and similar items, which occur frequently in daily communication and cannot easily be given the correct reading through one-to-one Pinyin mapping rules alone.
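The ambiguity described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `read_as_digits` and `read_as_cardinal` are hypothetical, the cardinal reader is limited to 0 through 9999, and the choice between readings (which the thesis assigns to a statistical model) is simply made by hand here. The point is only that one digit string has several valid Mandarin readings depending on its category.

```python
# Illustrative only: one digit string, several possible Mandarin readings.
# Function names are hypothetical and not taken from the thesis.

DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]  # ones, tens, hundreds, thousands


def read_as_digits(text: str, telephone: bool = False) -> str:
    """Read each digit separately, as in years and phone numbers.

    With telephone=True, 1 is read as "幺" (yao), a common convention
    when reading phone numbers aloud in Mandarin.
    """
    out = []
    for ch in text:
        d = int(ch)
        out.append("幺" if telephone and d == 1 else DIGITS[d])
    return "".join(out)


def read_as_cardinal(text: str) -> str:
    """Read a digit string as a cardinal number (0 to 9999 only)."""
    n = int(text)
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only covers 0..9999")
    if n == 0:
        return DIGITS[0]
    digits = str(n)
    out = []
    pending_zero = False
    for i, ch in enumerate(digits):
        d = int(ch)
        unit = UNITS[len(digits) - 1 - i]
        if d == 0:
            # Zeros are spoken once, and only if a non-zero digit follows.
            pending_zero = True
        else:
            if pending_zero and out:
                out.append(DIGITS[0])
            pending_zero = False
            out.append(DIGITS[d] + unit)
    result = "".join(out)
    # 10 to 19 are read "十...", not "一十...".
    if result.startswith("一十"):
        result = result[1:]
    return result


if __name__ == "__main__":
    print(read_as_digits("110", telephone=True))  # as an emergency number
    print(read_as_cardinal("110"))                # as a quantity
    print(read_as_digits("2010"))                 # as a year
```

A simple character-to-Pinyin lookup cannot choose among these readings; the category of the token (phone number, year, quantity) has to be decided from context first.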
After studying and comparing traditional methods for text normalization, this thesis adopts a Maximum Entropy model based method for text normalization. Drawing on the acoustic and prosodic characteristics of Mandarin, it thoroughly studies the relationships among prosodic features, pauses, accent, and prosodic boundaries. In contrast to earlier methods, it proposes a method for prosody structure prediction based on the Conditional Random Field model.

For text normalization with the Maximum Entropy model, the thesis gives a clear theoretical definition of the model, together with its conditional distribution and parameter estimation. On the application side, it places emphasis on feature design, extension selection, and dynamic features.

For prosody structure prediction with the Conditional Random Field model, the thesis focuses on the theory of the model, including its conditional distribution and parameter estimation. Feature design and feature composition are also presented in this part. Experimental results show that considerable improvement is achieved in both text normalization and prosody structure prediction, and that the methods work well in a real Mandarin TTS system.
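For reference, the conditional distributions of the two models named above are usually written as follows. These are the standard textbook forms, given here as an assumption; the abstract does not state the thesis's own notation or feature set. Here $x$ is the observed context, $y$ the label (or label sequence of length $T$), $f$ the feature functions, and $\lambda$ the weights to be estimated.

```latex
% Maximum Entropy model: one label y per context x
p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{i} \lambda_i f_i(x, y) \Big),
\qquad
Z(x) = \sum_{y'} \exp\Big( \sum_{i} \lambda_i f_i(x, y') \Big)

% Linear-chain Conditional Random Field: a label sequence y = (y_1, \dots, y_T)
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'}
  \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Big)
```

In both cases parameter estimation typically means choosing the weights $\lambda$ that maximize the conditional log-likelihood of the training data. The difference relevant to prosody is that the Conditional Random Field normalizes over whole label sequences, so neighboring boundary decisions influence one another.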
Keywords/Search Tags:Text Normalization, Prosodic Structure Prediction, Maximum Entropy Model, Conditional Random Field Model