Research On HMM-based Dai Speech Synthesis System

Posted on: 2018-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z M Wu
GTID: 2428330518458663
Subject: Signal and Information Processing
Abstract/Summary:
Speech synthesis is a technique for transforming text into an audible speech signal using a computer or other electronic equipment. Although speech synthesis has a long history, most work has focused on Chinese and English, and minority languages have received far less attention. In China, Mandarin speech synthesis has already been turned into products, and speech synthesis for Tibetan, Uighur, and other minority languages has reached the product stage, but speech synthesis for the minority languages of Yunnan, especially the Dai language, has received little attention. With the goal of developing a Dai speech synthesis system, and given that the naturalness of the synthesized speech is not high, this thesis studies a trainable speech synthesis method based on the Hidden Markov Model (HMM) and ways to improve it. The main work of this thesis is as follows:

1. Based on the HTS-2.0 platform, the fundamental frequency (F0) parameters extracted by STRAIGHT are analyzed and corrected. To address three kinds of error (frequency doubling, frequency halving, and devoicing), a tool for correcting F0 errors is developed in MATLAB using the short-time average magnitude difference function (AMDF) method, and a filter based on mathematical morphology is used to smooth the results. The experimental results show that the correction tool effectively corrects F0 errors and improves the quality of the synthesis.

2. A corpus of 1244 Dai sentences is automatically segmented with the HTK toolkit and then relabeled in Praat, starting from the automatic segmentation results and the TextGrid files. The automatic segmentation of Dai speech has several problems: tone labels and time information are not aligned, pauses are too long or too short, and the prosodic annotation disagrees with the actual speech. Using Praat, the segmentation errors are corrected and the prosodic information is labeled according to the audio. The experimental results show that correcting the automatic segmentation results greatly improves the naturalness of the synthesized speech.

3. Based on the baseline speech synthesis system, the quality and spectral parameters of the synthesized speech are analyzed, an improved duration model is proposed, and the spectral parameters are reselected to retrain the acoustic model. To address synthesized speech that sounds too flat and lacks a strong sense of rhythm, LSP parameters, which have better interpolation properties, are chosen as the spectral parameters for training. In addition, a decision-tree model of phoneme duration is added to the existing duration model: during synthesis, the state-level and phone-level duration models make their decisions together, and the final duration is generated according to their weights. The experimental results show that the intelligibility, naturalness, and rhythm of the speech synthesized by the improved system are greatly improved.
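The AMDF-based F0 correction described in the first item can be illustrated with a small sketch. This is not the thesis's MATLAB tool, whose details the abstract does not give. It is a minimal Python illustration under stated assumptions: the function names (`amdf`, `amdf_pitch`, `fix_octave_error`), the 60 to 400 Hz search range, the `depth` threshold, and the rule of snapping an F0 value to whichever of its half, itself, or its double lies closest to the AMDF estimate are all hypothetical choices made here. Voicing decisions (the devoicing case) and the morphological smoothing step are left out.

```python
import math


def amdf(frame, lag, n_terms):
    """Average magnitude difference between the frame and itself shifted by `lag`."""
    return sum(abs(frame[n] - frame[n + lag]) for n in range(n_terms)) / n_terms


def amdf_pitch(frame, sample_rate, f0_min=60.0, f0_max=400.0, depth=0.1):
    """Estimate F0 (Hz) of a voiced frame from the first deep valley of its AMDF.

    The search range and `depth` are illustrative assumptions, not thesis values.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    # Use the same number of terms for every lag so the values are comparable.
    n_terms = len(frame) - max_lag
    if n_terms <= 0:
        raise ValueError("frame is too short for the requested f0_min")
    d = [amdf(frame, lag, n_terms) for lag in range(min_lag, max_lag + 1)]
    # The AMDF also dips at multiples of the period, so take the FIRST lag
    # that falls near the global minimum rather than the global minimum itself.
    threshold = min(d) + depth * (max(d) - min(d))
    i = next(k for k, v in enumerate(d) if v <= threshold)
    # Walk down to the bottom of that valley.
    while i + 1 < len(d) and d[i + 1] < d[i]:
        i += 1
    return sample_rate / (min_lag + i)


def fix_octave_error(f0, reference):
    """Snap `f0` to f0/2, f0, or 2*f0, whichever is closest to `reference`.

    Both arguments must be positive; unvoiced frames (F0 = 0) should be skipped.
    Distance is measured on a log scale so halving and doubling are symmetric.
    """
    candidates = (f0 / 2.0, f0, f0 * 2.0)
    return min(candidates, key=lambda c: abs(math.log(c / reference)))


# Usage: a synthetic 200 Hz tone, 40 ms at 16 kHz, stands in for a voiced frame.
sr = 16000
frame = [math.sin(2 * math.pi * 200 * n / sr) for n in range(640)]
ref = amdf_pitch(frame, sr)
print(ref, fix_octave_error(400.0, ref), fix_octave_error(100.0, ref))
```

Picking the first deep valley instead of the global minimum is one simple way to keep the AMDF estimate itself from landing on a multiple of the pitch period. A real corrector would also need a voicing decision and would be applied frame by frame along the whole contour.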
Keywords/Search Tags:Speech Synthesis, Dai language, Naturalness, Phone Segmentation, Pitch Smooth, Spectral Parameters