Research On Prosodic Structure Prediction Method In Myanmar Speech Synthesis

Posted on: 2023-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: W L Tan
GTID: 2545306617476614
Subject: Communication and Information System
Abstract/Summary:
Burmese (the Myanmar language) belongs to the Sino-Tibetan language family and is written in a phonetic script. As in Chinese, Myanmar text has no explicit word boundaries and carries no marked prosodic boundary information. Yet speech that conveys information with a natural rhythm depends on prosodic information, so enabling a speech synthesis model to use prosodic information and produce rhythmically natural speech is of real value. The main work of this thesis is as follows.

(1) Based on the phonetic characteristics of Myanmar, initial consonants and vowels are selected as the synthesis units. After building the dataset required for the experiments, monophone forced alignment based on HTK and on MFA (Montreal Forced Aligner) is compared and analyzed. MFA is chosen to perform phone-level forced alignment for Myanmar, which effectively improves the training of the acoustic model.

(2) Based on the characteristics of Myanmar, a multi-task prosodic structure prediction method using a neural-network adaptive gating mechanism is proposed and implemented. The gating mechanism dynamically filters the outputs of the two networks in the parallel layer. Experimental results show F1 scores of 82.47% for prosodic words and 83.65% for prosodic phrases.

(3) To further improve prosodic structure prediction for Myanmar, this thesis uses a pre-trained language model to obtain dynamic (context-dependent) word representations, and proposes and implements a Myanmar prosodic structure prediction model based on machine reading comprehension (MRC). Two classifiers predict the head pointer sequence and the tail pointer sequence respectively, and external knowledge is added to assist learning and improve prediction. Experimental results show F1 scores of 87.45% for prosodic words and 85.44% for prosodic phrases.

(4) To verify the contribution of prosodic structure prediction to the naturalness of Myanmar speech synthesis, the two prediction methods proposed in (2) and (3) are applied to a Myanmar speech synthesis system. With the gating-based method, the synthetic speech reaches a MOS (mean opinion score) of 3.75 and an MCD (mel-cepstral distortion) of 15.76. With the MRC-based method, the MOS and MCD reach 3.86 and 15.36 respectively (a higher MOS and a lower MCD are better). These results show that multi-task learning with a gating mechanism effectively improves the accuracy of prosodic structure prediction. They also show that, once a pre-trained model is introduced, predicting head and tail pointer sequences further improves both the prediction and the quality of the synthetic speech.
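The adaptive gating in (2) can be pictured with a minimal sketch. It assumes one common gate form, g = sigmoid(W[h1; h2] + b), applied element-wise so that the fused output is g * h1 + (1 - g) * h2. The function name `gated_fusion`, the weight shapes, and the use of plain Python lists are illustrative assumptions, not the thesis's actual network.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h1: Vector, h2: Vector, w: Matrix, b: Vector) -> Vector:
    """Fuse two same-sized outputs with an element-wise sigmoid gate.

    Hypothetical parameters: w has d rows of length 2d and b has length d,
    where d = len(h1).
    """
    if len(h1) != len(h2):
        raise ValueError("h1 and h2 must have the same length")
    concat = h1 + h2  # [h1; h2], length 2d
    fused = []
    for i in range(len(h1)):
        # Gate for dimension i: sigmoid(w[i] . [h1; h2] + b[i])
        g = sigmoid(sum(wij * xj for wij, xj in zip(w[i], concat)) + b[i])
        # g near 1 keeps network 1's output, g near 0 keeps network 2's
        fused.append(g * h1[i] + (1.0 - g) * h2[i])
    return fused
```

With all-zero weights the gate is 0.5 everywhere and the result is the average of the two outputs. In a trained model W and b are learned, so the mix can differ per dimension and per input.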
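In the MRC-based model of (3), the two classifiers output one 0/1 label per token for head pointers and for tail pointers. The sketch below shows one plausible way to decode these labels into prosodic units. It assumes a nearest-tail matching rule; both the rule and the name `decode_spans` are hypothetical, and the thesis may decode differently.

```python
from typing import List, Tuple


def decode_spans(heads: List[int], tails: List[int]) -> List[Tuple[int, int]]:
    """Pair head/tail pointer labels (0/1 per token) into inclusive spans.

    Assumed heuristic: each head is matched with the nearest tail at or after
    it; a head that meets another head before any tail is dropped.
    """
    if len(heads) != len(tails):
        raise ValueError("head and tail sequences must have equal length")
    spans = []
    n = len(heads)
    for start in range(n):
        if heads[start] != 1:
            continue
        for end in range(start, n):
            if end > start and heads[end] == 1:
                break  # another unit starts before this one is closed
            if tails[end] == 1:
                spans.append((start, end))
                break
    return spans
```

For example, heads [1, 0, 0, 1, 0] and tails [0, 0, 1, 0, 1] decode to [(0, 2), (3, 4)]. These are two units, so prosodic boundaries fall after tokens 2 and 4.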
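The F1 scores reported in (2) and (3) are the harmonic mean of precision and recall. The minimal sketch below assumes each prosodic level is scored on boundary positions; the abstract does not say whether boundaries, spans, or tokens are scored. The helper `boundary_f1` is hypothetical.

```python
from typing import Iterable


def boundary_f1(predicted: Iterable[int], reference: Iterable[int]) -> float:
    """F1 between predicted and reference boundary positions (token indices)."""
    pred, ref = set(predicted), set(reference)
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0  # also covers empty inputs, avoiding division by zero
    precision = true_pos / len(pred)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For predicted boundaries {2, 5, 7} against reference {2, 5, 9, 11}, precision is 2/3, recall is 1/2, and F1 is 4/7 (about 0.571).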
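The MCD values in (4) compare the mel-cepstra of synthetic and reference speech. The abstract does not state the exact variant used in the thesis. This sketch assumes a widely used definition: per-frame MCD = (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2) in dB, with the energy coefficient c_0 excluded, averaged over frames.

It also assumes the frames have already been time-aligned, for example by dynamic time warping. The thesis may use a different variant or scaling. The function name is hypothetical.

```python
import math
from typing import Sequence


def mel_cepstral_distortion(ref: Sequence[Sequence[float]],
                            syn: Sequence[Sequence[float]],
                            skip_c0: bool = True) -> float:
    """Mean per-frame MCD (dB) between two time-aligned mel-cepstrum sequences."""
    if len(ref) != len(syn) or not ref:
        raise ValueError("sequences must be non-empty and aligned frame by frame")
    start = 1 if skip_c0 else 0  # optionally ignore the energy term c_0
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for r, s in zip(ref, syn):
        sq = sum((a - b) ** 2 for a, b in zip(r[start:], s[start:]))
        total += scale * math.sqrt(2.0 * sq)
    return total / len(ref)
```

Lower values mean the synthetic spectra are closer to the reference. This matches the reading in (4) that 15.36 is an improvement over 15.76.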
Keywords/Search Tags: Myanmar, speech synthesis, prosodic structure