
Text Analysis Of Burmese Language For Speech Synthesis

Posted on: 2019-07-18
Degree: Master
Type: Thesis
Country: China
Candidate: X Yang
GTID: 2428330548473449
Subject: Communication and Information System
Abstract/Summary:
Burmese is the official language of Myanmar (Burma), whose population is about fifty-four million. Like Chinese, Burmese is an isolating language and also a tonal language; because of these characteristics, its words are either simple words or compound words. To support the development of a Burmese text-to-speech (TTS) system, this thesis studies the front-end text analysis methods of such a system. The main work is as follows:

1. We use web-crawler tools to download a large corpus from a variety of Burmese websites, then normalize and unify the text symbols. We filter the corpus further by removing sentences of unsuitable length or format, and the remaining sentences are used for text analysis.

2. We extract Burmese words from a professional Burmese dictionary program to build a rough Burmese dictionary, then clean it up and unify its format to obtain a dictionary that can be used for word segmentation. The character encoding of the dictionary and of the text is also unified.

3. Drawing on the Burmese literature and Burmese language-teaching materials, we study the characteristics and syllable structure of Burmese in detail and summarize the rules that mark Burmese syllable boundaries, which are then used for syllable segmentation. With this rule-based approach, we implement a program for Burmese syllable segmentation. The results show a syllable segmentation accuracy of 100%.

4. Based on the word-formation characteristics of Burmese, we choose dictionary-based forward maximum matching for word segmentation. The segmentation program takes the syllable as its unit and matches concatenated syllable sequences against the constructed dictionary. The results show a word segmentation accuracy of 80.6%.

5. Based on the Burmese phonetic system and the MLC (Myanmar Language Commission) transliteration system, we formulate a Burmese Romanization method. It keeps the advantages of the MLC system while improving the Romanization of tones. On the basis of this method, we design the Romanization process and implement it for Burmese text in Python. The results show a Romanization accuracy of 100%.

6. We study how Burmese numerals are pronounced and how numbers are written in Burmese text, and we construct two tables: a digit-to-pronunciation table and a pronunciation table for Burmese classifiers, covering the changes that Burmese numbers undergo in different contexts. Based on these two tables and the characteristics of Burmese, we implement a program that normalizes Burmese numbers. The results show a number normalization accuracy of 94.3%.

In summary, the syllable boundary rules, Romanization method, word segmentation method, and number normalization method proposed in this thesis basically meet the requirements of a Burmese speech synthesis system.
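Item 1 of the abstract does not state the exact filtering criteria. The following is a minimal sketch of such a filter, assuming that sentences end with the Burmese sentence-final mark ။ (U+104B), that "unsuitable length" means a character count outside a hypothetical range `[min_len, max_len]`, and that "unsuitable format" means a sentence containing anything other than spaces and characters from the Myanmar Unicode block (U+1000 to U+109F). The thresholds and the helper names are illustrative, not values or code from the thesis.

```python
import unicodedata

SENTENCE_END = "\u104b"  # ။ Burmese sentence-final mark


def is_burmese_text(sentence: str) -> bool:
    """Hypothetical format check: only Myanmar-block characters and spaces."""
    return all(ch == " " or "\u1000" <= ch <= "\u109f" for ch in sentence)


def clean_sentences(raw: str, min_len: int = 10, max_len: int = 200) -> list[str]:
    """Split crawled text into sentences and keep those that pass the filters.

    min_len and max_len are illustrative thresholds in characters,
    not values taken from the thesis.
    """
    raw = unicodedata.normalize("NFC", raw)  # unify canonically equivalent sequences
    kept = []
    for piece in raw.split(SENTENCE_END):
        sentence = " ".join(piece.split())  # collapse runs of whitespace
        if not sentence:
            continue
        if min_len <= len(sentence) <= max_len and is_burmese_text(sentence):
            kept.append(sentence + SENTENCE_END)
    return kept
```

Character counts are used here only for simplicity; counting syllables instead would tie the length filter more closely to speech, at the cost of running a syllable segmenter first.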
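Item 2 of the abstract does not describe the raw format exported from the dictionary software or the encoding scheme that was chosen. A minimal sketch of the clean-up step, assuming a hypothetical export with one entry per line in the form `headword<TAB>definition`, and assuming that "unifying the code" means converting every headword to Unicode NFC so that it compares equal to NFC-normalized corpus text. NFC only unifies canonically equivalent Unicode sequences; it does not convert text written in non-Unicode font encodings.

```python
import unicodedata
from typing import Iterable


def is_myanmar(word: str) -> bool:
    """True if every character lies in the Myanmar Unicode block."""
    return all("\u1000" <= ch <= "\u109f" for ch in word)


def build_lexicon(lines: Iterable[str]) -> set[str]:
    """Build a segmentation lexicon from raw dictionary lines.

    Assumes a hypothetical 'headword<TAB>definition' export format;
    only the headword is kept.
    """
    lexicon = set()
    for line in lines:
        headword = line.split("\t", 1)[0].strip()
        headword = unicodedata.normalize("NFC", headword)
        # Drop empty lines and entries that are not pure Burmese script
        if headword and is_myanmar(headword):
            lexicon.add(headword)
    return lexicon
```

Using a set removes duplicate headwords automatically and gives constant-time lookups during word segmentation.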
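The abstract does not list the thesis's own syllable boundary rules (item 3). As an illustration of the rule-based idea only, the sketch below uses a simplified rule set of the kind found in open-source Burmese syllable breakers: a new syllable starts at a consonant unless that consonant is stacked under a virama (U+1039) or is itself followed by an asat (U+103A) or a virama; independent vowels, Burmese digits, and punctuation also start a new unit, and a run of Latin letters or ASCII digits is kept together. It assumes Unicode-encoded input and makes no claim to match the accuracy reported in the thesis.

```python
import re

CONSONANTS = "\u1000-\u1021"  # က (ka) .. အ (a)
ASAT = "\u103a"               # ်  kills the inherent vowel (syllable-final consonant)
VIRAMA = "\u1039"             # ္  stacks the next consonant
# Independent vowels, great sa, Burmese digits, punctuation and symbols
OTHER_STARTERS = "\u1023-\u102a\u103f\u1040-\u104f"

# Each match marks the start of a new syllable.
BREAK = re.compile(
    rf"(?<!{VIRAMA})[{CONSONANTS}](?![{ASAT}{VIRAMA}])"
    rf"|[{OTHER_STARTERS}]"
    r"|[A-Za-z0-9]+"
)


def split_syllables(text: str) -> list[str]:
    """Split Burmese text into syllables with simple boundary rules."""
    syllables = []
    for chunk in text.split():  # whitespace is also a boundary
        starts = [m.start() for m in BREAK.finditer(chunk)]
        if not starts or starts[0] != 0:
            starts.insert(0, 0)  # text before the first break forms a unit
        ends = starts[1:] + [len(chunk)]
        syllables.extend(chunk[s:e] for s, e in zip(starts, ends))
    return syllables
```

For example, မြန်မာ splits into မြန် and မာ (the န is closed by an asat, so it does not open a syllable), and in မင်္ဂလာ the stacked ဂ stays attached to မင်္ဂ.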
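Item 4 combines dictionary-based forward maximum matching with the syllable as the matching unit. A minimal sketch of that technique, assuming the input is already a list of syllables and that `max_syllables` (a hypothetical parameter) bounds the longest dictionary entry measured in syllables. When no multi-syllable candidate is found in the lexicon, the single syllable is emitted as a word.

```python
def forward_max_match(syllables: list[str], lexicon: set[str], max_syllables: int) -> list[str]:
    """Dictionary-based forward maximum matching over syllables.

    At each position, try the longest syllable sequence first (up to
    max_syllables) and shrink it until the concatenation is in the lexicon.
    A single syllable is always accepted as a fallback word.
    """
    words = []
    i = 0
    while i < len(syllables):
        longest = max(1, min(max_syllables, len(syllables) - i))
        for n in range(longest, 0, -1):
            candidate = "".join(syllables[i:i + n])
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words
```

With the syllables မြန် / မာ / စာ and a lexicon containing မြန်မာစာ, the three syllables are returned as one word. Because the matching is greedy, it cannot recover from an early wrong choice, which is a known limitation of forward maximum matching.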
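Item 5 describes a table-driven Romanization derived from the MLC transliteration system. The sketch below only illustrates that table-driven design for open syllables of the form consonant + vowel sign + optional tone mark. Its few table entries follow the MLC convention as we understand it (a trailing `.` marks the creaky tone, `:` the high tone, and no mark the low tone) and should be checked against the official MLC tables. The thesis's modified tone notation, medials, final consonants, and stacked consonants are not reproduced.

```python
# A few illustrative MLC-style consonant initials (not a complete table).
INITIALS = {
    "\u1000": "k",   # က
    "\u1001": "hk",  # ခ
    "\u1002": "g",   # ဂ
    "\u1004": "ng",  # င
    "\u1019": "m",   # မ
    "\u101c": "l",   # လ
    "\u101e": "s",   # သ
    "\u1021": "",    # အ (vowel carrier, no initial sound)
}

# Vowel-sign sequence (tone marks removed) -> (rhyme, default tone mark).
RHYMES = {
    "": ("a", "."),        # inherent vowel: creaky tone
    "\u102c": ("a", ""),   # ာ
    "\u102b": ("a", ""),   # ါ (tall aa)
    "\u102d": ("i", "."),  # ိ
    "\u102e": ("i", ""),   # ီ
    "\u102f": ("u", "."),  # ု
    "\u1030": ("u", ""),   # ူ
    "\u1031": ("e", ""),   # ေ
}

DOT_BELOW = "\u1037"  # ့ marks the creaky tone
VISARGA = "\u1038"    # း marks the high tone


def romanize_syllable(syllable: str) -> str:
    """Romanize one non-empty open Burmese syllable via table lookup.

    Raises KeyError for characters or combinations outside the small tables.
    """
    initial = INITIALS[syllable[0]]
    signs = syllable[1:]
    rhyme, tone = RHYMES[signs.replace(DOT_BELOW, "").replace(VISARGA, "")]
    if VISARGA in signs:
        tone = ":"
    elif DOT_BELOW in signs:
        tone = "."
    return initial + rhyme + tone
```

Keeping the initial, rhyme, and tone tables separate means a change to the tone notation, such as the one the thesis proposes, only touches the tone step.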
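Item 6 relies on two tables (digit pronunciations and classifier pronunciations) that the abstract does not reproduce. A minimal sketch of the digit part only, assuming standard Burmese number words, place words up to သန်း (one million), and the commonly described sound change in which ဆယ် (ten), ရာ (hundred), and ထောင် (thousand) take the creaky tone (dot below, U+1037) when a lower nonzero digit follows them. Classifiers, the other contextual changes mentioned in the thesis, and colloquial variants such as dropping a leading တစ် are not handled; the function names are hypothetical.

```python
import re

# Standard Burmese number words for 0..9 (assumed spellings, Unicode order).
DIGIT_WORDS = ["သုည", "တစ်", "နှစ်", "သုံး", "လေး", "ငါး", "ခြောက်", "ခုနစ်", "ရှစ်", "ကိုး"]

# Place values from largest to smallest with their Burmese place words.
PLACES = [
    (1_000_000, "သန်း"),
    (100_000, "သိန်း"),
    (10_000, "သောင်း"),
    (1_000, "ထောင်"),
    (100, "ရာ"),
    (10, "ဆယ်"),
]
# Place words assumed to take the creaky tone when a nonzero digit follows.
CREAKY_WHEN_FOLLOWED = {1_000, 100, 10}

ASAT = "\u103a"
DOT_BELOW = "\u1037"
TO_ASCII = str.maketrans("".join(chr(0x1040 + d) for d in range(10)), "0123456789")


def make_creaky(word: str) -> str:
    """Add the dot below, placing it before a final asat (Unicode storage order)."""
    if word.endswith(ASAT):
        return word[:-1] + DOT_BELOW + ASAT
    return word + DOT_BELOW


def spell_number(n: int) -> str:
    """Spell 0 <= n < 10,000,000 as Burmese number words."""
    if not 0 <= n < 10_000_000:
        raise ValueError("this sketch only handles 0..9,999,999")
    if n == 0:
        return DIGIT_WORDS[0]
    parts = []
    rest = n
    for value, word in PLACES:
        digit, rest = divmod(rest, value)
        if digit:
            # A nonzero remainder means a lower digit is still to be read
            if value in CREAKY_WHEN_FOLLOWED and rest:
                word = make_creaky(word)
            parts.append(DIGIT_WORDS[digit] + word)
    if rest:
        parts.append(DIGIT_WORDS[rest])
    return "".join(parts)


def normalize_numbers(text: str) -> str:
    """Replace each run of Burmese digits (U+1040..U+1049) with its reading."""
    return re.sub(
        "[\u1040-\u1049]+",
        lambda m: spell_number(int(m.group().translate(TO_ASCII))),
        text,
    )
```

For instance, `normalize_numbers` reads ၁၅ as the words for "one", "ten" (with the creaky tone, ဆယ့်), and "five", while ၂၀ keeps the plain ဆယ် because nothing follows it.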
Keywords/Search Tags: speech synthesis, Burmese, syllable segmentation, Romanization, normalization