
A Study of Burma's Lexical Analysis Methods

Posted on: 2017-04-19  Degree: Master  Type: Thesis
Country: China  Candidate: X D Han
GTID: 2278330488965663  Subject: Computer technology
Abstract/Summary:
Myanmar lexical analysis is a fundamental task in Myanmar (Burmese) information processing, and the quality of its results directly affects every application built on top of it. Word segmentation is likewise the basis of Chinese language processing and is widely applied in text classification, document indexing, intelligent retrieval, and natural language processing in general. Burmese is far less widely used than the major international languages, and research on Burmese natural language processing is relatively weak. Because the languages differ, the traditional lexical analysis techniques developed for Chinese and English cannot be applied directly to Burmese. To enrich the theory and application of Myanmar lexical analysis and to provide basic support for Myanmar information processing, this thesis studies the construction of Myanmar lexical analysis as follows:

(1) A rule-based Burmese syllable segmentation method is proposed and implemented. The characteristics of Burmese letters were identified by studying UTN114 and formulated as rules, and syllables are segmented by comparing the states of adjacent letters with a finite-state machine (FSM) that follows these rules. Experimental results show that the algorithm achieves high accuracy on Burmese syllable segmentation.

(2) A Burmese word segmentation method based on cascaded conditional random fields (CRFs) with fused syllable features is presented. The method has two CRF layers. The first layer is a word segmentation model whose basic unit is the syllable; its feature templates are defined from context information and Burmese word features, and it segments Burmese sentences. The second layer is a segmentation correction model whose basic unit is the word; it corrects erroneously segmented words, namely named entities. Experimental results show that the proposed two-layer model effectively improves word segmentation accuracy.

(3) The encodings of the current mainstream Burmese fonts are analyzed, and the font encodings are unified.
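The rule-based syllable segmentation in (1) can be illustrated with a minimal Python sketch. It assumes Unicode-encoded input (not Zawgyi) and a simplified break rule often used for Myanmar text: a consonant opens a new syllable unless it is stacked (preceded by the invisible virama U+1039) or acts as a final or kinzi (followed by the asat U+103A or by U+1039), while independent vowels, digits and punctuation always open a new unit. The character classes and the name `segment_syllables` are illustrative assumptions, not the thesis's actual FSM or its rules derived from UTN114.

```python
# Hypothetical, simplified rule-based Myanmar syllable segmenter (sketch only).
ASAT = "\u103A"     # visible virama (asat): marks a syllable-final consonant
STACKER = "\u1039"  # invisible virama: stacks the next consonant under this one


def is_consonant(ch: str) -> bool:
    return "\u1000" <= ch <= "\u1021"


def always_starts_unit(ch: str) -> bool:
    # Independent vowels, U+103F, digits, punctuation and other symbols.
    return ("\u1023" <= ch <= "\u102A") or ch == "\u103F" or ("\u1040" <= ch <= "\u104F")


def segment_syllables(text: str) -> list[str]:
    syllables: list[str] = []
    current = ""
    for i, ch in enumerate(text):
        if ch.isspace():
            # Whitespace closes the current syllable and is not emitted.
            if current:
                syllables.append(current)
                current = ""
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if is_consonant(ch):
            # New syllable unless stacked (prev is STACKER) or final/kinzi
            # (next is ASAT or STACKER).
            boundary = prev != STACKER and nxt not in (ASAT, STACKER)
        elif always_starts_unit(ch) or not ("\u1000" <= ch <= "\u109F"):
            # Non-Myanmar characters are treated as their own units.
            boundary = True
        else:
            # Medials, dependent vowels, tone marks and ASAT attach.
            boundary = False
        if boundary and current:
            syllables.append(current)
            current = ""
        current += ch
    if current:
        syllables.append(current)
    return syllables


# "Myanmar" splits into two syllables.
print(segment_syllables("\u1019\u103C\u1014\u103A\u1019\u102C"))
```

Each decision depends only on the class of the current character and its immediate neighbours, which is why such a rule set maps naturally onto a small finite-state machine.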
A prototype Myanmar lexical analysis system was also designed and implemented.
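The first CRF layer in (2) tags syllables rather than characters. A minimal sketch of how its training data might be prepared is shown below, assuming a B/M/E/S tag set and a window of two syllables on each side. The feature templates, the `type` feature and the function names are illustrative assumptions, not the thesis's templates, and the second (word-level correction) layer is not covered. Each sentence becomes a list of per-syllable feature dicts, which is the input shape accepted by CRF toolkits such as sklearn-crfsuite.

```python
from typing import Dict, List


def words_to_bmes(words: List[List[str]]) -> List[str]:
    """Label each syllable of a gold-segmented sentence with B/M/E/S."""
    labels: List[str] = []
    for syllables in words:
        if len(syllables) == 1:
            labels.append("S")  # single-syllable word
        else:
            labels.extend(["B"] + ["M"] * (len(syllables) - 2) + ["E"])
    return labels


def syllable_features(sylls: List[str], i: int) -> Dict[str, str]:
    """Context-window features for syllable i (hypothetical template)."""
    def at(k: int) -> str:
        j = i + k
        if j < 0:
            return "<BOS>"
        return sylls[j] if j < len(sylls) else "<EOS>"

    feats = {"bias": "1"}
    for k in (-2, -1, 0, 1, 2):  # unigram template S[k]
        feats[f"S[{k}]"] = at(k)
    for k in (-1, 0):            # bigram template S[k]S[k+1]
        feats[f"S[{k}]S[{k + 1}]"] = at(k) + "|" + at(k + 1)
    # Coarse syllable type from its first character: digit, punctuation, other.
    ch = sylls[i][0]
    if "\u1040" <= ch <= "\u1049":
        feats["type"] = "digit"
    elif ch in ("\u104A", "\u104B"):
        feats["type"] = "punct"
    else:
        feats["type"] = "syll"
    return feats


def bmes_to_words(sylls: List[str], labels: List[str]) -> List[str]:
    """Rebuild words from syllables and (predicted) B/M/E/S labels."""
    words: List[str] = []
    current = ""
    for syll, tag in zip(sylls, labels):
        current += syll
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate an unterminated B/M run
        words.append(current)
    return words
```

In practice `words_to_bmes` would produce the training labels, `syllable_features` would be applied to every position of every sentence, and `bmes_to_words` would turn the tagger's output back into words for the correction layer.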
Keywords/Search Tags: Myanmar, rules, finite-state machine, syllable segmentation, word segmentation, CRFs