Research On The Construction Method Of Burmese Part-of-speech Tagging Corpus

Posted on: 2018-05-09 | Degree: Master | Type: Thesis
Country: China | Candidate: Z W Li
GTID: 2358330518960500 | Subject: Metallurgical Control Engineering
Abstract/Summary:
Part-of-speech (POS) tagging is a fundamental task in natural language processing (NLP), and its accuracy directly affects downstream NLP applications. Because the NLP foundation for Burmese is weak, there has been relatively little research on POS tagging methods for the language. Burmese is a low-resource language, and statistical POS tagging methods have not achieved strong results on it because large-scale manually annotated data is lacking. Constructing a POS-tagged corpus of reasonable scale is therefore of great value for the development of Burmese POS tagging. This thesis addresses two problems: the construction of a Burmese corpus and the construction of a Burmese POS-tagged corpus.
(1) Because no open Burmese corpus is available, the thesis first builds one. Burmese news websites are collected, their page structure is analyzed, and the news is crawled to obtain Burmese news text; English-Burmese and English-Burmese-Chinese dictionaries are collected to build a Burmese dictionary of a certain scale; and Chinese-Burmese bilingual news websites are crawled for comparable Chinese and Burmese documents, which together form the Burmese corpus.
(2) The thesis proposes a corpus-based method for constructing the POS-tagged corpus. Using a Chinese-Burmese bilingual dictionary and WordNet, it applies a similarity measure over bilingual word context vectors to extract Chinese translations of Burmese words, and then uses bilingual POS mapping to assign POS tags to the Burmese words, yielding a Burmese POS-tagged corpus.
(3) The thesis proposes a method for constructing the POS-tagged corpus based on dictionary knowledge. It first obtains a dictionary of POS-tagged Burmese words and expands it from the corpus. The Burmese monolingual news text is then coarsely tagged using the English-Burmese dictionary, supported by rules for out-of-vocabulary words and POS tagging rules, while a Bayesian model disambiguates the POS of ambiguous words. This completes the Burmese tagging work and produces a Burmese POS-tagged corpus.
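The dictionary-based procedure in (3) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not specify the features of the Bayesian model, the rule set, or the tag inventory, so everything below is an assumption made for illustration. The lexicon entries (`w1`, `w2`, `w3`) are placeholder tokens rather than real Burmese words, the tags are generic, the out-of-vocabulary rule is a toy, and the disambiguator uses the previous tag as its only feature.

```python
import math
from collections import Counter, defaultdict

# Hypothetical lexicon: word -> candidate POS tags (placeholder tokens, not real Burmese).
LEXICON = {"w1": {"NOUN"}, "w2": {"NOUN", "VERB"}, "w3": {"PART"}}

# Hypothetical seed of tagged sentences used to estimate the Bayesian model.
SEED = [
    [("w1", "NOUN"), ("w2", "VERB"), ("w3", "PART")],
    [("w1", "NOUN"), ("w3", "PART"), ("w2", "NOUN")],
    [("w1", "NOUN"), ("w2", "VERB")],
]


def unknown_word_tag(token):
    """Toy rule for out-of-vocabulary tokens (an assumption, not the thesis rules)."""
    return "NUM" if token.isdigit() else "NOUN"


class BayesDisambiguator:
    """Chooses argmax over tags of P(tag) * P(prev_tag | tag), add-one smoothed."""

    def __init__(self, tagset):
        self.tagset = sorted(tagset)
        self.tag_counts = Counter()
        self.ctx_counts = defaultdict(Counter)  # tag -> counts of the preceding tag

    def train(self, tagged_sentences):
        for sentence in tagged_sentences:
            prev = "<s>"  # sentence-start marker
            for _, tag in sentence:
                self.tag_counts[tag] += 1
                self.ctx_counts[tag][prev] += 1
                prev = tag

    def score(self, tag, prev_tag):
        total = sum(self.tag_counts.values())
        n_tags = len(self.tagset)
        prior = (self.tag_counts[tag] + 1) / (total + n_tags)
        n_contexts = n_tags + 1  # every tag plus the "<s>" marker
        likelihood = (self.ctx_counts[tag][prev_tag] + 1) / (
            self.tag_counts[tag] + n_contexts
        )
        return math.log(prior) + math.log(likelihood)

    def choose(self, candidates, prev_tag):
        # Sorting makes tie-breaking deterministic.
        return max(sorted(candidates), key=lambda t: self.score(t, prev_tag))


def tag_sentence(tokens, lexicon, model):
    """Dictionary lookup first, OOV rule second, Bayes only for ambiguous words."""
    tagged, prev = [], "<s>"
    for token in tokens:
        candidates = lexicon.get(token)
        if not candidates:
            tag = unknown_word_tag(token)
        elif len(candidates) == 1:
            tag = next(iter(candidates))
        else:
            tag = model.choose(candidates, prev)
        tagged.append((token, tag))
        prev = tag
    return tagged


model = BayesDisambiguator({"NOUN", "VERB", "PART"})
model.train(SEED)
print(tag_sentence(["w1", "w2", "99"], LEXICON, model))
```

In this toy setup the ambiguous token `w2` is resolved to `VERB` after a noun and to `NOUN` after a particle, which shows the role the disambiguation step plays once dictionary lookup has narrowed the candidate tags.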
Keywords/Search Tags: Part-of-speech (POS) tagging, POS tagging corpus, comparable corpus, word similarity, Bayes