Research On Bilingual Entity Extraction Method Based On Chinese-Burmese Bilingual Corpus

Posted on: 2019-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: AUNG HLA MOEZJF
GTID: 2438330563457694
Subject: Computer technology
Abstract/Summary:
Bilingual named entities play an important role in cross-language information retrieval, machine translation, and other fields. Because Myanmar is a low-resource language, constructing Chinese-Myanmar bilingual corpora faces many difficulties. This thesis studies the linguistic features of the Myanmar language, the construction of a Chinese-Myanmar bilingual comparable corpus, and a Chinese-Myanmar bilingual entity extraction method. The results have important application value for Chinese-Myanmar cross-language retrieval and machine translation. The thesis achieved the following main results:
(1) Analysis of Myanmar language features. The characteristics of Myanmar characters, syllables, words, phrases, syntax, and other linguistic knowledge were analyzed, and a standard part-of-speech tagging scheme for Myanmar, part-of-speech tagging for Myanmar sentences, and an inventory of affixes in Myanmar text were constructed. This analysis of linguistic rules provides the foundation for the fourth chapter.
(2) Chinese-Myanmar bilingual corpus construction. Web crawler technology was used to automatically collect Chinese-Myanmar bilingual documents from the Internet. After manual proofreading, 689 Chinese-Myanmar document pairs containing 10,118 Chinese-Myanmar sentence pairs were constructed.
(3) Extraction of Chinese-Myanmar bilingual named entities. A bilingual named entity extraction method based on Chinese-Myanmar bilingual corpora was proposed. First, the named entities in the Chinese sentences are extracted, together with features such as entity category, position, and length; these features constrain the position and length of the corresponding named entity in the Myanmar sentence. Next, the Myanmar sentences are marked up based on Myanmar particles and segmented into candidate entity segments. Finally, the similarity between the Chinese named entity and each candidate Myanmar segment is calculated, and the candidate segment with the highest similarity is selected as the corresponding Myanmar entity. Experiments show that the proposed method performs significantly better than the dictionary-based method.
(4) A prototype Chinese-Myanmar bilingual entity extraction system was developed. It automatically extracts bilingual named entities from Chinese-Myanmar bilingual comparable documents and saves the extracted entities to a bilingual named entity library.
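The extraction steps in result (3) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not specify the similarity measure, the particle list, or how the length constraint is derived, so everything named here is an assumption. The tokens and the particles `ka` and `ko` are romanized placeholders, `max_len` is a hypothetical stand-in for the length constraint taken from the Chinese entity, and `toy_similarity` is a stand-in Jaccard score over a tiny hypothetical lexicon, used only so the example runs.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Segment = Tuple[str, ...]


def candidate_segments(tokens: Sequence[str], particles: Set[str], max_len: int) -> List[Segment]:
    """Split a tokenized sentence at particles, then list every contiguous
    span of at most max_len tokens inside each resulting chunk."""
    chunks: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        if tok in particles:
            # A particle closes the current chunk and is not part of any candidate.
            if current:
                chunks.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        chunks.append(current)

    candidates: List[Segment] = []
    for chunk in chunks:
        for start in range(len(chunk)):
            stop = min(len(chunk), start + max_len)
            for end in range(start + 1, stop + 1):
                candidates.append(tuple(chunk[start:end]))
    return candidates


def best_match(zh_entity: str, candidates: Sequence[Segment],
               similarity: Callable[[str, Segment], float]) -> Optional[Segment]:
    """Return the candidate with the highest similarity, or None if there are none."""
    if not candidates:
        return None
    return max(candidates, key=lambda cand: similarity(zh_entity, cand))


# Hypothetical lexicon, for illustration only (not data from the thesis).
TOY_LEXICON: Dict[str, Set[str]] = {"仰光": {"YANGON"}}


def toy_similarity(zh_entity: str, candidate: Segment) -> float:
    """Stand-in score: Jaccard overlap between the candidate's tokens and
    the lexicon tokens for the Chinese entity."""
    expected = TOY_LEXICON.get(zh_entity, set())
    cand = set(candidate)
    return len(expected & cand) / len(expected | cand)


# Placeholder romanized tokens; "ka" and "ko" play the role of particles.
tokens = ["PRES", "XI", "ka", "YANGON", "CITY", "ko", "VISIT"]
cands = candidate_segments(tokens, particles={"ka", "ko"}, max_len=2)
match = best_match("仰光", cands, toy_similarity)
print(match)
```

Passing the similarity function as a parameter keeps candidate generation and selection separate from scoring, so a different measure can be swapped in without touching the rest of the sketch.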
Keywords/Search Tags: Myanmar, Chinese, Named Entity, Comparable Corpus, Bilingual Named Entities