
A Study on Chunk-Based Chinese-Mongolian Automatic Translation of Organization Names

Posted on: 2018-06-18    Degree: Master    Type: Thesis
Country: China    Candidate: D Zang    Full Text: PDF
GTID: 2348330512496462    Subject: Computer application technology
Abstract/Summary:
Machine translation (MT) is the translation of text from one human language to another by a computer. Named entities (NEs) are expressions that refer to persons, organizations, locations, and other proper nouns, and they usually carry the main message of a text. Person and location names can be translated by dictionary lookup and transliteration. Organization names are relatively complex and have become the focus of named entity translation. At present, Chinese-Mongolian machine translation research concentrates on general-purpose translation, and little work targets named entity translation specifically, although Mongolian named entity recognition has made some progress in recent years. Many new organization names have appeared with the development of science and technology. In addition, Article 22 of the Mongolian language regulations of the Inner Mongolia Autonomous Region requires that public market text ("market words") within the autonomous region's administrative area be written in both Mongolian and Chinese, which increases the demand for Chinese-Mongolian organization name translation in Inner Mongolia. Drawing on many studies of translating organization names from Chinese into other languages, we propose a chunk-based method for Chinese-Mongolian organization name translation. In this work we completed the following.

First, we built the corpus resources required for Chinese-Mongolian organization name translation: a Chinese-Mongolian gazetteer (9273 pairs), a Chinese-Mongolian organization name dictionary (7096 pairs), a Chinese-Mongolian organization name parallel corpus (19110 pairs), and a correspondence table between Chinese Pinyin and Mongolian syllables (405 pairs).

Second, we divided each organization name into a location chunk, an identification chunk, a properties chunk, and a mechanism chunk using a Conditional Random Fields (CRFs) model. We trained the model with the open-source tool CRF++-0.58 on Chinese organization names manually labeled with BIO tags.

Third, we pre-translated the location chunk and the identification chunk according to the chunking result. Location chunks were translated by gazetteer lookup. If a location was not in the gazetteer, we translated it by combining transliteration with paraphrasing of the location's suffix word. For identification chunks, we converted the Chinese characters into Pinyin and then converted the Pinyin into Mongolian syllables.

Fourth, we used the Moses decoder throughout our experiments. The pre-translated location and identification chunks can be supplied to Moses fairly easily via XML markup of the input sentences, which completes the chunk-based Chinese-Mongolian organization name translation.

We compared our organization name translation system with a standard phrase-based translation system. The BLEU score of our system is 0.0364 higher than that of the phrase-based system.
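
The chunking, pre-translation, and XML markup steps described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the tag names (LOC, ID, PROP, MECH), the toy lookup tables, the example name, and the bracketed target strings are all hypothetical placeholders standing in for the real gazetteer and Pinyin-to-Mongolian syllable table. The out-of-gazetteer fallback (transliteration plus suffix-word paraphrase) is omitted. The sketch only assumes that Moses accepts `translation="..."` attributes on XML tags in its input when it is run with the `-xml-input` option.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical toy resources; the target strings are placeholders, not real Mongolian.
GAZETTEER = {"内蒙古": "[mn-loc]"}
CHAR_TO_PINYIN = {"蒙": "meng", "牛": "niu"}
PINYIN_TO_SYLLABLE = {"meng": "[meng]", "niu": "[niu]"}


def bio_to_chunks(tokens, tags):
    """Group BIO-tagged tokens into (label, [tokens]) chunks."""
    chunks = []
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and chunks and chunks[-1][0] == label:
            chunks[-1][1].append(token)  # continue the current chunk
        else:
            chunks.append((label if prefix in ("B", "I") else "O", [token]))
    return chunks


def pretranslate(label, text):
    """Return a pre-translation for location/identification chunks, else None."""
    if label == "LOC":
        # Gazetteer lookup only; the fallback for unknown locations is not sketched.
        return GAZETTEER.get(text)
    if label == "ID":
        try:
            # Chinese character -> Pinyin -> Mongolian syllable (placeholder).
            return "".join(PINYIN_TO_SYLLABLE[CHAR_TO_PINYIN[ch]] for ch in text)
        except KeyError:
            return None
    return None  # other chunks are left to the decoder


def to_moses_xml(chunks):
    """Wrap pre-translated chunks in XML markup; leave the rest as plain tokens."""
    parts = []
    for label, tokens in chunks:
        source = " ".join(tokens)
        target = pretranslate(label, "".join(tokens))
        if target is None:
            parts.append(escape(source))
        else:
            tag = label.lower()
            parts.append(
                f"<{tag} translation={quoteattr(target)}>{escape(source)}</{tag}>"
            )
    return " ".join(parts)


if __name__ == "__main__":
    tokens = ["内蒙古", "蒙牛", "乳业", "有限", "公司"]
    tags = ["B-LOC", "B-ID", "B-PROP", "B-MECH", "I-MECH"]
    print(to_moses_xml(bio_to_chunks(tokens, tags)))
```

Only the location and identification chunks carry a fixed translation here. The properties and mechanism chunks pass through unmarked, so the phrase-based decoder translates them as usual.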
Keywords/Search Tags: organization name, chunk, Chinese-Mongolian machine translation