
Research On The Key Problems In Tourism Text Oriented Chinese-English SMT

Posted on: 2015-02-15  Degree: Master  Type: Thesis
Country: China  Candidate: L Luo
GTID: 2268330425495306  Subject: Artificial Intelligence
Abstract/Summary:
Machine translation has long been an active research topic in natural language processing, and the rise of statistical machine translation (SMT) has brought substantial progress in both theory and practice. However, SMT depends on its training corpus, and where parallel training data is scarce, translation quality in specific domains is very poor. As globalization deepens, cross-border tourism has become part of everyday recreation, so a machine translation system for the tourism domain has both strong market prospects and research value. This thesis therefore studies characteristics of tourism text in order to improve a statistical machine translation system. Our work includes the following.

(1) Discourse preprocessing. We propose a model for identifying non-information sentences in Chinese tourism text that combines rules with machine learning and uses ensemble learning and semi-supervised learning strategies. To construct the initial labeled seed set, we first formulate rule templates based on the characteristics of non-information sentences and label sentences with this rule-based method. We then treat non-information sentence identification as a binary classification problem, which allows machine learning methods to be applied. Because the rule-based method yields only a small training set and the data are unbalanced, we introduce semi-supervised learning based on a Self-Training strategy together with ensemble learning strategies. The experimental results show that the model identifies non-information sentences better than the other methods compared.

(2) Chinese idiom translation. Translating Chinese idioms is a persistent problem in current mainstream machine translation systems. We propose two methods that use paraphrases to improve the translation of Chinese idioms in Chinese-English SMT. First, we improve and implement three methods for obtaining idiom paraphrases and propose a method for choosing among paraphrases according to their category. Then we replace idioms with their paraphrases in the test set and in the training set, respectively, to translate the idioms. Our method can translate idioms that do not appear in the training set, and it reduces the phrase alignment and probability estimation errors caused by the sparsity of idioms in the training corpus, improving the translation of Chinese idioms.

Finally, we combine the two pieces of work above with the open-source toolkit Moses to construct a Chinese-English statistical machine translation system for the tourism domain.
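
As an illustration of the seed-then-self-train idea in part (1), the following Python sketch labels seed sentences with a rule template, trains a classifier on them, and repeatedly adds confidently predicted unlabeled sentences to the training set. It is a minimal sketch under stated assumptions, not the thesis's implementation: the cue-word lists, the Naive Bayes classifier, the confidence threshold, and the whitespace-tokenized English toy sentences are all hypothetical stand-ins (the thesis works on Chinese tourism text and the abstract does not give these details), and the ensemble learning step is omitted.

```python
import math
from collections import Counter

# Hypothetical rule template: cue words, not taken from the thesis.
NON_INFO_CUES = {"welcome", "thanks", "click", "copyright"}
INFO_CUES = {"located", "built", "km", "ticket"}


def rule_label(tokens):
    """Return 1 (non-information), 0 (information), or None if no rule fires."""
    has_non = any(t in NON_INFO_CUES for t in tokens)
    has_info = any(t in INFO_CUES for t in tokens)
    if has_non and not has_info:
        return 1
    if has_info and not has_non:
        return 0
    return None


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (assumed classifier)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for tokens, y in zip(docs, labels):
            self.counts[y].update(tokens)
        self.vocab = {t for c in self.classes for t in self.counts[c]}
        self.total = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, tokens):
        n = sum(self.prior.values())
        logp = {}
        for c in self.classes:
            lp = math.log(self.prior[c] / n)
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    lp += math.log((self.counts[c][t] + 1)
                                   / (self.total[c] + len(self.vocab)))
            logp[c] = lp
        # Normalize in a numerically stable way.
        m = max(logp.values())
        z = sum(math.exp(v - m) for v in logp.values())
        return {c: math.exp(v - m) / z for c, v in logp.items()}


def self_train(sentences, threshold=0.8, max_rounds=10):
    """Seed labels with rules, then grow the training set by Self-Training."""
    labeled, unlabeled = [], []
    for s in sentences:
        tokens = s.lower().split()
        y = rule_label(tokens)
        if y is None:
            unlabeled.append(tokens)
        else:
            labeled.append((tokens, y))
    if not labeled:
        raise ValueError("the rule template produced no seed labels")

    def train():
        return NaiveBayes().fit([t for t, _ in labeled], [y for _, y in labeled])

    model = train()
    for _ in range(max_rounds):
        confident, rest = [], []
        for tokens in unlabeled:
            proba = model.predict_proba(tokens)
            y, p = max(proba.items(), key=lambda kv: kv[1])
            if p >= threshold:
                confident.append((tokens, y))  # accept the pseudo-label
            else:
                rest.append(tokens)
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        unlabeled = rest
        model = train()
    return model, labeled, unlabeled


demo_sentences = [
    "Welcome to our website",
    "Thanks for your visit click here",
    "The temple is located in the old town",
    "The tower was built in 1420",
    "Please visit our website",   # no rule fires: left to Self-Training
    "The old temple tower",       # no rule fires: left to Self-Training
]
model, labeled, unlabeled = self_train(demo_sentences)
print(len(labeled), "labeled,", len(unlabeled), "still unlabeled")
```

The threshold controls how aggressively pseudo-labels are accepted: a higher value adds fewer but cleaner examples per round, which matters when the seed set is small and unbalanced, as the abstract notes.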
Keywords/Search Tags:Statistical Machine Translation, Tourism Text, Discourse Preprocessing, Non-information Sentence Identification, Chinese Idioms Translation