Research And Implementation Of Maximum Entropy Based Machine Translation

Posted on: 2017-11-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Ying
Full Text: PDF
GTID: 2348330485487979
Subject: Computer technology
Abstract/Summary:
In China, cooperation and exchange among ethnic groups are regarded as important for social stability and cultural integration. A large body of valuable literature written in China's minority languages needs to be translated into Simplified Chinese so that it can be preserved and passed on. However, human translation is very inefficient, and existing machine translation between Chinese minority languages and Simplified Chinese is of poor quality, so new machine translation techniques tailored to these language pairs are needed.
Machine translation is the computer-based automatic conversion of text from a source language into a target language. Owing to improvements in computer performance and the rapid growth of available corpus data, the advantages of statistical machine translation (SMT) have become increasingly evident, and it has become the mainstream approach to machine translation. Maximum entropy based SMT applies the maximum entropy principle to obtain a direct translation model that fuses multiple features, so the model can be adapted to a given domain by designing features for that domain, which can improve translation quality.
This thesis first examines the technical details of the maximum entropy based statistical translation model, then makes targeted improvements to the model's features for Uyghur-Chinese translation, and finally combines those features to build a maximum entropy based Uyghur-Chinese SMT system. Specifically, the thesis covers three aspects:
(1) Research on the Skip language model and smoothing techniques. A word frequency based Skip language model with smoothing is proposed to address the problem of insufficient language model training data.
In this part, the training method for the Skip language model combined with several smoothing techniques is described in detail, and its performance is compared with that of the n-gram language model. The results indicate that the smoothed Skip language model reduces language model perplexity and effectively improves language model quality.
(2) Study of the maximum entropy model for Uyghur-Chinese statistical machine translation. The features of the maximum entropy model used in general statistical translation are set out in detail, and a morpheme processing approach for the Uyghur corpus and an improved affix cropping scheme are proposed. Taking into account the differences in sentence structure between Uyghur and Chinese, and reducing the weight of the reordering model, a maximum entropy model better suited to Uyghur-Chinese SMT is constructed. Experiments show that the improved maximum entropy model yields translations with higher BLEU scores.
(3) Design and implementation of the maximum entropy based Uyghur-Chinese SMT system. This part describes the overall framework and workflow of the translation system, introduces the principle, functions, and core model files produced by training for each module, and demonstrates the actual translation and model training results through comparison.
The thesis proposes a framework for the maximum entropy based Uyghur-Chinese SMT system that covers bilingual corpus processing, model training, and performance tuning. The system has practical value because it can provide relatively accurate reference translations for Uyghur-Chinese translators.
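To make aspect (1) concrete, the following is a minimal sketch of one way a skip language model can be smoothed and compared with an n-gram baseline by perplexity. It is not the thesis's model: the class name `SkipBigramLM`, the choice of add-k smoothing, the linear interpolation weight `lam`, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


class SkipBigramLM:
    """Illustrative model: interpolates an adjacent bigram with a skip-one bigram.

    Assumptions (not from the thesis): add-k smoothing with constant k, and
    a fixed interpolation weight lam. Setting lam=1.0 gives a plain bigram model.
    """

    def __init__(self, sentences, k=0.1, lam=0.7):
        self.k, self.lam = k, lam
        self.bi, self.bi_ctx = Counter(), Counter()
        self.skip, self.skip_ctx = Counter(), Counter()
        vocab = set()
        for sent in sentences:
            toks = ["<s>", "<s>"] + list(sent) + ["</s>"]
            for i in range(2, len(toks)):
                w = toks[i]
                vocab.add(w)
                self.bi[(toks[i - 1], w)] += 1    # context: previous word
                self.bi_ctx[toks[i - 1]] += 1
                self.skip[(toks[i - 2], w)] += 1  # context: word before that (one skipped)
                self.skip_ctx[toks[i - 2]] += 1
        self.v = len(vocab) + 1  # one extra slot shared by all unseen words

    def _addk(self, pairs, ctxs, ctx, w):
        # Add-k estimate of P(w | ctx); sums to 1 over the vocabulary plus the unseen slot.
        return (pairs[(ctx, w)] + self.k) / (ctxs[ctx] + self.k * self.v)

    def prob(self, prev2, prev1, w):
        p_bi = self._addk(self.bi, self.bi_ctx, prev1, w)
        p_skip = self._addk(self.skip, self.skip_ctx, prev2, w)
        return self.lam * p_bi + (1.0 - self.lam) * p_skip

    def perplexity(self, sentences):
        log_sum, n = 0.0, 0
        for sent in sentences:
            toks = ["<s>", "<s>"] + list(sent) + ["</s>"]
            for i in range(2, len(toks)):
                log_sum += math.log(self.prob(toks[i - 2], toks[i - 1], toks[i]))
                n += 1
        return math.exp(-log_sum / n)


if __name__ == "__main__":
    # Made-up toy data, only to show the calls.
    train = [["we", "translate", "old", "texts"], ["we", "read", "old", "texts"]]
    held_out = [["we", "translate", "texts"]]
    print("skip model:", SkipBigramLM(train).perplexity(held_out))
    print("bigram baseline:", SkipBigramLM(train, lam=1.0).perplexity(held_out))
```

Lower perplexity on held-out text means the model assigns that text higher probability. On real data both `k` and `lam` would normally be tuned on a development set, and a toy corpus like this one says nothing about which model is better.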
Building on the research results in Uyghur-Chinese translation, we are now working on the rescue translation of many valuable documents in the Manchu language, in cooperation with the Xinjiang Ili Kazak Autonomous Prefecture Bureau of Cultural Heritage and Qapqal Xibe Autonomous County.
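The maximum entropy translation model that the abstract refers to throughout is commonly written as a log-linear model, P(e|f) = exp(sum_i lambda_i * h_i(e, f)) / Z(f), where each h_i is a feature function and each lambda_i is its weight. The sketch below scores a fixed list of candidate translations in that form. The feature names (`tm`, `lm`, `reorder`), their values, and the weights are hypothetical and are not taken from the thesis.

```python
import math


def score(features, weights):
    """Weighted feature sum: sum_i lambda_i * h_i(e, f)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def posterior(candidates, weights):
    """Normalize exponentiated scores into P(e | f) over the given candidates."""
    scores = {e: score(feats, weights) for e, feats in candidates.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {e: math.exp(s - top) / z for e, s in scores.items()}


def best_translation(candidates, weights):
    """Pick the candidate with the highest weighted feature sum."""
    return max(candidates, key=lambda e: score(candidates[e], weights))


# Hypothetical log-domain feature values for two candidate translations of one
# source sentence: translation model, language model, and reordering penalty.
candidates = {
    "monotone": {"tm": -2.0, "lm": -6.0, "reorder": 0.0},
    "reordered": {"tm": -2.0, "lm": -4.0, "reorder": -3.0},
}

equal_weights = {"tm": 1.0, "lm": 1.0, "reorder": 1.0}
low_reorder_weight = {"tm": 1.0, "lm": 1.0, "reorder": 0.2}

if __name__ == "__main__":
    for name, w in [("equal", equal_weights), ("low reorder", low_reorder_weight)]:
        print(name, best_translation(candidates, w), posterior(candidates, w))
```

In this made-up example, lowering the weight on the reordering penalty lets the language model feature dominate, so the candidate with reordered words is preferred. That is one way to picture what adjusting the reordering model's weight does, although the thesis's actual features and tuning procedure may differ.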
Keywords/Search Tags:statistical machine translation, maximum entropy, Skip language model, morpheme processing