
Several Key Researches On Example Based Uyghur-Chinese Machine Translation

Posted on: 2015-05-03; Degree: Doctor; Type: Dissertation
Country: China; Candidate: R H B A L Mai; Full Text: PDF
GTID: 1228330431992150; Subject: Computer application technology
Abstract/Summary:
Machine translation is one of the most active branches of artificial intelligence. In recent years, corpus-based methods have become a major focus of machine translation research. Example-based machine translation (EBMT) is one of the main branches of corpus-based machine translation, and an EBMT system can be built quickly and at low cost, which makes the approach worth studying.

Uyghur is a typical agglutinative language that differs from Chinese at every level, from morphology to sentence structure. Its rich morphology is one of these differences, and it presents both challenges and opportunities for Uyghur information processing. To support Uyghur-Chinese example-based machine translation, this thesis studies several aspects of the problem, including morphology and word alignment. The main work is as follows:

1. Automatic reduction of Uyghur inflectional phenomena. The thesis proposes an automatic reduction model based on word-internal alignment. Unlike earlier methods, which perform the reduction with rules, this method treats reduction as a POS-tagging-style problem and solves it with a statistical classifier. It is easy to implement and also works well on complex phenomena.

2. Uyghur morphological analysis. Exploiting the hierarchical structure of Uyghur words, the thesis proposes a directed graph model for morphological analysis, in which words and tags are described as a directed graph. Nodes represent stems, affixes, and their corresponding tags, while edges represent the transition, or general, probabilities between nodes. The graph model is consistent with the structure of Uyghur words and sentences and analyzes them well.

3. Uyghur-Chinese word alignment. The thesis proposes separating affixes from stems. A Uyghur affix often has several different forms that share the same meaning, so words are divided into a stem and affixes, and each affix is replaced by a symbolized form. Although Uyghur affixes carry semantic information, they are sometimes not translated into Chinese directly. To handle this, the thesis proposes a "Drop-Separate" scheme: affixes that have a Chinese meaning are separated from the stem, while affixes that usually have no direct Chinese meaning are separated from the stem and then dropped. Because it takes both sentence length and the semantic information of affixes into account, the scheme is more effective for Uyghur-Chinese word alignment.

4. Multiword expression extraction. To handle cases where several words in the Uyghur corpus express a single meaning, the thesis proposes an improved mutual information method that combines statistical and rule-based strategies to raise the precision and recall of multiword expression extraction. Several common measures (log-likelihood, mutual information, and chi-square) are compared for their effectiveness at extracting Uyghur multiword expressions. The thesis also proposes combining the two methods so that each offsets the other's deficiencies. In addition, because the multiword expressions obtained by mutual information are irregularly distributed, an improved mutual information method is proposed. Experimental results show that these methods substantially improve precision and recall.

5. Implementation of an example-based Uyghur-Chinese machine translation system. To speed up the matching of similar sentences and reduce the influence of morphology on retrieval, Uyghur words are stored in an inverted index. To obtain more high-quality similar sentences, sentence similarity is computed with several measures: similarity of speech, similarity of word form, similarity of sentence length, and the angle between similar segments. The target sentence is generated on the principle of estimating a segment's position in the target sentence from the position of the translated segment in the translated sentence. Applying the generalized affix forms, the "Drop-Separate" scheme, and the multiword extraction method together in the system improves the quality of the translated sentences.
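The mutual information measure mentioned for multiword expression extraction can be illustrated with a minimal sketch. This is plain pointwise mutual information over adjacent word pairs, not the improved method proposed in the thesis, whose details the abstract does not give. The function name `pmi_scores`, the `min_count` frequency cutoff, and the placeholder tokens are assumptions made for illustration only.

```python
import math
from collections import Counter

def pmi_scores(sentences, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    `min_count` is an illustrative cutoff (an assumption, not from the thesis)
    that skips rare pairs, since PMI tends to overrate low-frequency pairs.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Placeholder tokens stand in for Uyghur words; they are not real data.
corpus = [["a", "b", "c"], ["a", "b", "d"], ["c", "d", "a", "b"]]
scores = pmi_scores(corpus)
```

A higher score means the two words co-occur more often than their individual frequencies would predict, which is the signal a multiword expression extractor looks for.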
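The idea of storing Uyghur words in an inverted index so that morphology has less effect on retrieval can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system described in the thesis: it assumes input that is already segmented as `stem+affix`, indexes sentences by stem only, and ranks candidates by the number of shared stems. The helper names `stem_of`, `build_index`, and `candidates` and the placeholder tokens are hypothetical.

```python
from collections import defaultdict

def stem_of(token):
    """Return the stem of a pre-segmented token written as 'stem+affix+...'."""
    return token.split("+", 1)[0]

def build_index(sentences):
    """Map each stem to the set of ids of the sentences that contain it."""
    index = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        for token in sentence.split():
            index[stem_of(token)].add(sid)
    return index

def candidates(index, query):
    """Rank sentence ids by how many distinct query stems they share."""
    counts = defaultdict(int)
    for stem in {stem_of(t) for t in query.split()}:
        for sid in index.get(stem, ()):
            counts[sid] += 1
    # Most shared stems first; ties broken by sentence id.
    return sorted(counts, key=lambda sid: (-counts[sid], sid))

# Placeholder example base: s* are stems, A* are affixes (not real Uyghur data).
examples = [
    "s1+A1 s2 s3+A2",
    "s4 s2+A3 s5",
    "s1 s3+A1 s6",
]
index = build_index(examples)
ranked = candidates(index, "s1+A2 s3 s7")
```

Because lookup goes through stems, a query word matches an example sentence even when the two carry different affixes. The ranked candidates would then be rescored with finer sentence similarity measures such as those listed in the abstract.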
Keywords/Search Tags: Example-based machine translation, Uyghur, Morphological analyzer, Word alignment, Multiword expression