
Research On Paraphrase Based Machine Translation System Combination

Posted on: 2016-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Yao
Full Text: PDF
GTID: 2308330479491053
Subject: Computer Science and Technology
Abstract/Summary:
Machine translation (MT) is a natural language processing (NLP) technology that automatically translates one natural language into another. With the development and spread of the Internet, communication between countries and peoples has become more frequent, and machine translation has become increasingly important. In recent years, machine translation trained on large corpora has become the mainstream approach. Researchers have also applied system combination, a technique that has succeeded in many other fields, to machine translation and achieved promising results. However, for some minority-language translation tasks it is hard to obtain multiple machine translation systems trained on different large parallel corpora. To improve the performance of machine translation system combination for such languages, we propose paraphrase-based machine translation system combination, which introduces more useful information into the translation hypothesis set. By generating paraphrases of the hypotheses with the help of large monolingual corpora, this approach eases, to some degree, the shortage of parallel corpora for some minority languages.

To improve system combination performance by using paraphrasing to generate paraphrases of the translation hypothesis sentences, we conduct our research along the following three lines:

(1) Word-level paraphrase-based machine translation system combination. We treat the word as the basic unit. The procedure has three steps. First, we find the word-level paraphrase points in the machine translation hypothesis sentences, mainly by exploiting the alignment information between the target sentence and the other sentences in the hypothesis set. Then we use distributed word representations trained with word2vec to find words close to the target word, which gives us a synonym. Finally, the sentences generated by word-level paraphrasing are combined with the original translation hypotheses to produce the system combination result.

(2) Phrase-level paraphrase-based machine translation system combination. First, we use a large-scale parallel corpus to extract phrase-level paraphrases. Based on the resulting paraphrase table, we use a log-linear model to decode the source sentence. During decoding, we use the language model and the paraphrase probability to calculate the probability of each paraphrase sentence, and we use the beam search algorithm to find the output sentence with the maximum probability. Finally, we combine the paraphrase sentences with the translation hypotheses of the MT systems to get a better combination result.

(3) Sentence-level paraphrase-based machine translation system combination. Using the RNN Encoder-Decoder neural network architecture, we train a paraphrase model that takes the translation hypothesis as input and the reference as output. In this way, we obtain a paraphrased sentence whose meaning is consistent with the input sentence. Finally, we combine the paraphrase sentences with the translation hypotheses of the MT systems to get a better combination result.

Experimental results show that introducing paraphrasing into system combination enriches the translation hypothesis set with more useful information, which gives the combination system more choices. Paraphrasing mainly uses monolingual corpora to generate the paraphrase results. To a certain degree, paraphrase-based machine translation system combination relieves the problem of not having enough systems to combine.
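
The phrase-level decoding step in (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy paraphrase table, the bigram language model, the feature weights `w_pp` and `w_lm`, the monotone left-to-right decoding, and the option of copying a word unchanged with probability 1.0 are all assumptions made for illustration. The sketch scores each hypothesis with a log-linear sum of the paraphrase log-probability and the language model log-probability, and keeps only the best `beam_size` partial hypotheses at each position.

```python
import math
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]


def lm_logprob(prev: str, word: str,
               bigrams: Dict[Tuple[str, str], float],
               unseen: float = 1e-3) -> float:
    """Toy bigram language model; unseen bigrams get a small constant (assumption)."""
    return math.log(bigrams.get((prev, word), unseen))


def paraphrase_beam_search(source: List[str],
                           table: Dict[Phrase, List[Tuple[Phrase, float]]],
                           bigrams: Dict[Tuple[str, str], float],
                           w_pp: float = 1.0,
                           w_lm: float = 1.0,
                           beam_size: int = 5,
                           max_phrase_len: int = 3) -> Tuple[List[str], float]:
    """Monotone beam search over paraphrase options with a log-linear score."""
    n = len(source)
    # beams[i] holds (score, output tokens) for hypotheses covering source[:i].
    beams: List[List[Tuple[float, Phrase]]] = [[] for _ in range(n + 1)]
    beams[0] = [(0.0, ("<s>",))]
    for i in range(n):
        # Prune to the best beam_size hypotheses before expanding them.
        beams[i] = sorted(beams[i], reverse=True)[:beam_size]
        for score, output in beams[i]:
            for j in range(i + 1, min(n, i + max_phrase_len) + 1):
                src_phrase = tuple(source[i:j])
                options = list(table.get(src_phrase, []))
                if j == i + 1:
                    # Assumption: a single word may always be copied unchanged.
                    options.append((src_phrase, 1.0))
                for target, prob in options:
                    new_score = score + w_pp * math.log(prob)
                    prev = output[-1]
                    for word in target:
                        new_score += w_lm * lm_logprob(prev, word, bigrams)
                        prev = word
                    beams[j].append((new_score, output + target))
    best_score, best_output = max(beams[n])
    return list(best_output[1:]), best_score  # drop the <s> marker


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the thesis.
    table = {("a", "lot", "of"): [(("many",), 0.5)]}
    bigrams = {("<s>", "he"): 0.5, ("he", "has"): 0.5,
               ("has", "many"): 0.5, ("many", "books"): 0.5}
    tokens, _ = paraphrase_beam_search("he has a lot of books".split(),
                                       table, bigrams)
    print(tokens)  # ['he', 'has', 'many', 'books']
```

In this toy setting the paraphrase wins because the language model prefers "has many books" over the unseen bigrams of the copied phrase. A real system would tune the weights and would typically use more features than the two shown here.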
Keywords/Search Tags: machine translation, system combination, paraphrase, neural network