Research On Neural Network-based Paraphrases Extraction And Reranking

Posted on: 2016-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: H P Sun
Full Text: PDF
GTID: 2308330479490072
Subject: Computer Science and Technology
Abstract/Summary:
Machine translation still falls short of the performance we expect, because language is diverse and a computer's ability to understand it is limited. In particular, low-resource languages lack large bilingual parallel corpora, so the training data is sparse and translation quality suffers. Paraphrasing is one way to address data sparsity, and we introduce paraphrases to improve the performance of machine translation. In addition, non-local features play an important role in improving translation performance. A nonlinear neural network model has stronger expressive power, and its hidden layer can further abstract and interpret the features of the input layer, which allows better use of non-local features to improve average translation quality. This thesis therefore uses a neural network model as the discriminant function for reranking, and combines non-local features, RNN language model features, and linear interpolation to improve translation performance. The content of this thesis is as follows:

(1) We propose a phrase partitioning criterion. We first annotate the bilingual corpus syntactically. After parsing, we store the data as trees and extract the subtrees that contain noun phrases and verb phrases as the phrase partition. We compare experimental results for noun phrases and verb phrases of different granularity. We then improve the algorithm to solve the nesting problem and obtain a more accurate phrase partition. The accuracy and recall of the result reach 80% or more.

(2) We build a phrase vector model on top of the word vector model. The phrases produced by phrase partitioning are represented as phrase vectors, and paraphrases are extracted by K-means clustering.
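The paraphrase extraction step in (2) can be illustrated with a minimal sketch. The abstract does not say how phrase vectors are composed from word vectors, so averaging is an assumption here, as are the toy word vectors, the function names, and the simplified K-means initialization (the first k points). It is not the thesis implementation.

```python
def phrase_vector(phrase, word_vectors):
    """Represent a phrase as the average of its known word vectors (assumed composition)."""
    vecs = [word_vectors[w] for w in phrase.split() if w in word_vectors]
    if not vecs:
        raise ValueError("no known words in phrase: " + phrase)
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain K-means; the first k points serve as initial centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def paraphrase_groups(phrases, word_vectors, k):
    """Cluster phrases; phrases sharing a cluster are paraphrase candidates."""
    vectors = [phrase_vector(p, word_vectors) for p in phrases]
    labels = kmeans(vectors, k)
    groups = {}
    for phrase, label in zip(phrases, labels):
        groups.setdefault(label, []).append(phrase)
    return list(groups.values())


# Hypothetical 2-dimensional word vectors, for illustration only.
WORD_VECTORS = {
    "buy": [1.0, 0.0], "purchase": [0.9, 0.1], "car": [1.0, 0.2],
    "eat": [0.0, 1.0], "consume": [0.1, 0.9], "apple": [0.2, 1.0],
}
PHRASES = ["buy a car", "purchase a car", "eat an apple", "consume an apple"]

if __name__ == "__main__":
    for group in paraphrase_groups(PHRASES, WORD_VECTORS, k=2):
        print(group)
```

In practice the clusters would be built from trained word vectors over the phrases produced by phrase partitioning, and the number of clusters k would need tuning.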
We apply the extracted paraphrases to a statistical machine translation system and address the sparsity of the training corpus by modifying the phrase translation probabilities. The experimental results show that the improved phrase table improves translation performance, by roughly 0.3 BLEU.

(3) We introduce a neural network-based reranking model. We build a one-layer neural network model and give its discriminant function and optimization function, together with a stochastic conjugate gradient algorithm. On this basis, we add the RNN language model as a feature of the reranking model and use linear interpolation to improve it. The experimental results show that the neural network-based reranking model improves the performance of machine translation.

The experiments show that extracting paraphrases through phrase vectors to improve the phrase table, together with a neural network-based reranking model, improves the performance of machine translation to some extent.
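The reranking idea in (3) can be sketched as follows. The abstract does not give the network architecture, the feature set, or the interpolation formula, so this example assumes one tanh hidden layer, a hypothetical two-entry feature vector per hypothesis (one entry could be an RNN language model score), and a simple weighted sum of the baseline decoder score and the network score. All names and numbers are illustrative.

```python
import math


def neural_score(features, hidden_weights, hidden_bias, output_weights):
    """Score a feature vector with one tanh hidden layer and a linear output."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(hidden_weights, hidden_bias)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden))


def rerank(nbest, score_fn, lam=0.5):
    """Sort an n-best list by lam * base_score + (1 - lam) * network score."""
    def combined(hypothesis):
        return lam * hypothesis["base_score"] + (1.0 - lam) * score_fn(hypothesis["features"])
    return sorted(nbest, key=combined, reverse=True)


# Hypothetical fixed weights; a real model would learn them from training data.
HIDDEN_WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]
HIDDEN_BIAS = [0.0, 0.0]
OUTPUT_WEIGHTS = [1.0, 1.0]


def toy_score(features):
    return neural_score(features, HIDDEN_WEIGHTS, HIDDEN_BIAS, OUTPUT_WEIGHTS)


# Hypothetical 2-best list: decoder log-score plus two extra feature values.
NBEST = [
    {"text": "hypothesis A", "base_score": -1.0, "features": [0.1, 0.1]},
    {"text": "hypothesis B", "base_score": -1.2, "features": [0.9, 0.8]},
]

if __name__ == "__main__":
    for hypothesis in rerank(NBEST, toy_score, lam=0.5):
        print(hypothesis["text"])
```

With `lam=1.0` the ranking reduces to the baseline decoder order, while smaller values give the network score more influence, so the interpolation weight would normally be tuned on held-out data.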
Keywords/Search Tags: phrase partition, phrase vector, paraphrase extraction, neural network, reranking