
An Approach Of Measuring Sentence Similarity Based On Word Vector And Its Application To Example-based Machine Translation

Posted on: 2016-12-30  Degree: Master  Type: Thesis
Country: China  Candidate: M Liu  Full Text: PDF
GTID: 2308330476954952  Subject: Computer technology
Abstract/Summary:
In recent years, the demand for translation has been increasing greatly. On the one hand, although traditional human translation is more accurate, it is inefficient. On the other hand, although traditional machine translation has improved considerably, its results are still not good enough for practical use. Against this background, EBMT (Example-Based Machine Translation) has emerged and gradually become one of the main technologies used in the translation industry.
Currently, the main problem of EBMT is that sentence similarity measures are limited in many ways and their accuracy is unsatisfactory. In particular, many of the sentence similarity methods applied in translation engineering projects are based on word composition, which is too inefficient for long and complex sentences. This paper focuses on sentence similarity measures and applies word vectors to them. I analyse the different features of English and Chinese sentences and propose a separate method for each language accordingly. The main work of this paper includes the following:
1. Collecting domain-specific corpora to train word vector models for English and Chinese sentences using Google's word2vec;
2. Proposing a new method for computing English sentence similarity that uses Jaccard similarity and edit distance based on word vectors. The experimental results show that this method improves on the traditional method;
3. Proposing a new method for computing Chinese sentence similarity that uses both Jaccard similarity and dependency grammar based on word vectors. The experimental results show an obvious improvement over the traditional word-based measures;
4. Implementing the above algorithms and encapsulating them as an interface to be used by the Hua Jian IAT translation platform.
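The abstract does not give the exact formulas, so the following is only a minimal sketch of how Jaccard similarity and edit distance might be softened with word vectors. Everything in it is an assumption made for illustration: the toy two-dimensional vectors, the names `word_sim`, `soft_jaccard`, and `soft_edit_similarity`, the greedy matching strategy, and the `threshold` value. A real system would load vectors trained with word2vec, and the thesis's actual method may differ.

```python
import math

# Toy 2-dimensional vectors, invented for illustration only.
# A real system would load vectors trained with word2vec.
VECTORS = {
    "i": (0.7, -0.7),
    "a": (0.5, 0.5),
    "buy": (1.0, 0.1),
    "purchase": (0.9, 0.2),
    "car": (0.1, 1.0),
    "automobile": (0.2, 0.9),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def word_sim(w1, w2, vectors):
    """Similarity in [0, 1]: 1 for identical words, 0 for unknown words."""
    if w1 == w2:
        return 1.0
    if w1 not in vectors or w2 not in vectors:
        return 0.0
    return max(0.0, cosine(vectors[w1], vectors[w2]))


def soft_jaccard(s1, s2, vectors, threshold=0.9):
    """Jaccard similarity where two words count as shared when their
    vector similarity reaches `threshold` (hypothetical value).
    Words are paired greedily, each word of s2 used at most once."""
    unmatched = list(s2)
    matches = 0
    for w in s1:
        best = max(unmatched, key=lambda c: word_sim(w, c, vectors), default=None)
        if best is not None and word_sim(w, best, vectors) >= threshold:
            unmatched.remove(best)
            matches += 1
    union = len(s1) + len(s2) - matches
    return matches / union if union else 1.0


def soft_edit_similarity(s1, s2, vectors):
    """Word-level edit distance whose substitution cost is
    1 - word_sim, normalised to a similarity in [0, 1]."""
    prev = [float(j) for j in range(len(s2) + 1)]
    for i, w1 in enumerate(s1, 1):
        cur = [float(i)]
        for j, w2 in enumerate(s2, 1):
            substitute = prev[j - 1] + (1.0 - word_sim(w1, w2, vectors))
            cur.append(min(prev[j] + 1.0, cur[j - 1] + 1.0, substitute))
        prev = cur
    longest = max(len(s1), len(s2))
    return 1.0 - prev[-1] / longest if longest else 1.0


if __name__ == "__main__":
    query = ["i", "buy", "a", "car"]
    example = ["i", "purchase", "a", "automobile"]
    print(soft_jaccard(query, example, VECTORS))
    print(soft_edit_similarity(query, example, VECTORS))
```

A purely word-based Jaccard score would treat "buy"/"purchase" and "car"/"automobile" as different words, whereas the vector-based variants treat them as near matches. In an EBMT setting, scores like these could be used to rank stored example sentences against an input sentence.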
Keywords/Search Tags: Computer Aided Translation, Sentence similarity computing, Word vector