
Research On Translation Rules And Translation Quality Evaluation Based On Deep Learning

Posted on: 2020-07-21
Degree: Master
Type: Thesis
Country: China
Candidate: J Yang
Full Text: PDF
GTID: 2428330575964619
Subject: Computer Science and Technology
Abstract/Summary:
Bilingual translation rules are the core of statistical machine translation models. They consist of a phrase table that maps source-language phrases to target-language phrases, together with the probability scores of these phrase pairs. In statistical machine translation, these rules convert a source-language phrase sequence into a target-language phrase sequence during translation generation, providing the input for translation reordering. They can also serve as an external guidance resource for neural machine translation models, supplying phrase-level information for translation selection. Given the rapid development and wide application of deep learning, further research on bilingual translation rules will therefore help advance machine translation. At the same time, both statistical and neural machine translation models have long suffered from over-translation and under-translation in their candidate translations. Both problems occur frequently and seriously degrade the quality of candidate translations. However, popular automatic evaluation metrics such as BLEU, which are commonly used in the machine translation field, cannot evaluate these two problems specifically and so give no clear guidance to researchers trying to solve them. In this thesis, we propose a bilingual phrase embedding model with semantic constraints, which improves bilingual translation rules using deep learning methods, and two automatic evaluation metrics, one for over-translation and one for under-translation. The main contributions are as follows:

1. We propose an improved bilingual phrase embedding model that introduces the translation probability distribution and the paraphrase probability distribution as constraints. In the traditional phrase-based machine translation model, the phrases in the phrase table are treated as distinct symbols, without considering the deeper linguistic relationships between them, and each phrase pair is treated as independent of the others, ignoring the constraints between semantically similar phrases. Building on the bilingually-constrained recursive autoencoder for bilingual phrase embedding learning, we therefore introduce the translation probability distribution and the paraphrase probability distribution as new constraints. These force the learned phrase embeddings to be semantically smooth and thus enrich the translation rules in machine translation models. We extract the semantic similarities between source-language and target-language phrase embeddings as features and integrate them into the phrase-based machine translation model. Experimental results on the NIST Chinese-English translation task demonstrate the effectiveness of our model.

2. We also propose two automatic evaluation metrics, one for the over-translation problem and one for the under-translation problem. Both metrics are based on the proportion of mismatched n-grams between the gold reference and the system translation. They compensate for a shortcoming of commonly used automatic evaluation metrics such as BLEU, which only evaluate overall translation quality in terms of adequacy and fluency and cannot accurately evaluate specific linguistic phenomena in the candidate translation. We evaluate both metrics by comparing their scores with human evaluations on NIST Chinese-English translation results. The Pearson correlation coefficients show a strong correlation with the human evaluations, which highlights the necessity and significance of our metrics.
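The abstract states only that both metrics rest on the proportion of mismatched n-grams between the gold reference and the system translation; it does not give the formulas. The sketch below is one plausible reading of that idea, not the thesis's actual definition. The function names `ngram_counts` and `mismatch_ratios`, the single-reference setup, and the choice of normalizers are all assumptions made for illustration.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def mismatch_ratios(reference, candidate, n=2):
    """Return (over, under) mismatch proportions for one sentence pair.

    Assumed definitions (hypothetical, not taken from the thesis):
      over  = candidate n-grams in excess of the reference / candidate n-grams
      under = reference n-grams missing from the candidate / reference n-grams
    """
    ref = ngram_counts(reference, n)
    cand = ngram_counts(candidate, n)

    # Counter subtraction keeps only positive counts, so `cand - ref`
    # holds the n-grams the candidate produced more often than the reference.
    extra = cand - ref
    missing = ref - cand

    # max(..., 1) avoids division by zero for sentences shorter than n.
    over = sum(extra.values()) / max(sum(cand.values()), 1)
    under = sum(missing.values()) / max(sum(ref.values()), 1)
    return over, under


if __name__ == "__main__":
    reference = "the cat sat on the mat".split()
    candidate = "the cat cat sat on".split()
    print(mismatch_ratios(reference, candidate, n=1))
```

In this toy pair the candidate repeats "cat" (an over-translation signal) and drops "the" and "mat" (an under-translation signal), so the two ratios move independently, which a single overall score such as BLEU would not show.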
Keywords/Search Tags:Deep Learning, Machine Translation, Phrase Embedding, Automatic Evaluation Metrics