Research On Translation Rules And Translation Quality Evaluation Based On Deep Learning

Posted on:2020-07-21

Degree:Master

Type:Thesis

Country:China

Candidate:J Yang

GTID:2428330575964619

Subject:Computer Science and Technology

Abstract/Summary:

As the core of statistical machine translation models, bilingual translation rules comprise a phrase table that maps source-language phrases to target-language phrases, together with the probability scores of these phrase pairs. In statistical machine translation, these rules convert the source-language phrase sequence into a target-language phrase sequence during translation generation, providing the input for translation reordering. Bilingual translation rules can also serve as an external guidance resource for neural machine translation models, providing phrase-level information for translation selection. Therefore, given the rapid development and wide application of deep learning, further research on bilingual translation rules will help advance machine translation. At the same time, both statistical and neural machine translation models have long suffered from over-translation and under-translation in their candidate translations. Both problems occur frequently and seriously degrade the quality of candidate translations. However, popular automatic evaluation metrics such as BLEU, which are commonly used in the machine translation field, can neither evaluate these two problems specifically nor give clear guidance to researchers trying to solve them. In summary, we propose a bilingual phrase embedding model with semantic constraints, which improves bilingual translation rules using deep learning methods, and two automatic evaluation metrics, one for over-translation and one for under-translation. The main contributions are as follows:

1. We propose an improved bilingual phrase embedding model that introduces the translation probability distribution and the paraphrase probability distribution as constraints. In the traditional phrase-based machine translation model, phrases in the phrase table are treated as distinct symbols, without considering the deeper linguistic relationships between them, and each phrase pair is treated as independent of the others, ignoring the constraints between phrases with similar semantics. Therefore, building on the use of a bilingual-constrained recursive autoencoder for bilingual phrase embedding learning, we further introduce the translation probability distribution and the paraphrase probability distribution as new constraints, forcing the learned phrase embeddings to be semantically smooth and thereby enriching the translation rules in machine translation models. In this work, we extract the semantic similarities between source-language and target-language phrase embeddings as features and integrate them into the phrase-based machine translation model. Experimental results on the NIST Chinese-English translation task demonstrate the effectiveness of our model.

2. We also propose two automatic evaluation metrics, for over-translation and under-translation respectively. Both metrics are based on the proportion of mismatched n-grams between the gold reference and the system translation. They make up for a shortcoming of commonly used automatic evaluation metrics such as BLEU, which evaluate only overall translation quality in terms of adequacy and fluency and cannot accurately evaluate specific linguistic phenomena in the candidate translation. We evaluate both metrics by comparing their scores with human evaluations on NIST Chinese-English translation results; the Pearson correlation coefficients show a strong correlation with human judgments and highlight the necessity and significance of our metrics.
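
The abstract does not give the exact formulas for the two metrics, so the following is only a minimal sketch of one way a score based on "the proportion of mismatched n-grams" could be computed. The function names, the clipped-count matching, and the choice of which side is counted for each problem are assumptions made for illustration, not the thesis's actual definitions.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def mismatch_ratio(counted, other, n):
    """Share of n-gram occurrences in `counted` that have no match in `other`.

    Matching is clipped by count: an n-gram seen 3 times in `counted`
    and once in `other` contributes 2 unmatched occurrences.
    """
    a = ngram_counts(counted, n)
    b = ngram_counts(other, n)
    total = sum(a.values())
    if total == 0:
        return 0.0
    unmatched = sum(max(count - b[gram], 0) for gram, count in a.items())
    return unmatched / total


def over_translation_score(system, reference, n=1):
    # Assumption: over-translation shows up as system n-grams
    # (e.g. repetitions) that the reference does not license.
    return mismatch_ratio(system, reference, n)


def under_translation_score(system, reference, n=1):
    # Assumption: under-translation shows up as reference n-grams
    # that the system output fails to cover.
    return mismatch_ratio(reference, system, n)


if __name__ == "__main__":
    reference = "the cat sat on the mat".split()
    system = "the cat cat sat on".split()
    print(over_translation_score(system, reference))
    print(under_translation_score(system, reference))
```

In this toy example the repeated "cat" counts toward the over-translation score, while the missing second "the" and the missing "mat" count toward the under-translation score. Scores of this kind could then be compared with human judgments using a Pearson correlation, as the abstract describes.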

Keywords/Search Tags:

Deep Learning, Machine Translation, Phrase Embedding, Automatic Evaluation Metrics

Related items

1	Research On Automatic Machine Translation Evaluation With Documental Information
2	Two Direction Machine Translation Based On Sentence Semantic Embedding And Its Evaluation
3	Research On Term Automatic Translation Technology Based On NP Tree For English Patent Documentation
4	Translation Knowledge Acquisition In Corpus-based Machine Translation
5	Research On Automatic Evaluation Of Machine Translation Based On Linguistic Knowledge
6	Research On Chinese Complex Noun Phrase Translation Extraction Based On Multi-strategy
7	Research On Phrase-based Statistical Machine Translation
8	Research On Chinese-English Neural Machine Translation Based On Joint Learning
9	On Key Technologies For Phrase-Based Statistical Machine Translation
10	The Study On Phrase-Based Statistical Machine Translation System