Research On Sentence Alignment Based On Word Pair And Word Dictionary

Posted on: 2020-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Ding
Full Text: PDF
GTID: 2428330578479406
Subject: Software engineering
Abstract/Summary:
Sentence alignment is the process of mapping sentences in a source text to their corresponding translations in a target text. As a core component in constructing parallel corpora, its performance is closely tied to the quality of the resulting corpus. Working within a neural network framework, this thesis focuses on modeling word-pair relevance, word-pair importance, and lexical knowledge to improve the performance of sentence alignment. The main contents include:

(1) Modeling word pairs for sentence alignment. Because an aligned sentence pair usually contains a large number of aligned word pairs, this thesis explores the word-pair relevance between the source sentence and the target sentence and proposes an approach that models word pairs for sentence alignment. After a sentence pair is encoded by bidirectional recurrent neural networks, a gated relevance network models the semantic interaction between word pairs. The semantic interaction is then passed to a multi-layer perceptron, which decides whether the sentence pair is aligned.

(2) Word-pair relevance modeling with a multi-view neural attention mechanism for sentence alignment. Aligned sentence pairs contain multiple aligned word pairs, and these word pairs play different roles during sentence alignment, so this thesis further models both word-pair importance and word-pair relevance. First, it employs different similarity measures to capture word-pair relevance from three perspectives. Then it models word-pair importance using a multi-view attention network. Finally, it integrates word-pair relevance and word-pair importance to determine whether the sentence pair is aligned.

(3) Explicitly modeling a word dictionary for sentence alignment. Inspired by traditional sentence alignment approaches based on lexical knowledge, this thesis explores how to explicitly integrate lexical knowledge into neural sentence alignment. Specifically, it proposes three cross-lingual encoders that incorporate a word dictionary: a Mixed Encoder, which alternately encodes words and their translations; a Factored Encoder, which treats the word translation as a feature and concatenates the word and feature embeddings; and a Gated Encoder, which uses a gate mechanism to selectively control how much of the translation is passed forward.

Evaluated on the public NIST MT dataset and the OpenSubtitles2018 movie subtitle dataset, experiments show that the proposed approaches can significantly improve the performance of sentence alignment.
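
The abstract does not give the equations for the Gated Encoder, so the following is only a minimal sketch of one common way a gate can control how much of a dictionary translation is mixed into a word representation. The function name `gated_fuse`, the additive fusion form, the vector size, and the random parameters are all assumptions made for illustration and are not taken from the thesis.

```python
import math
import random


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(word_vec, trans_vec, weights, bias):
    """Hypothetical gated fusion of a word and its dictionary translation.

    gate  = sigmoid(W [word; trans] + b)   (one gate value per dimension)
    fused = word + gate * trans            (element-wise)

    This additive form is an assumption, not the thesis's formulation.
    """
    concat = word_vec + trans_vec  # list concatenation: [word; trans]
    gate = [
        sigmoid(sum(w * c for w, c in zip(row, concat)) + b)
        for row, b in zip(weights, bias)
    ]
    fused = [w + g * t for w, g, t in zip(word_vec, gate, trans_vec)]
    return fused, gate


# Toy inputs: random stand-ins for learned embeddings and gate parameters.
rng = random.Random(0)
dim = 4
word_vec = [rng.uniform(-1, 1) for _ in range(dim)]
trans_vec = [rng.uniform(-1, 1) for _ in range(dim)]
weights = [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(dim)]
bias = [0.0] * dim

fused, gate = gated_fuse(word_vec, trans_vec, weights, bias)
print("gate :", [round(g, 3) for g in gate])
print("fused:", [round(f, 3) for f in fused])
```

In a sketch like this, a gate value near 0 leaves the word embedding almost unchanged, while a value near 1 adds nearly the full translation embedding; in a trained model the gate parameters would be learned jointly with the rest of the encoder.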
Keywords/Search Tags: sentence alignment, word-pair relevance, word-pair importance, word dictionary, neural network