Research On Sentence Alignment Based On Word Pair And Word Dictionary

Posted on:2020-11-01

Degree:Master

Type:Thesis

Country:China

Candidate:Y Ding

Full Text:PDF

GTID:2428330578479406

Subject:Software engineering

Abstract/Summary:

Sentence alignment is the process of mapping sentences in a source text to their corresponding translations in a target text. As a core component in constructing parallel corpora, its performance is closely tied to the quality of the resulting corpus. Within a neural network framework, this thesis focuses on modeling word-pair relevance, word-pair importance, and lexical knowledge to improve the performance of sentence alignment. The main contents include:

(1) Modeling word pairs for sentence alignment. Because an aligned sentence pair usually contains a large number of aligned word pairs, this thesis explores the word-pair relevance between the source sentence and the target sentence and proposes an approach that models word pairs for sentence alignment. After a sentence pair is encoded by bidirectional recurrent neural networks, a gated relevance network models the semantic interaction between word pairs. This interaction is then passed to a multi-layer perceptron, which decides whether the sentence pair is aligned.

(2) Word-pair relevance modeling with a multi-view neural attention mechanism for sentence alignment. Aligned sentence pairs contain multiple aligned word pairs, and these word pairs play different roles during sentence alignment, so this thesis further models both word-pair importance and word-pair relevance. First, it employs different similarity measures to capture word-pair relevance from three perspectives. Then it models word-pair importance using a multi-view attention network. Finally, it integrates word-pair relevance and word-pair importance to determine whether the sentence pair is aligned.

(3) Explicitly modeling a word dictionary for sentence alignment. Inspired by traditional sentence alignment approaches based on lexical knowledge, this thesis explores how to explicitly integrate lexical knowledge into neural sentence alignment. Specifically, it proposes three cross-lingual encoders that incorporate a word dictionary: the Mixed Encoder, which alternately encodes words and their translations; the Factored Encoder, which treats the word translation as a feature and concatenates word and feature embeddings; and the Gated Encoder, which uses a gate mechanism to selectively control how much translation information is passed forward.

Evaluated on the public NIST MT dataset and the OpenSubtitles2018 movie subtitle dataset, experiments show that the proposed approaches can significantly improve the performance of sentence alignment.
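The gate mechanism mentioned for the Gated Encoder can be illustrated with a minimal sketch. The abstract does not give the thesis's equations, so everything below is an assumption made for illustration: the function name `gated_fusion`, the parameters `weights` and `bias`, and the additive form "word embedding plus gate times translation embedding" are hypothetical and may differ from the actual model.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(word_vec, trans_vec, weights, bias):
    """Fuse a word embedding with the embedding of its dictionary translation.

    Assumed (hypothetical) form, not taken from the thesis:
        gate  = sigmoid(weights . [word_vec ; trans_vec] + bias)
        fused = word_vec + gate * trans_vec   (element-wise)

    word_vec, trans_vec: lists of floats with the same length d.
    weights: d rows, each with 2 * d floats.
    bias: list of d floats.
    Returns (fused, gates), both lists of length d.
    """
    joint = word_vec + trans_vec  # list concatenation, length 2 * d
    fused, gates = [], []
    for i, (w_i, t_i) in enumerate(zip(word_vec, trans_vec)):
        score = sum(w * x for w, x in zip(weights[i], joint)) + bias[i]
        g = sigmoid(score)
        gates.append(g)
        # A gate near 0 keeps only the word; near 1 adds the full translation.
        fused.append(w_i + g * t_i)
    return fused, gates


# Toy usage with made-up 2-dimensional embeddings and untrained parameters.
word = [0.2, -0.1]
translation = [0.4, 0.6]
zero_weights = [[0.0] * 4, [0.0] * 4]
fused, gates = gated_fusion(word, translation, zero_weights, [0.0, 0.0])
```

With all-zero parameters every gate equals 0.5, so half of each translation component is added to the word embedding. In a trained model the gate values would be learned, letting the encoder suppress unhelpful dictionary entries.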

Keywords/Search Tags:

sentence alignment, word-pair relevance, word-pair importance, word dictionary, neural network

Related items

1	Research On Sentence Alignment Method Based On Cross-lingual Word Embeddings
2	Study On Emotion Cause Pair Extraction Based On Fusion Word Vectors
3	Word Pair Extraction And Web-based Mining Of OOV Translations
4	Comparable Corpus Acquisition Of Cambodian-Chinese Parallel Sentence Pairs Based On Bidirectional Recurrent Neural Network
5	Research On Word Alignment Technology Based On Deep Learning
6	Based On Dictionary And Word Frequency Analysis Of The Unknown Words From The Bbs Of Corpus Recognition Research
7	Research On Chinese-English Word Alignment
8	Low-Resource Machine Translation Techniques For Distant Language Pair
9	The Research Of The Uyghur Sentence Word Clustering And Chinese-Uyghur Word Alignment
10	Research And Application On Dynamic Word Alignment For Interactive Translation