Bilingual Word Alignment System Based On English-Chinese Parallel Corpus

Posted on:2020-10-02

Degree:Master

Type:Thesis

Country:China

Candidate:J J Zhou

GTID:2428330590471684

Subject:Electronic and communication engineering

Abstract/Summary:

The core of statistical machine translation is the bilingual parallel corpus: a translation model is built from statistical analysis of a large amount of parallel text. Bilingual word alignment is a key step in a statistical machine translation system, and its accuracy directly affects translation performance. In addition, corpora annotated with word alignment information have considerable application value. They support natural language processing tasks such as dictionary compilation, cross-language information retrieval, and semantic disambiguation. Obtaining high-quality bilingual word alignments is therefore of great research value.

Existing word alignment methods usually rely on statistical information alone and do not fully consider the linguistic characteristics of the languages involved. Training usually requires a large amount of labeled word alignment data, but manually labeled alignment data is too scarce to meet that requirement. In traditional word alignment models the lexical features are also sparse, which leads to poor alignment of low-frequency words in the corpus. To address these problems, this thesis studies word alignment with deep learning methods. The work consists of the following three parts:

(1) A word alignment method based on Recurrent Neural Networks. The method incorporates the traditional Hidden Markov Model into a Recurrent Neural Network. Using the context of the sentence and the similarity between words, it replaces low-frequency words in a sentence with common words of similar meaning. The correspondence between the common word and the target-language words then gives the alignment of the low-frequency word. The model is trained without supervision, which saves the cost of manually annotating an alignment corpus. Experimental results show that the method improves the quality of word alignment.

(2) A word alignment method that incorporates dependency relations. The method requires dependency parsing of the input sentences. A Bi-directional Long Short-Term Memory network extracts contextual word embedding features, and an Attention Mechanism controls how the features are fused, which yields a dependency parser with better parsing results. The bilingual training corpus is then annotated with dependency relations by this parser. Dependency relation information and part-of-speech information are used as features and integrated into a log-linear model to obtain the word alignments.

(3) A phrase-based statistical machine translation system built on the word alignments. It mainly comprises translation model training, language model training, and a decoding module. The translation system in this thesis is compared with commonly used online translation platforms.
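
The low-frequency word replacement described in part (1) can be illustrated with a minimal sketch. This is not the thesis's model: it assumes static word vectors and plain cosine similarity, and it ignores sentence context and the HMM/RNN alignment model entirely. The function name `replace_rare_words`, the `min_count` threshold, and the toy counts and embeddings are all hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def replace_rare_words(sentence, counts, embeddings, min_count=5):
    """Replace each low-frequency word with its most similar frequent word.

    Assumption: a word is "low-frequency" if it occurs fewer than
    min_count times. Words without an embedding are left unchanged.
    """
    frequent = [w for w, c in counts.items()
                if c >= min_count and w in embeddings]
    out = []
    for word in sentence:
        is_frequent = counts.get(word, 0) >= min_count
        if is_frequent or word not in embeddings or not frequent:
            out.append(word)
            continue
        # Pick the frequent word whose vector is closest to the rare word.
        best = max(frequent,
                   key=lambda w: cosine(embeddings[word], embeddings[w]))
        out.append(best)
    return out


# Toy data (hypothetical): "purchase" is rare, "buy" is its frequent neighbor.
counts = Counter({"buy": 120, "the": 900, "house": 80, "purchase": 2})
embeddings = {
    "buy": [0.9, 0.1, 0.0],
    "purchase": [0.85, 0.15, 0.05],
    "house": [0.0, 0.2, 0.9],
    "the": [0.1, 0.9, 0.1],
}

result = replace_rare_words(["purchase", "the", "house"], counts, embeddings)
print(result)  # ['buy', 'the', 'house']
```

In this sketch the alignment learned for the frequent substitute ("buy") would then be carried back to the original rare word ("purchase"); how the thesis performs that step is not specified in the abstract.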

Keywords/Search Tags:

word alignment, parallel corpus, neural network, statistical machine translation

Related items

1	Research On Bilingual Corpus-Based Machine Translation
2	Research On Sentence Alignment Method Based On Cross-lingual Word Embeddings
3	Research And Application Of Key Technologies Of Chinese-English Parallel Corpus
4	Improved word alignments for statistical machine translation
5	Research On Word Alignment In Statistical Machine Translation
6	Study On Word Alignment Technology And Construction Of Statistical Machine Translation Platform
7	Exploring Method Of The Construction Of Parallel Corpus For Machine Translation In A Specific Domain
8	Research And Implementation On Uyghur-Chinese Neural Machine Translation
9	Research On Word Alignment Technology Based On Deep Learning
10	Morphology-Processing In Chinese-Mongolian Statistical Machine Translation