Self-Correction Of Word Alignments System

Posted on:2018-12-29

Degree:Master

Type:Thesis

Country:China

Candidate:H M Gong

GTID:2348330542465252

Subject:Software engineering

Abstract/Summary:

The core idea of statistical machine translation is to analyze a large bilingual parallel corpus and then construct a statistical translation model to translate the test text. Bilingual word alignment is a very important part of a statistical machine translation system: it is a prerequisite for generating the phrase table and for extracting translation rules, so the accuracy of word alignments has a significant effect on system performance. However, word alignment relies on statistical information over bilingual word sequences and does not take bilingual hierarchical structure or linguistic features into account, which leads to word alignment errors, data sparsity, and other problems. Because the aligner searches all possible word alignments for a sentence pair, alignments that do not conform to linguistic features are always included in the search space and may be output when their statistical probability is higher.

This thesis uses bilingual hierarchical structure and linguistic features to improve the quality of word alignments and the performance of the machine translation system. We propose a self-correcting mechanism for word alignments that adds a loop feedback mechanism on top of traditional word alignment. The mechanism re-plans the alignment search space based on the output of the previous round, so that incorrect word alignments can be avoided. Within the feedback loop, sentence pairs are divided according to different hierarchical structures, moving gradually from the sentence level to the clause and phrase levels. The main work is as follows:

(1) Judging the non-parallel relationship of sentence pairs. Since the alignment quality of the Chinese and English corpora used for training is unknown, it is necessary to judge whether a sentence pair is aligned, based on the traditional word alignment information, in order to ensure the validity of the binary segmentation method.

(2) Locating the best segmentation point, which is the core component of the word alignment self-correction algorithm. A good partition point can effectively segment complex sentence pairs, correct the original word alignment errors, and improve the quality of machine translation. Three methods are proposed. The binary segmentation method based on punctuation selects the best partition point among all possible punctuation combinations. The binary segmentation method based on related words uses the characteristic words of related sentence components as the basis for segmentation, dividing the sentence pairs into finer parts. The binary segmentation method based on statistical features adds syntactic structure features to the two methods above in order to find the best segmentation point; Gibbs sampling is used to select the best segmentation position according to the probability distribution of the statistical features. This improves the accuracy of the partition and reduces the word alignment error rate.

(3) Identifying and correcting non-parallel relations. After obtaining the best segmentation points, we calculate the density and the error rate of the traditional word alignments and determine whether to segment and correct the sentence pair further. GIZA++ is then run on the sub-pairs to obtain new word alignments, which are merged according to the partition position. As a result, the quality of machine translation is improved.
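
The punctuation-based binary segmentation and the merging of sub-pair alignments described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the scoring rule (choose the punctuation pair that the fewest existing alignment links cross), the punctuation set, and all function names are assumptions made for illustration, and the re-aligned links that the thesis obtains from GIZA++ are simply supplied by hand here.

```python
from typing import List, Optional, Sequence, Set, Tuple

Link = Tuple[int, int]  # (source token index, target token index)

# Assumed set of clause-level punctuation marks; the thesis does not list its set.
PUNCT = {",", ";", ":"}


def candidate_points(tokens: Sequence[str]) -> List[int]:
    """Split positions right after a punctuation token, excluding the sentence end."""
    return [i + 1 for i, tok in enumerate(tokens[:-1]) if tok in PUNCT]


def crossing_links(links: Set[Link], i: int, j: int) -> int:
    """Count links joining the left part of one side to the right part of the other."""
    return sum((s < i) != (t < j) for s, t in links)


def best_split(src: Sequence[str], tgt: Sequence[str],
               links: Set[Link]) -> Optional[Tuple[int, int, int]]:
    """Return (source split, target split, crossings) with the fewest crossing links.

    Returns None when either side has no punctuation candidate.
    """
    scored = [(crossing_links(links, i, j), i, j)
              for i in candidate_points(src)
              for j in candidate_points(tgt)]
    if not scored:
        return None
    crossings, i, j = min(scored)
    return i, j, crossings


def merge(left: Sequence[Link], right: Sequence[Link], i: int, j: int) -> List[Link]:
    """Shift the right sub-pair's links by the split offsets and combine with the left."""
    return sorted(set(left) | {(s + i, t + j) for s, t in right})


# Toy sentence pair (romanized source); link (2, 4) is a deliberate alignment error.
src = ["wo", "lai", "le", ",", "ta", "zou", "le"]
tgt = ["I", "came", ",", "he", "left"]
links = {(0, 0), (1, 1), (2, 4), (3, 2), (4, 3), (5, 4), (6, 4)}

i, j, crossings = best_split(src, tgt, links)
print(i, j, crossings)  # 4 3 1: split after each comma, one suspicious crossing link

# Hand-written stand-ins for the links an aligner would produce on each sub-pair.
left_links = [(0, 0), (1, 1), (2, 1), (3, 2)]
right_links = [(0, 0), (1, 1), (2, 1)]
print(merge(left_links, right_links, i, j))
```

In this toy case the only crossing link is the erroneous one, so re-aligning each half separately removes it from the search space; real data would need the density and error-rate checks the abstract mentions before deciding to split.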

Keywords/Search Tags:

Self-Correction, Word alignment, Binary segmentation, Gibbs sampling

Related items

1	Adaptive Gibbs Sampling Method Based On Network
2	Research And Implementation On The Prediction Of Transcription Factor Binding Site Based On Gibbs Sampling Algorithm
3	Research And Application On Dynamic Word Alignment For Interactive Translation
4	Traffic Forecasts. Inference Algorithm Based On Gibbs Sampling
5	Research On Improving The Performance Of Chinese-Uyghur Word Alignment For Statistical Machine Translation
6	Research On Chinese Word Segmentation And User Identification Based On Feature Alignment
7	Position Dependencies Between Multinomial Distribution For Motif Discovery Based On Gibbs Sampling
8	Research On Motif Finding Algorithm Based On Gibbs Sampling
9	Research On Chinese Word Segmentation Strategies For Statistical Machine Translation
10	Local,Dynamic And Fast Algorithms For Sampling From Gibbs Distributions