Alignment Based Acquisition Of Collocation And Application In Machine Translation

Posted on: 2014-02-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Y Liu
GTID: 1228330392467654
Subject: Computer application technology
Abstract/Summary:
Collocation is a common and essential phenomenon in natural languages. Collocations express the internal relationships among words and help people understand language. In recent years, many studies of collocation have been published, and collocations have been applied to natural language processing tasks such as machine translation and parsing. Although collocation is very common, it is sometimes arbitrary and takes different forms in different languages, which poses great challenges to collocation research.

With the development of natural language processing, the study of collocation has attracted increasing attention. It currently follows two directions: collocation extraction from preprocessed text, and collocation application, for example in word similarity calculation, translation selection and dependency parsing. Our research covers both directions. First, building on the bilingual statistical word alignment method, we propose to extract collocations from text using monolingual word alignment, without resorting to any additional language resources or preprocessing. Second, we apply the extracted collocations to several kinds of machine translation methods. The main contents of this dissertation can be summarized as follows:

1. Collocation extraction based on monolingual word alignment. Collocations are one of the fundamental resources in natural language processing. Building on previous research, we propose to extract collocations using monolingual word alignment. More importantly, based on the extracted collocations, we explore a series of collocation applications in machine translation.

2. Improvement of bilingual word alignment through monolingual collocation. Bilingual word alignment is one of the key technologies in statistical machine translation, and its quality directly influences translation quality. Current word alignment methods mainly focus on improving the correspondences between the source language and the target language. In contrast, we propose a novel method that improves bilingual word alignment using monolingual collocations. In this method, we calculate the collocation probability of several words to judge whether they can be aligned in the same cept, which improves the precision of multi-word alignments.

3. Improvement of the translation model and matching based on a statistical collocation model. The translation model is an important knowledge source in machine translation and is crucial to translation quality. Current research on it mainly addresses translation model filtering and compression. We propose to improve the translation model and matching through a statistical collocation model. In this method, we first calculate two kinds of degree of association: one among several words, and the other between those words and their context. We then use these degrees of association to measure the possibility that the words compose a phrase (either a continuous phrase or a hierarchical phrase). When these measures are added to statistical machine translation, they help discriminate the quality of the translation model effectively and improve the matching between the translation model and the input.

4. Reordering with source language collocations. Reordering has always been a challenge in machine translation. Many methods have been proposed to improve reordering performance, such as lexicon models, position models and syntactic models. Unlike existing work, we propose a novel reordering model for statistical machine translation that models the translation orders of source language collocations. During decoding, the model softly constrains the translation orders of the source language collocations, and thereby the translation orders of the source phrases containing these collocated words.

5. Improving example-based machine translation (EBMT) with a statistical collocation model. EBMT is one of the major machine translation methods and has been applied successfully in many translation domains. In an EBMT system, the performance of example selection and translation selection heavily influences the quality of the final translation. We propose to improve the EBMT method in three respects by using statistical collocation models estimated from monolingual corpora. First, the statistical collocation model is used to estimate the degree of matching between the input sentence and the examples, which improves example selection. Second, translation selection is improved by evaluating the collocation strength between the translation candidates and the context. Third, the statistical collocation model detects the collocated words of the translation candidates in the example, and these collocated words are then corrected according to the context.

In conclusion, this dissertation not only focuses on collocation extraction but also applies the extracted collocations in machine translation. The research has achieved some preliminary results, which we hope will be helpful to other researchers in this area. We believe that collocation research can make a great breakthrough as the foundational techniques of natural language processing and the capability to process large-scale data improve. Conversely, progress in collocation techniques can also promote the development of related research.
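The idea in item 1, extracting collocations by aligning a sentence with itself, can be illustrated with a minimal sketch. The abstract does not say which alignment model is used, so the code below assumes a simplified IBM Model 1-style EM procedure in which a word position may not align to its own position. The function names `train_monolingual_alignment` and `collocation_score`, the iteration count, and the symmetric averaging of the two directional probabilities are all hypothetical choices made for illustration, not the dissertation's actual method.

```python
from collections import defaultdict


def train_monolingual_alignment(sentences, iterations=5):
    """Estimate t[(w, v)], the probability that word w is aligned to word v,
    by running Model 1-style EM on each sentence paired with itself.
    Assumption: a position never aligns to itself."""
    t = defaultdict(lambda: 1.0)  # uniform (unnormalised) starting point
    for _ in range(iterations):
        counts = defaultdict(float)
        totals = defaultdict(float)
        for sent in sentences:
            for i, w in enumerate(sent):
                # Candidate partners: every other position in the same sentence.
                partners = [v for j, v in enumerate(sent) if j != i]
                z = sum(t[(w, v)] for v in partners)
                if z == 0.0:
                    continue  # one-word sentence: nothing to align to
                for v in partners:
                    c = t[(w, v)] / z  # expected alignment count (E-step)
                    counts[(w, v)] += c
                    totals[v] += c
        # M-step: renormalise so that the probabilities sum to 1 for each v.
        t = defaultdict(float)
        for (w, v), c in counts.items():
            t[(w, v)] = c / totals[v]
    return t


def collocation_score(t, a, b):
    """Symmetric association score: the mean of the two directional
    alignment probabilities (an illustrative choice)."""
    return 0.5 * (t.get((a, b), 0.0) + t.get((b, a), 0.0))


if __name__ == "__main__":
    corpus = [
        ["strong", "tea"],
        ["strong", "coffee"],
        ["heavy", "rain"],
        ["drink", "strong", "tea", "now"],
    ]
    model = train_monolingual_alignment(corpus)
    print(collocation_score(model, "strong", "tea"))
    print(collocation_score(model, "heavy", "tea"))
```

Word pairs whose score exceeds a chosen threshold would be kept as collocation candidates. The same pairwise score could in principle be combined over several words to judge whether they belong in one cept (item 2), though the abstract does not describe how that combination is done.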
Keywords/Search Tags: bilingual word alignment, collocation extraction, translation reordering, translation model, example-based machine translation