
Research On Word Alignment In Statistical Machine Translation

Posted on: 2013-01-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S J Huang
Full Text: PDF
GTID: 1118330371486854
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, cross-lingual activities have become more common, bringing an increasing demand for real-time translation between languages. The only practical way to meet this demand is through automatic translation systems. Among the approaches to automatic translation, statistical machine translation (SMT) has become popular because it can mine translation resources automatically and performs well across various domains. Word alignment is a core task in SMT: it learns translation equivalences from parallel corpora and serves as the major source of translation knowledge.

Classical word alignment methods usually employ one or more cascaded generative models. Recently, substantial progress has been made by applying discriminative methods to word alignment. Compared with generative methods, discriminative systems make it much easier to incorporate features, so better performance can be achieved. However, discriminative learning of word alignment still faces several problems. First, searching among all possible alignments is difficult because the search space is exponential in size. Exact search is usually inefficient, while approximate strategies usually introduce search errors. Second, discriminative learning usually requires a certain amount of labeled data. As more and more features are employed for word alignment, some methods may suffer from insufficient training data. Third, discriminative learning methods usually employ alignment error rate (AER) as the objective function. However, some research shows that the correlation between AER and machine translation metrics is considerably weak. As a result, some methods that improve AER significantly yield only moderate improvements in machine translation quality.

In this thesis, discriminative training of word alignment is investigated in the following aspects.

1. The thesis addresses the efficiency of alignment search during synchronous parsing with an inversion transduction grammar (ITG). The problem of spurious ambiguity is analyzed, and a grammar is proposed to eliminate such ambiguities. The effectiveness of the grammar is proved in theory and demonstrated empirically by improvements in parsing efficiency and in discriminative learning results. Pruning in the context of synchronous parsing is also discussed. Unlike previous methods, which focus only on pruning bilingual spans, this thesis proposes to prune alignment hypotheses directly. By pruning hypotheses dynamically during parsing, the alignment search is better constrained, which improves efficiency.

2. A semi-supervised learning framework for word alignment is proposed. The framework first transforms the alignment problem into a series of binary classifications, then uses a large amount of unlabeled data to assist the training of these classifiers. Binary classification is much faster than structural search over word alignments, which makes efficient semi-supervised learning possible. As the performance of the classifiers improves, the alignment quality improves as well.

3. A metric called error sensitive alignment error rate (ESAER) is proposed for evaluating alignment quality. ESAER is based on a detailed analysis of the interaction between alignment errors and the phrase extraction process. The metric applies different penalties to different types of alignment errors and also takes the varying extent of errors into consideration. Compared with AER, ESAER correlates better with machine translation metrics.
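For reference, the alignment error rate (AER) discussed above is commonly defined, following Och and Ney, as AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the predicted link set, S the "sure" gold links and P the "possible" gold links (with S contained in P). The sketch below illustrates that standard definition only. The function name and the toy link sets are hypothetical, and the ESAER metric proposed in the thesis is not reproduced here because the abstract does not give its formula.

```python
def alignment_error_rate(predicted, sure, possible):
    """Standard AER: 1 - (|A & S| + |A & P|) / (|A| + |S|).

    Each argument is an iterable of (source_index, target_index) links.
    This is the common textbook definition, not the thesis's ESAER.
    """
    predicted = set(predicted)
    sure = set(sure)
    # By convention every sure link also counts as a possible link.
    possible = set(possible) | sure

    denominator = len(predicted) + len(sure)
    if denominator == 0:
        # No predicted and no sure links: treat as zero error (an assumption).
        return 0.0

    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / denominator


# Hypothetical toy data: two sure links and one extra possible link.
gold_sure = {(0, 0), (1, 1)}
gold_possible = {(2, 1)}

perfect = alignment_error_rate({(0, 0), (1, 1), (2, 1)}, gold_sure, gold_possible)
one_wrong = alignment_error_rate({(0, 0), (1, 1), (2, 2)}, gold_sure, gold_possible)
```

Note that this definition weights every wrong or missing link equally, regardless of how it affects downstream phrase extraction, which is the limitation the abstract's third contribution is aimed at.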
Keywords/Search Tags: statistical machine translation, word alignment, structure prediction, synchronous parsing, alignment evaluation metric