
The Application Research Of Word Sense Disambiguation In The Statistical Machine Translation

Posted on: 2008-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: X N Tang
GTID: 2178360242979090
Subject: Computer applications
Abstract/Summary:
Word sense disambiguation (WSD) has considerable theoretical and practical significance for many natural language processing applications. It is often regarded as an intermediate task that is essential for applications such as machine translation, information retrieval, content and thematic analysis, and even grammatical analysis. Recent years have seen steady accuracy gains in WSD models, and the WSD perspective has led to numerous insights in machine translation. However, although researchers have tried a variety of integration methods, adding WSD has not yielded significantly better translation quality than a statistical machine translation (SMT) system alone. The work in this thesis is set against this background.

The thesis studies the application of WSD to two aspects of SMT. The first is integrating WSD predictions into SMT decoding. Current SMT models have difficulty exploiting WSD predictions, and one major factor is the language model effect: the translation chosen by the SMT model tends to be more likely under the language model than the WSD prediction. A translation with higher language model probability also influences the translation of its neighbors, potentially improving the BLEU score, whereas the WSD prediction may not have been seen often enough within phrases, which lowers the BLEU score. The thesis analyzes this problem and proposes a new method for exploiting WSD predictions. The second aspect is word alignment, which is very important to SMT: WSD is used to improve the word alignment results.

The thesis first implements a WSD system that aims to disambiguate all Chinese content (substantive) words. It is built from a Chinese-English parallel corpus, and the main work covers obtaining the English translations of every Chinese content word, extracting the disambiguation features, and building the disambiguation classifier. The WSD system is then integrated into CARAVN (an SMT system) and used to improve word alignment quality.
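The language model effect mentioned in the abstract can be illustrated with a toy log-linear scoring sketch. This is not the method proposed in the thesis, which the abstract does not describe. The feature names (`tm`, `lm`, `wsd`), the probabilities, and the weights below are all hypothetical values chosen only to show how a language model score can outweigh a WSD prediction unless the WSD feature is weighted heavily.

```python
import math

def loglinear_score(features, weights):
    """Weighted sum of log-domain feature values (generic log-linear model)."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical candidate translations for one ambiguous source word.
# tm = translation model, lm = language model, wsd = WSD classifier probability.
candidates = {
    "smt_choice": {"tm": math.log(0.5), "lm": math.log(0.02), "wsd": math.log(0.2)},
    "wsd_choice": {"tm": math.log(0.3), "lm": math.log(0.001), "wsd": math.log(0.7)},
}

def best(weights):
    """Return the name of the highest-scoring candidate under the given weights."""
    return max(candidates, key=lambda c: loglinear_score(candidates[c], weights))

# With a modest WSD weight, the language model dominates the decision.
low = best({"tm": 1.0, "lm": 1.0, "wsd": 0.5})
# Only a much larger (assumed) WSD weight flips the decision.
high = best({"tm": 1.0, "lm": 1.0, "wsd": 3.0})

print(low, high)
```

In this toy setting the candidate preferred by the WSD classifier loses whenever its language model probability is much lower, which mirrors the difficulty the abstract describes.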
Keywords/Search Tags:Word Sense Disambiguation, Statistical Machine Translation, Language Model, Word Alignment