Research On Phrase-based Statistical Machine Translation

Posted on: 2011-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: P Di
Full Text: PDF
GTID: 2178360305976538
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, machine translation shows great potential for wide application. Phrase-based statistical machine translation systems treat any sequence of consecutive words as a phrase, regardless of whether the phrase is reasonable. As a result, they often generate a large number of redundant phrases, which not only enlarges the system's search space but also severely degrades translation quality. To address these problems, we focus on how to effectively improve the quality of the phrase table.

First, this paper analyzes the theory and techniques of statistical machine translation and builds a phrase-based machine translation system as a baseline. Two methods, one using C-value and one using a phrase cohesion value, are then proposed to score the rationality of phrases, leading to a more effective phrase table. Experiments show that the C-value method reduces the phrase table to 78% of its original size with an increase of 0.02 in the BLEU score, while the phrase cohesion method reduces it to 47.5% with an increase of 0.0158 in the BLEU score.

Second, a topic model is applied to the statistical translation system. During training, a topic distribution is determined for each phrase; during testing, phrases unrelated to the topic are filtered out to improve performance. Experiments show that, compared with the baseline system, the topic model improves the BLEU score by 0.0136.

Last, we combine the topic model and the C-value method to further shrink the phrase table while retaining its efficacy. Experiments show that when the phrase table is reduced to 57% of its original size, the BLEU score also increases to some degree.

The research and experiments in this paper show that our methods can effectively reduce the size of the phrase table and improve its rationality, thereby significantly improving translation quality. Incorporating a topic model into machine translation systems is a new approach, and in future work we will further explore how to exploit the full advantages of the topic model in statistical machine translation.
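
The abstract does not state the scoring formula, so the following is only a minimal sketch of how C-value scoring could be used to prune a phrase table. It assumes the commonly cited C-value definition from term extraction: a phrase's frequency is discounted by the mean frequency of the longer phrases that contain it, then weighted by the base-2 logarithm of its length. The thesis's exact variant may differ. The function names, the toy frequencies, and the threshold are all hypothetical.

```python
import math


def is_subphrase(short, long):
    """True if token tuple `short` occurs contiguously inside the longer `long`."""
    n, m = len(short), len(long)
    return n < m and any(long[i:i + n] == short for i in range(m - n + 1))


def c_values(freq):
    """Map each phrase (a tuple of tokens) to its C-value.

    Assumed formula (standard term-extraction C-value, not taken from the thesis):
      not nested: log2(|a|) * f(a)
      nested:     log2(|a|) * (f(a) - mean frequency of longer phrases containing a)
    Single-word phrases score 0 under this formula because log2(1) = 0.
    """
    scores = {}
    for a, f_a in freq.items():
        containers = [f_b for b, f_b in freq.items() if is_subphrase(a, b)]
        if containers:
            adjusted = f_a - sum(containers) / len(containers)
        else:
            adjusted = f_a
        scores[a] = math.log2(len(a)) * adjusted
    return scores


def prune(freq, threshold):
    """Keep only phrases whose C-value reaches the (hypothetical) threshold."""
    return {p for p, s in c_values(freq).items() if s >= threshold}


# Toy source-side phrase counts, invented for illustration only.
freq = {
    ("machine", "translation"): 10,
    ("statistical", "machine", "translation"): 6,
    ("translation", "system"): 4,
    ("machine", "translation", "system"): 2,
}

scores = c_values(freq)
kept = prune(freq, threshold=3.0)
```

In this toy example, ("translation", "system") occurs mostly on its own, yet its short length and nested occurrence give it a low score, so it falls below the threshold and is dropped. In a real system the threshold would be tuned against phrase table size and BLEU on a development set.
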
Keywords/Search Tags: statistical machine translation, phrase table, C-value, phrase cohesion value, topic model