
Research On Zero Pronoun In Spoken Language Machine Translation

Posted on: 2017-02-26 | Degree: Master | Type: Thesis
Country: China | Candidate: Y L Hu
GTID: 2308330503958993 | Subject: Computer Science and Technology
Abstract/Summary:
Machine translation, with a history of about half a century, is one of the most important research areas in natural language processing, and it remains significant for the field of artificial intelligence. This thesis proposes a new way to improve conversational machine translation through pronoun recovery. Working on the statistical machine translation platform Moses, we focus on empty categories and analyze corpora from different conversational domains. The thesis makes the following contributions.

First, we study spoken language translation in depth and analyze its problems and difficulties. We then focus on recovering zero pronouns on the Chinese side, approaching the problem through the concept of empty categories. Inspired by zero pronoun coreference resolution, we treat zero pronoun recovery as a sequence labeling problem, using the Penn Chinese Treebank as the initial training corpus for marking the positions of Chinese zero pronouns.

Second, to build a domain-adapted zero pronoun tagging corpus, we propose a new method that reconstructs the corpus using bilingual alignment. The source-side data tagged with zero pronouns is aligned with its reference translations, and only the zero pronouns most helpful for translation are kept. The selected data then serves as a new domain-adapted training set.

Third, we rebuild the zero pronoun labeling model for three domains: Internet chat, SMS, and telephone conversation. The resulting finely tagged parallel corpus can be integrated easily into the translation system, which raises the translation-table probability that a zero pronoun on the source side corresponds to a pronoun on the target side.
Experiments show BLEU improvements of 0.23, 0.83, and 1.13 points on the chat, SMS, and telephone conversation data, respectively.
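As a rough illustration of treating zero pronoun recovery as sequence labeling, the sketch below converts a treebank-style token sequence, in which a dropped pronoun appears as the empty-category marker `*pro*` (the convention of the Penn Chinese Treebank), into one label per overt token: `ZP` if a zero pronoun was dropped immediately before the token, `O` otherwise. The label names and function names are hypothetical; the abstract does not give the thesis's actual tag set, features, or tagging model.

```python
from typing import List, Tuple

# Empty-category marker for a dropped pronoun (Penn Chinese Treebank style).
ZP_MARKER = "*pro*"


def to_labeled_sequence(tokens: List[str]) -> List[Tuple[str, str]]:
    """Remove zero-pronoun markers and label the token that follows each one.

    Returns (token, label) pairs: "ZP" if a marker immediately preceded
    the token, otherwise "O". Consecutive markers collapse into one label,
    and a marker at the very end of the sentence has no following token,
    so it is dropped (a simplification of this hypothetical scheme).
    """
    labeled: List[Tuple[str, str]] = []
    pending = False  # True when the previous item was a marker
    for tok in tokens:
        if tok == ZP_MARKER:
            pending = True
            continue
        labeled.append((tok, "ZP" if pending else "O"))
        pending = False
    return labeled


def restore_markers(labeled: List[Tuple[str, str]]) -> List[str]:
    """Inverse step: reinsert a marker before every token labeled "ZP"."""
    out: List[str] = []
    for tok, label in labeled:
        if label == "ZP":
            out.append(ZP_MARKER)
        out.append(tok)
    return out


if __name__ == "__main__":
    print(to_labeled_sequence(["*pro*", "吃", "了", "吗"]))
```

Any sequence labeler (for example, a CRF) could then be trained on the resulting pairs and applied to conversational text; at inference time, `restore_markers` shows where predicted zero pronouns would be made explicit.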
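The alignment-based corpus reconstruction can be sketched as a filter: after pronouns have been recovered on the Chinese side, each recovered pronoun is kept only if the word alignment links it to a pronoun in the English reference. The criterion, the small English pronoun list, and all function names here are assumptions chosen for illustration; the abstract says only that the most helpful zero pronouns are selected. Alignments are read in the `i-j` (source index, target index, 0-based) link format that Moses uses for its word alignment files.

```python
from typing import List, Set, Tuple

# Hypothetical, deliberately small set of English target-side pronouns.
EN_PRONOUNS = {"i", "you", "he", "she", "it", "we", "they",
               "me", "him", "her", "us", "them"}


def parse_alignment(line: str) -> Set[Tuple[int, int]]:
    """Parse links such as '0-0 1-2' into a set of (src, tgt) index pairs."""
    pairs: Set[Tuple[int, int]] = set()
    for link in line.split():
        s, t = link.split("-")
        pairs.add((int(s), int(t)))
    return pairs


def select_helpful_pronouns(src: List[str], tgt: List[str],
                            alignment: str, recovered: List[int]) -> List[int]:
    """Return the source indices of recovered pronouns worth keeping.

    src:       source tokens with recovered pronouns already inserted
    tgt:       reference translation tokens
    alignment: word alignment between src and tgt in 'i-j' format
    recovered: indices in src of the inserted pronouns

    A recovered pronoun is kept if it is aligned to at least one
    pronoun in the reference (an assumed selection criterion).
    """
    links = parse_alignment(alignment)
    keep: List[int] = []
    for i in recovered:
        aligned_words = [tgt[t].lower() for s, t in links if s == i]
        if any(w in EN_PRONOUNS for w in aligned_words):
            keep.append(i)
    return keep


if __name__ == "__main__":
    src = ["你", "吃", "了", "吗"]  # "你" at index 0 was recovered
    tgt = ["did", "you", "eat"]
    print(select_helpful_pronouns(src, tgt, "0-1 1-2 2-0", [0]))
```

Sentences whose recovered pronouns survive the filter would form the domain-adapted training set; a recovered pronoun that is unaligned, or aligned only to non-pronouns, is treated as unhelpful and discarded.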
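The claim that a tagged corpus raises the probability of pronoun translations follows from how phrase-based systems such as Moses score phrase pairs: the phrase translation probability is estimated by relative frequency over the extracted pairs, φ(e | f) = count(f, e) / Σ_e' count(f, e'). When a zero pronoun is made explicit on the source side, pairs linking it to a target pronoun can be extracted and counted. The minimal sketch below computes only this relative-frequency estimate; the phrase pairs in the example are hypothetical, and real Moses phrase tables also contain inverse probabilities and lexical weights.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def phrase_translation_probs(
        pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Relative-frequency estimate phi(e | f) = count(f, e) / count(f)."""
    pair_counts = Counter(pairs)
    src_counts: Counter = Counter()
    for (f, _e), c in pair_counts.items():
        src_counts[f] += c  # total count of source phrase f over all e
    return {(f, e): c / src_counts[f] for (f, e), c in pair_counts.items()}


if __name__ == "__main__":
    # Hypothetical extracted pairs for a recovered source-side pronoun.
    pairs = [("你", "you"), ("你", "you"), ("你", "you"), ("你", "your")]
    for (f, e), p in sorted(phrase_translation_probs(pairs).items()):
        print(f, e, p)
```

In this toy input the recovered token is paired with "you" in 3 of 4 extracted pairs, so φ(you | 你) = 0.75; filtering out unhelpful recovered pronouns, as in the alignment-based selection, removes noisy pairs from these counts.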
Keywords/Search Tags: machine translation, zero pronoun, anaphora resolution, empty category