Research On Japanese-Kana And Chinese Named Entity Equivalents Automatic Acquisition Using Inductive Learning

Posted on: 2017-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: D M Wang
GTID: 2308330482987132
Subject: Computer Science and Technology
Abstract/Summary:
Named entity translation equivalents play a critical role in cross-language information processing. They are important for tasks such as machine translation, automatic text summarization, cross-language information retrieval, and automatic question answering. Traditional acquisition methods usually rely on large-scale parallel or comparable corpora, so they are limited by the scale and quality of the available bilingual resources. In Japanese-Chinese translation, bilingual corpora are relatively scarce. Named entities written in Chinese characters are usually handled with a Chinese Hanzi and Japanese Kanji comparison table, while named entities written purely in kana are usually handled with a statistical machine translation model. However, the accuracy of these methods is limited by the scale and quality of the corpus resources, and the methods are inefficient. To resolve these problems, we propose a method that applies inductive learning to monolingual corpora. The method consists of four steps. First, a conditional random field model extracts Japanese and Chinese named entities from monolingual corpora, and the entities are converted into romaji (Roman-letter) sequences and phonetic (pinyin) sequences. Second, a Japanese-Chinese transliteration rule base is constructed using inductive learning, an example-based learning method. Third, the rule base is iteratively reconstructed through feedback learning. Finally, the rule base is used to calculate the similarity between Chinese and Japanese named entities in order to acquire Japanese-Chinese named entity translation equivalents. Experimental results show that the method is simple and efficient, and that it overcomes the main shortcoming of traditional methods, namely their heavy dependence on large-scale bilingual resources. Compared with traditional methods, the innovation of this work is that it uses inductive learning to acquire transliteration rules from the characteristics of Japanese kana and Chinese named entities. In addition, the method can use weakly correlated bilingual resources to extract Japanese kana and Chinese named entity equivalents, which reduces cost.
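
The final step, scoring candidate pairs with a transliteration rule base, can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, so the code below makes several assumptions: the rule base is a plain mapping from romaji units to the pinyin units they may correspond to, the score is a Dice-style ratio over the longest rule-consistent alignment, and the names `RULES`, `similarity`, and `best_equivalent`, the rule entries, and the threshold are all hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical rule base: each romaji unit maps to the pinyin units it may
# correspond to. These entries are illustrative, not taken from the thesis.
RULES: Dict[str, Set[str]] = {
    "o": {"ao", "ou"},
    "ba": {"ba"},
    "ma": {"ma"},
    "bu": {"bu"},
}


def similarity(romaji: List[str], pinyin: List[str],
               rules: Dict[str, Set[str]]) -> float:
    """Dice-style score over the longest rule-consistent alignment.

    Assumed measure: 2 * matched units / (len(romaji) + len(pinyin)).
    """
    n, m = len(romaji), len(pinyin)
    if n == 0 or m == 0:
        return 0.0
    # dp[i][j] = number of matched units between romaji[:i] and pinyin[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if pinyin[j - 1] in rules.get(romaji[i - 1], set()):
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return 2 * dp[n][m] / (n + m)


def best_equivalent(romaji: List[str],
                    candidates: Dict[str, List[str]],
                    rules: Dict[str, Set[str]],
                    threshold: float = 0.6) -> Optional[str]:
    """Return the highest-scoring candidate, or None if below the threshold."""
    best_name: Optional[str] = None
    best_score = 0.0
    for name, pinyin in candidates.items():
        score = similarity(romaji, pinyin, rules)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Chinese candidates with their pinyin sequences (illustrative data).
candidates = {
    "奥巴马": ["ao", "ba", "ma"],
    "布什": ["bu", "shi"],
}

if __name__ == "__main__":
    # Romaji sequence for the kana entity オバマ.
    print(best_equivalent(["o", "ba", "ma"], candidates, RULES))
```

In this sketch the rule base is fixed by hand. In the approach described above it would instead be induced from examples and refined through feedback learning, and the segmentation into units would follow whatever the learned rules define.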
Keywords/Search Tags:machine translation, named entities, Japanese kana, inductive learning method, transliteration