Methods and Research on Constructing Chinese-Japanese Named Entity Translation Equivalents

Posted on: 2015-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: K Ru
Full Text: PDF
GTID: 2268330425488935
Subject: Computer Science and Technology
Abstract/Summary:
Research on the automatic extraction of named entity (NE) translation equivalents plays a critical role in many tasks, such as automatic text summarization, machine translation, and cross-language information retrieval. Traditional methods are often based on large-scale parallel or comparable corpora, so the practicability of their results is constrained by the relative scarcity of bilingual corpus resources. In this thesis, we summarize the status of research in this field and propose a method that takes the characteristics of Chinese and Japanese into account to automatically extract Chinese-Japanese NE translation equivalents from monolingual corpora by inductive learning. The method uses a comparison table of Chinese Hanzi (汉字) and Japanese Kanji to calculate the similarity between Chinese and Japanese NE instances. We then use inductive learning to obtain partial translation rules for NEs by extracting the differences between highly similar Chinese and Japanese NE instances. Finally, a feedback process updates the Chinese-Japanese NE similarities and the rule sets. Experimental results show that the proposed method is simple and efficient, and that it overcomes the severe dependence of traditional methods on bilingual resources. The method can build a large-scale Chinese-Japanese NE translation dictionary from monolingual corpora. Compared with other methods, ours combines the linguistic features of Chinese and Japanese in an inductive-learning method that extracts NE translation equivalents from Chinese and Japanese monolingual corpora.
It effectively reduces the cost of building the corpus and the need for additional knowledge, because it extracts NE translation equivalents from weakly correlated bilingual text sets using minimal additional knowledge. The approach has some problems. For example, when the amount of data is insufficient, partial translation rules may not be extracted for NEs written purely in kana. We therefore propose a transliteration method based on traditional statistical machine translation, which effectively improves the extraction results for kana equivalents. Our future work will mainly focus on how to extract reliable translation equivalents from web pages, which are massive, redundant, heterogeneous, non-standardized, and noisy.
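The similarity and rule-extraction steps described in the abstract might be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not specify the similarity measure, so the character-level ratio from Python's `difflib` stands in for it; the three-entry `HANZI_TO_KANJI` table is a hypothetical excerpt, and the function names `normalize`, `similarity`, and `partial_rules` are invented for this sketch. The feedback step and the kana transliteration step are not covered.

```python
from difflib import SequenceMatcher

# Hypothetical excerpt of a Hanzi-Kanji comparison table
# (simplified Chinese character -> Japanese Kanji form).
HANZI_TO_KANJI = {"东": "東", "县": "県", "广": "広"}


def normalize(chinese_ne):
    """Map each Hanzi to its Kanji counterpart; unlisted characters pass through."""
    return "".join(HANZI_TO_KANJI.get(ch, ch) for ch in chinese_ne)


def similarity(chinese_ne, japanese_ne):
    """Character-level similarity in [0, 1] after normalization (assumed measure)."""
    return SequenceMatcher(None, normalize(chinese_ne), japanese_ne).ratio()


def partial_rules(chinese_ne, japanese_ne):
    """Return differing segments of an NE pair as candidate partial translation rules."""
    # normalize() maps character by character, so indices into the normalized
    # string are also valid indices into the original Chinese string.
    matcher = SequenceMatcher(None, normalize(chinese_ne), japanese_ne)
    rules = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            rules.append((chinese_ne[i1:i2], japanese_ne[j1:j2]))
    return rules


if __name__ == "__main__":
    # Illustrative NE pairs, not taken from the thesis data.
    print(similarity("东京大学", "東京大学"))            # 1.0
    print(partial_rules("东京大学医院", "東京大学病院"))  # [('医', '病')]
```

In this sketch, only pairs whose similarity exceeds some threshold would be passed to `partial_rules`; the threshold and how the learned rules feed back into the similarity scores are left open here.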
Keywords/Search Tags: Named entity translation equivalents, Chinese Hanzi and Japanese Kanji comparison table, inductive learning method, transliteration method