
Research On Parallel Corpus-Based Cross-Lingual Entity Relation Extraction

Posted on: 2017-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: H T Hui
GTID: 2308330488461980
Subject: Computer Science and Technology
Abstract/Summary:
Parallel corpora not only play an important role in cross-language relation extraction research, but also provide a valuable data platform for revealing how difficult natural language processing tasks are in different languages and for studying the complementarity and redundancy between languages. However, traditional parallel corpora are aligned only at the sentence level, which limits their usefulness for research in cross-language natural language processing. In view of this, this thesis conducts research in the following three areas.

(1) Construction of an instance-level Chinese-English parallel corpus. On the basis of OntoNotes, we construct a Chinese-English parallel corpus aligned at the instance level for information extraction by combining automatic extraction, automatic mapping and manual annotation. Relation extraction experiments using SVMs are then conducted on the corpus for both Chinese and English. Finally, we shed some light on the difficulty of relation extraction in the two languages at both the syntactic-structure level and the lexical level.

(2) Bilingual co-training for relation classification. Based on this parallel corpus, we apply a co-training framework to relation classification for Chinese and English. The experiments demonstrate that bilingual co-training consistently outperforms ordinary bootstrapping and is robust.

(3) Bilingual active learning for relation classification. Bilingual active learning is applied to Chinese and English relation classification, wherein joint confidence is used to select the instances with the highest uncertainty. The experiments indicate that, given the same number of annotated instances, bilingual active learning consistently obtains better performance.

Unlike what is observed for most natural language processing tasks, this thesis shows that relation extraction in Chinese outperforms relation extraction in English in the news domain. The main reason is that Chinese is more concise and clearer than English in local semantic expression. In addition, owing to the redundancy and complementarity between Chinese and English, both bilingual co-training and bilingual active learning can improve the performance of Chinese and English relation extraction.
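
The instance selection step in bilingual active learning can be illustrated with a minimal sketch. The abstract does not define how joint confidence is computed, so the geometric mean of the two classifiers' top class probabilities used here is an assumption, and the names joint_confidence, select_most_uncertain and the toy pool are hypothetical. Each pool entry stands for one aligned Chinese-English instance pair with the class probabilities predicted by each language's classifier.

```python
def joint_confidence(p_zh, p_en):
    """Combine the two classifiers' top class probabilities.

    Assumption: joint confidence is taken as the geometric mean of the
    highest probability from the Chinese and the English classifier.
    """
    return (max(p_zh) * max(p_en)) ** 0.5


def select_most_uncertain(pool, k):
    """Return the ids of the k aligned pairs with the lowest joint confidence.

    pool is a list of (pair_id, p_zh, p_en) tuples, where p_zh and p_en
    are class-probability lists for the same aligned instance.
    """
    ranked = sorted(pool, key=lambda item: joint_confidence(item[1], item[2]))
    return [pair_id for pair_id, _, _ in ranked[:k]]


# Toy pool of three aligned instance pairs (illustrative values only).
pool = [
    ("pair-1", [0.90, 0.05, 0.05], [0.80, 0.10, 0.10]),
    ("pair-2", [0.40, 0.35, 0.25], [0.50, 0.30, 0.20]),
    ("pair-3", [0.60, 0.30, 0.10], [0.45, 0.35, 0.20]),
]

# The two least confident pairs would be sent to the annotator.
to_annotate = select_most_uncertain(pool, 2)
print(to_annotate)
```

In a full loop, the selected pairs would be labeled once and the label added to the training sets of both languages, since the instances are aligned; the classifiers are then retrained and the selection repeated.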
Keywords/Search Tags: Parallel Corpus, Relation Extraction between Named Entities, Bilingual Co-Training, Bilingual Active Learning