Research On Bilingual Event Extraction

Posted on: 2017-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z Zhu
Full Text: PDF
GTID: 2308330488461988
Subject: Computer Science and Technology
Abstract/Summary:
Subjective text on the Internet is expanding rapidly as the Internet develops. How to process this huge volume of information automatically and intelligently, and to mine the valuable and competitive information in it, has become an important research issue and has motivated the study of information extraction. Within information extraction, event extraction is a basic and challenging task.

Existing studies mainly focus on supervised learning methods. However, supervised methods need large-scale labeled data, which consumes large amounts of resources to produce. Meanwhile, event extraction generally suffers from data sparseness because of the many kinds of event categories. These problems arise in event extraction tasks across a variety of languages. In this thesis, we focus on bilingual event extraction, combining English and Chinese event extraction tasks and resources. In detail, our study includes the following three aspects.

First, this thesis proposes a novel approach to bilingual event extraction with feature augmentation. The main idea is to combine the English and Chinese event texts effectively and, through feature augmentation, obtain a bilingual feature text that is used to make the classification decision on both English and Chinese events. This method reduces the effect of data sparseness by enlarging the training set and enriching the training information. Experimental studies demonstrate that our proposed approach significantly outperforms traditional monolingual event extraction.

Second, this thesis proposes a cross-lingual event extraction method using integer linear programming. The main idea is to use the English event resources, which are better and more abundant, to help Chinese event extraction, and then to optimize the results through integer linear programming. Experimental studies demonstrate that our approach achieves much better performance on Chinese event extraction, especially when using the bilingual training set, which consists of the English corpus and its translation.

Third, we propose a novel bilingual event extraction approach based on active learning. The main idea is to train the model on the English corpus together with some suitable unlabeled Chinese samples, which are actively selected for human annotation and automatic annotation. The results indicate that the method can effectively cut the cost of manual annotation, enrich the information in the training set, and achieve more satisfactory performance.
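
The active-learning idea in the third aspect, where unlabeled samples are routed either to human annotation or to automatic annotation, can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so the confidence-based rule below is an assumption, and the function name `split_by_confidence` and its parameters `human_budget` and `auto_threshold` are hypothetical: the least confident samples go to a human annotator, and sufficiently confident ones keep the label predicted by the model.

```python
from typing import Dict, List, Tuple


def split_by_confidence(
    probabilities: Dict[str, List[float]],
    human_budget: int,
    auto_threshold: float,
) -> Tuple[List[str], Dict[str, int]]:
    """Route unlabeled samples to human or automatic annotation.

    `probabilities` maps a sample id to the class probabilities predicted
    by the current model (for example, one trained on the English corpus).
    This confidence-based rule is an assumed criterion, not the thesis's.
    """
    # Confidence of a sample is the probability of its most likely class.
    confidence = {sid: max(p) for sid, p in probabilities.items()}

    # Least confident samples come first; ties are broken by sample id.
    ranked = sorted(confidence, key=lambda sid: (confidence[sid], sid))

    # The least confident samples, up to the budget, go to a human annotator.
    for_human = ranked[:human_budget]

    # Of the rest, samples at or above the threshold keep the model's label.
    auto_labels = {
        sid: probabilities[sid].index(max(probabilities[sid]))
        for sid in ranked[human_budget:]
        if confidence[sid] >= auto_threshold
    }
    return for_human, auto_labels
```

Samples that fall into neither group stay in the unlabeled pool for a later round. In this sketch the human budget bounds the annotation cost, while the threshold limits how much label noise the automatically annotated samples can introduce.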
Keywords/Search Tags: Event Classification, Bilingual Information, Feature Augmentation, Integer Linear Programming, Active Learning