
SVM And TSVM Based Chinese Entity Relation Extraction

Posted on: 2008-06-25
Degree: Master
Type: Thesis
Country: China
Candidate: F Xu
GTID: 2178360242999041
Subject: Computer Science and Technology
Abstract/Summary:
Information extraction automatically transforms unstructured text into structured data. It meets a strong practical demand in its own right and also provides a basis for other applications such as information retrieval, text categorization, and question answering. Entity relation extraction is a central task in information extraction and is attracting growing interest from researchers. Chinese entity relation extraction in particular still requires much further study.

This thesis presents work on Chinese entity relation extraction. We design a context vector from several new features: words, part-of-speech tags, entity and mention information, overlap, and HowNet concepts. Based on this context information, we apply an SVM classifier to detect and classify the relations between entities. We use the ACE 2004 training data as our experimental data and obtain encouraging results. A detailed analysis of the results shows how the different features and the number of training examples affect extraction performance. The results indicate that different features suit different extraction tasks: word features are suitable for relation detection, while HowNet concept features are appropriate for relation type and subtype characterization. Word features are the basic feature set, and overlap features contribute the most. Performance rises as the number of training examples increases, so a large corpus is needed to use an SVM classifier. Once the corpus reaches a certain size, however, the gain from adding more training examples becomes so small that other ways of improving extraction performance, such as developing more features, must be found.

To address the SVM method's dependence on a large-scale corpus, we introduce the semi-supervised learning method TSVM (transductive SVM) to relation extraction, to see whether it can improve extraction performance by using both labeled and unlabeled data. Experimental results show that TSVM performs much better than SVM in the same setting when labeled examples are very few, while SVM performs slightly better than TSVM when labeled examples are plentiful. TSVM performs well on relation detection, which makes it practical for this kind of task. On relation type recognition, however, TSVM does not perform well, so other semi-supervised learning methods need to be explored. A multi-TSVM classifier is also constructed.

Future work includes developing more features, such as chunking information and HowNet concept structure, to improve extraction performance; choosing parameters for the classifier; and investigating the pattern in the number of training examples that SVM and TSVM require.
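
The context vector described in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual feature design: the function name `context_features`, the feature-name scheme, the window size, and the optional `concepts` lookup (a stand-in for a HowNet lexicon) are all assumptions made for illustration. It only shows how word, part-of-speech, entity and mention, and overlap information for one entity pair might be gathered into a sparse feature map.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


def context_features(
    tokens: List[Token],
    e1: Dict,
    e2: Dict,
    window: int = 2,
    concepts: Optional[Dict[str, str]] = None,
) -> Dict[str, float]:
    """Build a sparse feature map for one candidate entity pair.

    Each entity is a dict with hypothetical keys: 'start' and 'end'
    (token span, end exclusive), 'type' (entity type) and 'mention'
    (mention type). `concepts` is an assumed word -> concept lookup
    standing in for a HowNet lexicon.
    """
    feats: Dict[str, float] = {}
    concepts = concepts or {}

    for name, ent in (("e1", e1), ("e2", e2)):
        # Entity and mention features.
        feats[f"{name}_type={ent['type']}"] = 1.0
        feats[f"{name}_mention={ent['mention']}"] = 1.0

        # Word, POS and concept features in a window around the entity.
        lo = max(0, ent["start"] - window)
        hi = min(len(tokens), ent["end"] + window)
        for i in range(lo, hi):
            if ent["start"] <= i < ent["end"]:
                slot = "in"
            elif i < ent["start"]:
                slot = f"L{ent['start'] - i}"
            else:
                slot = f"R{i - ent['end'] + 1}"
            word, tag = tokens[i]
            feats[f"{name}_{slot}_w={word}"] = 1.0
            feats[f"{name}_{slot}_pos={tag}"] = 1.0
            if word in concepts:
                feats[f"{name}_{slot}_c={concepts[word]}"] = 1.0

    # Overlap features: nesting, distance and order of the two entities.
    nested = (
        (e1["start"] <= e2["start"] and e2["end"] <= e1["end"])
        or (e2["start"] <= e1["start"] and e1["end"] <= e2["end"])
    )
    first, second = sorted((e1, e2), key=lambda e: e["start"])
    feats["nested"] = 1.0 if nested else 0.0
    feats["words_between"] = float(max(0, second["start"] - first["end"]))
    feats["e1_before_e2"] = 1.0 if e1["start"] <= e2["start"] else 0.0
    return feats


# Toy sentence: "Microsoft / president / visits / Beijing" (made-up example).
tokens = [("微软", "NR"), ("总裁", "NN"), ("访问", "VV"), ("北京", "NR")]
e1 = {"start": 0, "end": 1, "type": "ORG", "mention": "NAM"}
e2 = {"start": 1, "end": 2, "type": "PER", "mention": "NOM"}
feats = context_features(tokens, e1, e2)
```

A feature map like this could then be vectorized and passed to an SVM implementation of the reader's choice; the relation labels would come from an annotated corpus such as the ACE 2004 data mentioned above.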
Keywords/Search Tags:information extraction, entity relation extraction, SVM, TSVM, feature selection, training example quantities, multi-TSVM