Research On Personal Relation Extraction Based On Wikipedia

Posted on: 2017-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: B J Liu
Full Text: PDF
GTID: 2308330485957906
Subject: Computer technology
Abstract/Summary:
Personal relation extraction is an important research topic in information extraction. The task originated in the evaluation projects of the MUC conference, which was later succeeded by the ACE conference. Most studies of Chinese person entity relation extraction use the corpus provided by the ACE conference or the People's Daily news corpus, both of which are structured corpora. In practice, however, and especially in the Internet era, people are used to looking up figures and events through the Internet and search engines. Wikipedia is an open knowledge base that contains a large number of person entities. It is also a semi-structured knowledge base, which matches the characteristics of web text. Personal relation extraction based on Wikipedia is therefore closer to personal relation extraction in real-world use.

The main idea of personal relation extraction is to convert it into a personal relation classification problem. Traditional methods are based on knowledge bases, pattern matching, and machine learning; the machine learning methods divide into kernel-based methods and feature-vector-based methods. There are two major difficulties in personal relation extraction: person entity recognition and personal relation recognition. This paper proposes solutions to both difficulties. Its contributions are as follows:

(1) Existing word segmentation tools have a low recognition rate for transliterated foreign names. To address this, the paper constructs a person entity library from Chinese Wikipedia by extracting the infobox data, and constructs a dictionary of transliterated foreign names from the Wikipedia link data.

(2) The paper proposes a hierarchical classification method for person entity relation classification. To improve the speed and performance of the classification model, it combines pattern matching with the feature vector method, and it uses the DAG-SVMs multi-class classification method to solve the multi-class problem in person entity relation classification. It also introduces a self-relation into the set of person entity relations, to alleviate the "same person with different names" phenomenon in Wikipedia. The feasibility of this method is verified by experiments.

This paper presents a method for constructing a person entity library and a name dictionary of considerable scale based on Wikipedia. Experimental verification shows that the approach performs better on personal relation recognition, especially on the self-relation and on the subcategories of family relations.
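
The DAG-SVMs decision step mentioned above can be illustrated with a minimal sketch. It is not the thesis's implementation: the relation labels, the three-element score vector, and the toy pairwise deciders (which stand in for trained binary SVMs) are all hypothetical. It only shows the general DAG idea, in which each node runs one binary classifier on a pair of candidate classes and eliminates the loser, so that k classes need k - 1 evaluations per sample.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]
# A pairwise decider returns True when the FIRST label of its pair wins.
PairwiseDecider = Callable[[Features], bool]


def dag_predict(
    labels: List[str],
    deciders: Dict[Tuple[str, str], PairwiseDecider],
    x: Features,
) -> str:
    """Walk the decision DAG: compare the first and last remaining
    candidates, drop the loser, and repeat until one label is left."""
    candidates = list(labels)
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        if deciders[(first, last)](x):
            candidates.pop()     # 'last' is eliminated
        else:
            candidates.pop(0)    # 'first' is eliminated
    return candidates[0]


# Hypothetical relation labels (assumption, not the thesis's label set).
LABELS = ["self", "family", "colleague"]

# Toy deciders standing in for trained binary SVMs (assumption):
# each one simply compares two entries of a hypothetical score vector.
DECIDERS: Dict[Tuple[str, str], PairwiseDecider] = {
    ("self", "family"): lambda x: x[0] > x[1],
    ("self", "colleague"): lambda x: x[0] > x[2],
    ("family", "colleague"): lambda x: x[1] > x[2],
}

if __name__ == "__main__":
    print(dag_predict(LABELS, DECIDERS, [0.1, 0.8, 0.3]))
```

In a real system each entry of `DECIDERS` would wrap a binary SVM trained on the samples of its two classes, and one decider is needed for every pair of labels in list order.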
Keywords/Search Tags: Wikipedia, Relation Extraction, Support Vector Machine, Hierarchical Classification