
Automatically Extracting Semantic Relations From Wikipedia Text

Posted on: 2009-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: G Wang
GTID: 2178360242976780
Subject: Computer application technology
Abstract/Summary:
The Semantic Web is built not only on ontologies but also on content that conforms to those ontologies. Although the amount of Semantic Web data on the Web is growing steadily, the space of instances remains sparsely populated. Semantic relations are an important part of semantic knowledge bases; consequently, extracting semantic relations is of great importance for realizing the Semantic Web.

Wikipedia is a free online encyclopedia and is now considered one of the largest online knowledge repositories. Its content combines broad coverage with high accuracy, which makes it a worthwhile source of semantic relations for knowledge base construction. Most of the information in Wikipedia is presented as free text. Although Wikipedia also contains a good deal of well-structured information that serves directly as an effective data source for relation extraction, most semantic relations are buried in its free text.

However, several problems must be solved when extracting semantic relations from Wikipedia text. On the one hand, we need an effective way to recognize fine-grained semantic entities in Wikipedia. On the other hand, we need an effective approach to relation extraction that works from only a few relation examples. In this thesis, we first propose enhancing relation extraction by leveraging Wikipedia's structured information. Inspired by the concept of Selectional Constraints, we propose a novel method for generating Selectional Constraints Features from structured data within Wikipedia. These features are used for entity recognition and validation, and the experimental results show that they greatly improve the performance of a pattern-matching-based extraction method.
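The idea of validating pattern matches with type constraints can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the `bornIn` relation, the regular expression, and the `ENTITY_TYPES` and `CONSTRAINTS` tables are all hypothetical, with `ENTITY_TYPES` standing in for type information one might derive from Wikipedia's structured data (such as infoboxes or categories). A candidate pair produced by the surface pattern is kept only if both arguments carry the types the relation expects.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical entity-type lookup, standing in for types derived from
# Wikipedia's structured data (e.g. infobox templates or categories).
ENTITY_TYPES: Dict[str, Set[str]] = {
    "Alan Turing": {"Person"},
    "Maida Vale": {"Place"},
    "London": {"Place", "City"},
    "1912": {"Year"},
}

# Hypothetical selectional constraint: (subject type, object type) per relation.
CONSTRAINTS: Dict[str, Tuple[str, str]] = {
    "bornIn": ("Person", "Place"),
}

# Illustrative surface pattern for bornIn. It is deliberately loose (it also
# matches years); the constraint check below filters out wrong-typed arguments.
BORN_IN = re.compile(r"([A-Z][\w ]*?) was born in ([A-Z0-9][\w ]*?)[,.]")


def satisfies(relation: str, subj: str, obj: str) -> bool:
    """True if both arguments have the types the relation requires."""
    subj_type, obj_type = CONSTRAINTS[relation]
    return (subj_type in ENTITY_TYPES.get(subj, set())
            and obj_type in ENTITY_TYPES.get(obj, set()))


def extract_born_in(text: str) -> List[Tuple[str, str]]:
    """Pattern matching followed by selectional-constraint validation."""
    return [(s, o) for s, o in BORN_IN.findall(text) if satisfies("bornIn", s, o)]


TEXT = "Alan Turing was born in Maida Vale, London. Alan Turing was born in 1912."
print(extract_born_in(TEXT))  # [('Alan Turing', 'Maida Vale')]
```

The pattern alone yields two candidates; the pair `("Alan Turing", "1912")` is discarded because `1912` is typed as a `Year`, not a `Place`. This shows, in toy form, how type information can raise the precision of a pattern matcher.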
Second, only a small number of relation examples, with no corresponding negative examples, can be obtained from Wikipedia infoboxes, and no relationship taxonomy is available either. We therefore applied a state-of-the-art positive-only machine learning algorithm to the relation extraction task (to the best of our knowledge, no prior work has used positive-only learning algorithms for relation extraction). To cope with sparse training data, we extended the positive-only learning algorithm by making it transductive and building a self-training algorithm on top of it. The experiments showed that traditional multi-class classification was inappropriate for the task. The experimental results indicate that positive-only learning outperforms a traditional binary classification algorithm for which we manually supplied some negative data. Given under-sampled positive training examples, the self-trained algorithm significantly improved overall performance by increasing recall at only a small cost in precision.
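The combination of positive-only learning, transduction, and self-training can be illustrated with a deliberately simple stand-in. The abstract does not name the positive-only algorithm used, so this sketch substitutes a basic one-class scorer: cosine similarity to the centroid of the positive examples, with acceptance above an assumed threshold of 0.7. The function names, the bag-of-words context representation, and the toy data are all hypothetical. The sketch is transductive in that the unlabeled pool being classified is the same pool the model learns from, and it self-trains in that each round's accepted examples are folded back into the positive prototype.

```python
import math
from collections import Counter
from typing import Dict, List, Set

Vector = Dict[str, float]


def vectorize(tokens: List[str]) -> Vector:
    """Unit-length bag-of-words vector for a relation context."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def centroid(vectors: List[Vector]) -> Vector:
    """Average of sparse vectors (the positive prototype)."""
    total: Dict[str, float] = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def self_train(positives: List[List[str]], unlabeled: List[List[str]],
               threshold: float = 0.7, max_iter: int = 10) -> Set[int]:
    """Return indices of unlabeled contexts labeled as positive.

    Uses positive seeds only (no negatives). Each round rebuilds the
    prototype from the seeds plus everything accepted so far.
    """
    pos_vecs = [vectorize(p) for p in positives]
    unl_vecs = [vectorize(u) for u in unlabeled]
    accepted: Set[int] = set()
    for _ in range(max_iter):
        proto = centroid(pos_vecs + [unl_vecs[i] for i in sorted(accepted)])
        new = {i for i, v in enumerate(unl_vecs)
               if i not in accepted and cosine(v, proto) >= threshold}
        if not new:
            break  # converged: nothing else passes the threshold
        accepted |= new
    return accepted


seeds = [["was", "born", "in"]]
pool = [
    ["was", "born", "in", "town"],
    ["born", "in", "town"],
    ["died", "in", "town"],
    ["married", "to"],
]
print(sorted(self_train(seeds, pool, max_iter=1)))  # [0]
print(sorted(self_train(seeds, pool)))              # [0, 1]
```

With a single round only the closest context is accepted. The second round also picks up `["born", "in", "town"]` because the prototype has shifted toward it, a toy version of how self-training can raise recall from sparse seeds. Whether later rounds start admitting false positives such as `["died", "in", "town"]` depends heavily on the threshold.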
Keywords/Search Tags: Relation Extraction, Wikipedia, Semantic Web, Positive-Only Learning