Automatically Extracting Semantic Relations From Wikipedia Text

Posted on:2009-12-24

Degree:Master

Type:Thesis

Country:China

Candidate:G Wang

GTID:2178360242976780

Subject:Computer application technology

Abstract/Summary:

The Semantic Web builds not only on ontologies but also on content that conforms to those ontologies. Although Semantic Web data is growing steadily on the Web, the space of instances is still sparsely populated. Semantic relations are an important part of semantic knowledge bases. Consequently, the extraction of semantic relations is of great importance for the realization of the Semantic Web.
Wikipedia is a free online encyclopedia and is now considered one of the largest online knowledge repositories. Its content has both broad coverage and high accuracy, which makes it a sensible source of semantic relations for knowledge base construction. Although Wikipedia contains much well-structured information that serves directly as an effective data source for relation extraction, the major portion of its information takes the form of free text, and most semantic relations lie in that free text.
However, two problems must be solved when extracting semantic relations from Wikipedia text. On the one hand, we need an effective way to recognize fine-grained semantic entities in Wikipedia. On the other hand, we need an effective approach to relation extraction that uses only a few relation examples. In this paper, we first propose to enhance relation extraction by leveraging the structured information in Wikipedia. Inspired by the concept of Selectional Constraints, we propose a method for generating Selectional Constraints Features from structured data within Wikipedia. The features are employed for entity recognition and validation, and the experimental results show that they greatly improve the performance of a pattern matching-based extraction method.
Secondly, only a small number of relation examples, with no corresponding negative examples, can be obtained from Wikipedia infoboxes, and no relationship taxonomy is at hand either. We therefore applied a state-of-the-art positive-only machine learning algorithm to the relation extraction task. To the best of our knowledge, no previous work has used positive-only learning algorithms for relation extraction. We extended the positive-only learning algorithm by transforming it into a transductive one and building a self-training algorithm on top of it, so that it works well with sparse training data. The experiments showed that traditional multi-class classification was inappropriate for the task. The experimental results indicate that positive-only learning outperforms a traditional binary classification algorithm for which we manually provided some negative data. Given under-sampled positive training examples, the self-trained algorithm significantly improved overall performance by raising recall at only a small cost in precision.
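The self-training idea described above can be illustrated with a minimal sketch. It is not the algorithm used in the thesis: the abstract does not name the positive-only learner, so a simple centroid-plus-cosine scorer stands in for it here, and the feature names, the `threshold` value, and the function names are all hypothetical. The sketch only shows how a learner trained on positive examples alone can label unlabeled candidates, absorb the confident ones, and retrain to raise recall.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors (dicts)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average of sparse vectors; acts as the positive-only 'model' here."""
    total = {}
    for vec in vectors:
        for key, value in vec.items():
            total[key] = total.get(key, 0.0) + value
    return {key: value / len(vectors) for key, value in total.items()}


def self_train(positives, unlabeled, threshold=0.7, max_rounds=5):
    """Grow the positive set from unlabeled candidates, round by round.

    Assumption: a candidate counts as a confident positive when its
    cosine similarity to the current positive centroid is >= threshold.
    """
    positives = list(positives)
    remaining = list(unlabeled)
    for _ in range(max_rounds):
        model = centroid(positives)
        confident = [x for x in remaining if cosine(model, x) >= threshold]
        if not confident:
            break  # nothing new to learn from; stop early
        positives.extend(confident)
        remaining = [x for x in remaining if cosine(model, x) < threshold]
    return positives, remaining


# Hypothetical features for candidate entity pairs of a "birthplace" relation.
seed = [{"born_in": 1.0, "PERSON": 1.0, "PLACE": 1.0}]
candidates = [
    {"born_in": 1.0, "native_of": 1.0, "PERSON": 1.0, "PLACE": 1.0},
    {"native_of": 1.0, "PERSON": 1.0, "PLACE": 1.0},
    {"capital_of": 1.0, "PLACE": 1.0},
]
grown, rest = self_train(seed, candidates)
```

In this toy run the second candidate is too far from the single seed to be accepted in the first round, but it is accepted once the first candidate has been absorbed and the centroid has shifted. That is the recall gain self-training aims for. The unrelated `capital_of` candidate stays unlabeled.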

Keywords/Search Tags:

Relation Extraction, Wikipedia, Semantic Web, Positive-Only Learning

Related items

1	Research On Personal Relation Extraction Based On Wikipedia
2	Mining Semantic Knowledge From Chinese Wikipedia
3	Research On Semantic Relation Extraction Between Named Entities
4	Research On Entity Relation Extraction Algorithm Based On Semi-supervised Machine Learning
5	Research And Implementation On The Method Of Chinese Domain Concept And Relation Extraction Based On Semantic Graph
6	Relation Extraction Of Chinese Named Entities Based On Location And Semantic Features
7	Upper And Lower Semantic Extraction Based On Hybrid Kernel Method
8	Research On Semi-Supervised Semantic Relation Extraction Between Named Entities
9	Based On A Summary Of The Semantic Relation Extraction
10	Chinese Entity Relation Extraction Based On Syntactic And Semantic Analysis