Distant Supervised Relation Extraction With Clustering Based Denoising And Type Constraints

Posted on: 2014-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Zhu
GTID: 2248330395498227
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, the amount of online content keeps growing, and much of it contains useful information. Information extraction, as a tool for extracting structured information from this content, is becoming increasingly important, and it has demonstrated its value in vertical search and the Semantic Web. After nearly 20 years of development, relation extraction from large-scale text corpora has become a hot topic in the information extraction field. Supervised methods for relation extraction have proven effective; however, they require manually labeled training data, which is time-consuming and labor-intensive to produce.

Therefore, researchers have proposed a method called Distant Supervision, which uses a knowledge base and a text corpus. For each pair of entities that appears in some relation instance of the knowledge base, it finds all sentences that contain those entities in a large unlabeled corpus and extracts textual features to train a relation classifier, thus avoiding hand-labeled data. The distant supervision assumption is that if two entities participate in a relation, any sentence that contains those two entities might express that relation.

This paper argues that 1) the Distant Supervision assumption does not always hold and can introduce noisy data, which hurts the precision of the relation extractor; and 2) the Distant Supervision method only uses features extracted from the sentences that express a relation. We propose an approach that adds entity information as an extra type constraint to the Distant Supervision based framework for training a relation extractor. The contributions of this paper are as follows:

(1) We propose a clustering-based denoising method for the aligned data of distant supervision. We treat a sentence that is included in the aligned data but does not express the corresponding relation as noisy data. We first cluster the sentences and find the common patterns of sentences that express the corresponding relation, then select the good sentences, thereby reducing the noisy data.

(2) We propose two ways to add type features. First, we provide a way to align knowledge base entities to their text mentions and extract each entity's information (features) from those text mentions. Second, we explore two ways of adding entity information to the relation extraction problem: 1) a joint algorithm, which models entity features and relation features jointly, and 2) a type constraint algorithm, which uses such features to constrain the types of relation arguments.

The experimental results show that the two methods presented in this paper can effectively improve the accuracy of the relation extraction system. However, the type constraint algorithm reduces the recall of the relation extraction system to some extent. Given the large scale and redundancy of Internet content, we think accuracy is more important than recall at this early stage of research.
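
The distant supervision alignment and the idea of a type constraint can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the toy knowledge base and corpus, the substring-based entity matching, and the hand-written type signature table are all assumptions made for illustration. The thesis derives entity information from text mentions, whereas here the entity types are simply passed in.

```python
from collections import defaultdict


def align(kb_triples, sentences):
    """Label sentences with the distant supervision heuristic.

    Any sentence that mentions both entities of a knowledge base triple
    is labeled with that triple's relation. Entity matching here is a
    naive substring test (an assumption made for this sketch).
    """
    # Index knowledge base relations by (head, tail) entity pair.
    relations_by_pair = defaultdict(set)
    for head, relation, tail in kb_triples:
        relations_by_pair[(head, tail)].add(relation)

    aligned = []
    for sentence in sentences:
        for (head, tail), relations in relations_by_pair.items():
            if head in sentence and tail in sentence:
                for relation in sorted(relations):
                    aligned.append((head, relation, tail, sentence))
    return aligned


def passes_type_constraint(relation, head_type, tail_type, signatures):
    """Return True if the argument types match the relation's signature.

    Relations without a known signature are not filtered out.
    """
    expected = signatures.get(relation)
    return expected is None or expected == (head_type, tail_type)


# Toy data, invented for illustration only.
kb = [("Steve Jobs", "founder_of", "Apple")]
corpus = [
    "Steve Jobs founded Apple in 1976.",
    "Steve Jobs returned to Apple in 1997.",
    "Tim Cook spoke in Cupertino.",
]

# Hypothetical type signatures: relation -> (head type, tail type).
TYPE_SIGNATURES = {"founder_of": ("PERSON", "ORGANIZATION")}

training = align(kb, corpus)
for head, relation, tail, sentence in training:
    print(relation, "|", sentence)
```

In this toy run both sentences that mention "Steve Jobs" and "Apple" receive the label `founder_of`, although the second sentence does not express that relation. This is the kind of noisy aligned data that the clustering-based denoising step is meant to reduce. The `passes_type_constraint` helper only shows the general shape of rejecting candidate relations whose argument types do not fit.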
Keywords/Search Tags: Semantic Web, Relation Extraction, Distant Supervision, Denoising