
The Research Of Relation Extraction With Unsupervised Method

Posted on: 2008-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z T Zhang
GTID: 2189360245998148
Subject: Computer Science and Technology
Abstract/Summary:
Entity relation extraction is one of the most important topics in information extraction, and it also matters for Text Comprehension (TC), Information Retrieval (IR), Question Answering (QA), and Machine Translation (MT). With the arrival of the information era, relation extraction has become an active research field.

Since the concept of relation extraction was introduced, there has been considerable work on supervised learning of relation patterns, using corpora annotated to indicate the information to be extracted. Annotating corpora at that scale is usually very hard. Because of this limitation of supervised methods, some weakly supervised and unsupervised approaches have been suggested. Although these methods remedy some of the deficiencies, they still have limitations of their own.

We put forward an approach to entity relation extraction from large text corpora. Our method is based on the hypothesis that pairs of entities occurring in similar contexts can be clustered, and that each pair in a cluster is an instance of the same relation. Relation extraction thus becomes a process of clustering pairs of named entities according to the similarity of the context words intervening between them. The improvement to the algorithm is reflected in three aspects. First, we introduce a classic text-processing model, the Vector Space Model, which is used to extract feature words from the context to construct a feature vector and then to assign each feature an appropriate weight according to its contribution to clustering. Second, to extract relations from the feature vectors, we propose an optimized clustering algorithm that improves the precision of the initial algorithm without an obvious drop in efficiency. Finally, we apply a discriminative category matching method to label the relation type.

To validate the feasibility and effectiveness of our entity relation extraction method, we construct three subsets from the ACE corpus for the domains EMP-ORG (Person Organization), GPE-AFF (Geo-Political Entity Affiliation), and PHYS (Physical). Our experiments on this corpus show improved results. We also compare our results with Hasegawa's; both the precision and the efficiency of our method are superior. The experiments verify that the method proposed in this paper is feasible and effective.
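The pipeline described in the abstract (context words between entity pairs, weighted feature vectors, clustering, labeling) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration, not the thesis's actual method: the weighting is a smoothed TF-IDF, the clustering is plain complete-linkage agglomerative clustering with a cosine-similarity cutoff (not the optimized algorithm the thesis proposes), the label is simply the heaviest context word (a stand-in for discriminative category matching), and the names `tfidf_vectors`, `cluster_pairs`, `label_cluster`, and `EXAMPLE_CONTEXTS` as well as the toy data are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy input: entity pair -> words found between the two entities.
EXAMPLE_CONTEXTS = {
    ("Smith", "Acme"): ["is", "the", "chief", "executive", "of"],
    ("Jones", "Initech"): ["chief", "executive", "of"],
    ("Paris", "France"): ["is", "located", "in"],
    ("Lyon", "France"): ["located", "in", "southern"],
}


def tfidf_vectors(contexts):
    """Build a sparse vector per entity pair, weighted by smoothed TF-IDF."""
    n = len(contexts)
    df = Counter()
    for words in contexts.values():
        df.update(set(words))  # document frequency: count each word once per pair
    vectors = {}
    for pair, words in contexts.items():
        tf = Counter(words)
        vectors[pair] = {
            w: count * (math.log((1 + n) / (1 + df[w])) + 1.0)
            for w, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_pairs(vectors, threshold=0.3):
    """Complete-linkage agglomerative clustering; stop when no merge exceeds threshold."""
    clusters = [[pair] for pair in vectors]
    while True:
        best, best_sim = None, threshold
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: similarity of the least similar members.
                sim = min(
                    cosine(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if sim > best_sim:
                    best, best_sim = (i, j), sim
        if best is None:
            return clusters
        i, j = best  # i < j, so popping j leaves index i valid
        clusters[i].extend(clusters.pop(j))


def label_cluster(cluster, vectors):
    """Stand-in labeling: the context word with the largest total weight."""
    totals = Counter()
    for pair in cluster:
        totals.update(vectors[pair])
    return totals.most_common(1)[0][0] if totals else None


if __name__ == "__main__":
    vecs = tfidf_vectors(EXAMPLE_CONTEXTS)
    for members in cluster_pairs(vecs, threshold=0.3):
        print(label_cluster(members, vecs), members)
```

On this toy data the two person-organization pairs end up in one cluster and the two location pairs in another. The threshold of 0.3 is arbitrary; in practice it would have to be tuned on the corpus at hand.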
Keywords/Search Tags: relation extraction, feature extraction, Vector Space Model, cluster