Research On Chinese Open Entity Relation Extraction

Posted on: 2014-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: A A Liu
Full Text: PDF
GTID: 2268330422951688
Subject: Computer technology
Abstract/Summary:
Entity relations are an important way to describe the semantic relationships between entities. As one of the most important subtasks of information extraction, entity relation extraction has wide application prospects. With the rapid development of Web 2.0, entity relation extraction faces new requirements: users need to obtain valuable information quickly and accurately from rapidly growing, massive web text.

Traditionally, entity relation extraction (RE) methods have required a pre-defined set of relation types, but it is difficult to build a well-defined system of relation types. Open entity relation extraction (ORE) is the task of extracting relation triples from natural language text without pre-defined relation types. We propose two ORE methods for relation extraction in different application scenarios, and explore solutions for automatically building relation types.

This paper presents a supervised method for the sentence-level ORE problem. A detailed annotation criterion is established and a corpus of 1000 documents is annotated. By analyzing the linguistic phenomena in the corpus, we design a domain-independent program to extract features. The average F-measure reaches 61.64% on the corpus.

This paper presents UnCORE (Unsupervised Chinese Open Entity Relation Extraction for the Web), an unsupervised ORE method for discovering relation triples in large-scale web text. UnCORE uses word distance and entity distance constraints to generate candidate relation triples, and then applies global ranking and domain ranking methods to discover relation words among the candidates. Finally, UnCORE filters the candidates using the extracted relation words and some sentence rules. Results show that UnCORE extracts relation triples at large scale with precision higher than 80%.

This paper proposes a method based on relation word clustering to build the relation types. First, we calculate the similarity between relation words based on RNN-LM or HowNet, and then cluster the relation words by affinity propagation (AP) or hierarchical agglomerative clustering (HAC). Finally, we build well-defined relation types.

Finally, we design and implement a demonstration platform that lets users extract relation triples from sentences and search for relation triples.
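
The candidate-generation step attributed to UnCORE (word distance and entity distance constraints) can be illustrated with a minimal sketch. The abstract does not define the two distances, so the definitions here are assumptions: word distance is taken as the number of non-entity tokens between two entities, and entity distance as the number of other entities between them. The token format, the function name `candidate_triples`, and the default thresholds are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple

# Hypothetical input format: (token text, is_entity flag). Word segmentation
# and entity recognition are assumed to have been done upstream.
Token = Tuple[str, bool]
Triple = Tuple[str, str, str]


def candidate_triples(tokens: List[Token],
                      max_word_distance: int = 5,
                      max_entity_distance: int = 0) -> List[Triple]:
    """Generate (entity1, word, entity2) candidates from one sentence.

    Assumed definitions (not from the thesis):
      word distance   = number of non-entity tokens between the two entities
      entity distance = number of other entities between the two entities
    """
    entity_positions = [i for i, (_, is_ent) in enumerate(tokens) if is_ent]
    triples: List[Triple] = []
    for a_idx, i in enumerate(entity_positions):
        for b_idx in range(a_idx + 1, len(entity_positions)):
            j = entity_positions[b_idx]
            if b_idx - a_idx - 1 > max_entity_distance:
                break  # too many entities in between; later pairs only get worse
            between = [tokens[k][0] for k in range(i + 1, j) if not tokens[k][1]]
            if len(between) > max_word_distance:
                break  # word count between entities never shrinks as j grows
            # Every word between the pair is a candidate relation word.
            for word in between:
                triples.append((tokens[i][0], word, tokens[j][0]))
    return triples


sample = [("Steve Jobs", True), ("founded", False), ("Apple", True),
          ("in", False), ("California", True)]
print(candidate_triples(sample, max_word_distance=2, max_entity_distance=0))
```

In a pipeline like the one described, the candidates produced here would still be noisy; the ranking and rule-based filtering steps mentioned in the abstract are what would separate genuine relation words from function words such as "in".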
Keywords/Search Tags: Entity Relation Extraction, Relation Triple, Relation Word, Relation Types