The Study Of Entity Relation Extraction Algorithm

Posted on: 2016-04-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Y Zhang
Full Text: PDF
GTID: 1108330482457875
Subject: Signal and Information Processing
Abstract/Summary:
Entity relation extraction is a subtask of information extraction. It is the problem of extracting entities and the relations among them from natural language documents, that is, of converting largely unstructured data into semi-structured or structured data. With the rising popularity of the Internet, the volume of online data has expanded sharply. Huge amounts of unlabeled data contain a large number of named entities, such as person and organization entities, and the relations among them. How to extract this information effectively and precisely is a major challenge for entity relation extraction.

Over the past decades, many related evaluations, e.g., the Message Understanding Conference (MUC), Automatic Content Extraction (ACE), and Knowledge Base Population (KBP), have arisen and facilitated the development of relation extraction technology. The English Slot Filling (ESF) task of the KBP track at the Text Analysis Conference (TAC), which involves mining information about entities from text, has attracted much attention for its potential application to big data on the web. ESF systems determine the values of specified attributes from a large source collection of documents, and the entities investigated in ESF are generally either persons or some type of organization. The entities and relation attributes extracted by ESF systems are used to construct Wikipedia infoboxes and reference knowledge bases.

This paper studied key problems of entity relation extraction. The target relations are the 25 relation attributes of person entities and the 16 relation attributes of organization entities defined by the ESF task. Because the available corpus is only partly labeled, we mainly used a semi-supervised method, bootstrapping, to extract relation attributes.
We constructed a robust semantic bootstrapping model that starts from entities and their relation attributes taken from former ESF tasks.

The main research contents and major innovations are as follows:

1. This paper studied the trigger word feature as a semantic constraint in relation extraction. We proposed an activation force based trigger word mining method. This method defines a new metric, called Trigger Force (TF), to measure the ability of a trigger word to trigger a relation. We applied the TF method to the ESF task to extract trigger words for the defined relations. The experimental results show that it performs well.

2. This paper studied pattern representation methods for relation extraction and proposed a new relation pattern: the semantic shortest dependency pattern (SSDP). The SSDP uses the shortest dependency path from an entity to its relation attribute value, with a trigger word as the semantic anchor. Compared with traditional relation patterns, the SSDP contains more syntactic and semantic features, which makes it more relation oriented.

3. This paper studied pattern similarity methods in bootstrapping models and proposed a new bottom-up kernel (BUK) method to compare relation patterns. The BUK method assumes that the nearer a dependency is to the relation attribute value, the more important it is. The BUK compares two patterns by weighting dependency similarities from the relation attribute value nodes up to the root nodes, following the structures of their SSDPs.

4. This paper studied the construction of the semantic bootstrapping model. We summarized the general components of the traditional bootstrapping model for relation extraction and defined a more robust semantic bootstrapping model.
This paper described the construction of the semantic bootstrapping model and how to add semantic constraints to the traditional bootstrapping model to suppress semantic drift.

Finally, we summarize the work of this paper and discuss future work.
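
As an illustration of contributions 2 and 3, the following is a minimal Python sketch, not the dissertation's implementation. It assumes a hand-written toy dependency parse, finds the shortest dependency path from an entity to an attribute value with breadth-first search, and then compares two patterns position by position from the attribute-value end upward with geometrically decaying weights. The function names, the `decay` parameter, the exact-match similarity, and the example sentence are all hypothetical; the abstract does not specify the actual SSDP construction or the BUK weighting.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected view of a dependency graph.

    `edges` is a list of (head, dependent) pairs; returns the node list
    from `start` to `goal`, or None if they are not connected.
    """
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, []).append(dep)
        graph.setdefault(dep, []).append(head)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def bottom_up_similarity(pattern_a, pattern_b, decay=0.5):
    """Weighted match score, walking both patterns from the value end upward.

    Position i (counted from the attribute value) gets weight decay**i, so
    elements near the value count most. Exact match is an assumed stand-in
    for a real dependency similarity.
    """
    rev_a, rev_b = pattern_a[::-1], pattern_b[::-1]
    score, norm = 0.0, 0.0
    for i in range(max(len(rev_a), len(rev_b))):
        weight = decay ** i
        matched = i < len(rev_a) and i < len(rev_b) and rev_a[i] == rev_b[i]
        score += weight if matched else 0.0
        norm += weight
    return score / norm if norm else 0.0


# Hypothetical parse of "Smith founded Acme in 1999" as (head, dependent) pairs.
toy_edges = [("founded", "Smith"), ("founded", "Acme"), ("founded", "1999")]

# Path from the organization entity to the attribute value; it passes
# through the trigger word "founded".
path = shortest_path(toy_edges, "Acme", "Smith")

# Generalized patterns with the entity and value replaced by placeholders.
pattern_1 = ["ENTITY", "founded", "VALUE"]
pattern_2 = ["ENTITY", "established", "VALUE"]
similarity = bottom_up_similarity(pattern_1, pattern_2)
```

In this toy case the two patterns agree at the attribute-value end and at the entity end but differ in the trigger word, so with `decay=0.5` the score is 1.25 / 1.75, about 0.71. A real system would obtain the dependency edges from a parser and would use a softer similarity than exact match.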
Keywords/Search Tags: relation extraction, trigger word, pattern learning, bootstrapping, kernel method