
The Research Of Binary Personal Relation Extraction On Web2.0

Posted on: 2017-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: L Xu
GTID: 2308330509950226
Subject: Software engineering
Abstract/Summary:
With the development of computers, more and more information appears on the Internet, and automatically finding the information we need has become a difficult problem in computing. Information extraction technology emerged to address it. Because of its broad application prospects, the extraction of personal entity relations, an important branch of information extraction, has drawn growing attention from researchers. Traditional relation extraction has several problems: many different words can describe the same relationship, the quality of extraction templates is low, and analyzing personal entity relations requires a large amount of computation. To deal with these problems, and building on previous research on entity relation extraction, this thesis presents a new method for extracting the relationship between two people on Web2.0 that combines semi-supervised learning from machine learning with Information Gain from information theory. Specifically, it proposes the following improvements:

Firstly, to handle the "multi-word synonymy" phenomenon in Chinese sentences, where several words express the same relation, this thesis presents a crowd-sourcing-based method for extending relation descriptions. A portion of the relation descriptions is specified manually and first expanded with HowNet and the "word synonym forest" (Tongyici Cilin). The expanded collection is then distributed to the public online so that language enthusiasts can extend it a second time. Finally, similarities are calculated, and the synonyms that pass the filter are stored in a repository for later analysis.

Secondly, this thesis puts forward a relation extraction template algorithm that combines semi-supervised learning with Information Gain. Because creating templates manually is time-consuming, semi-supervised learning is incorporated into the template creation process.
In this process, a small set of samples is first labeled manually, and the relation extraction process then iterates repeatedly to produce more relation extraction templates. Because each word in a sentence carries a different amount of information depending on its position, this thesis uses Information Gain values to determine the size of the template's context window.

Thirdly, for sentences that contain several personal entities, this thesis proposes a screening method based on template matching. The method determines the relative position between the entity pair and the relation description in the template, and then selects as candidate entities those entities in the sentence whose positions match this relative-position information.

Finally, to avoid the wasted 0*0=0 computations in Vector Space Model text similarity, this thesis puts forward a candidate-entity verification method based on non-zero weighting optimization. The method reduces the dimensionality of the feature weight matrix and checks for non-zero weights before the similarity calculation, thereby reducing the amount of computation.
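The Information Gain window selection and the non-zero weighting optimization described above can be sketched as follows. This is a minimal, hypothetical Python illustration, not the thesis's implementation. It assumes that a template feature is the word found at a fixed offset from an entity, that class labels are relation types, that the window keeps every offset whose Information Gain reaches a chosen threshold, and that text vectors are stored as sparse {term: weight} dictionaries. The function names, the threshold, and the toy words are all assumptions.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy H(C), in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(feature_values, labels):
    """IG(C; X) = H(C) - sum over x of P(X = x) * H(C | X = x)."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_window(features_by_offset, labels, threshold):
    """Hypothetical rule: keep the context offsets (relative to an entity)
    whose word feature has an information gain of at least `threshold`."""
    return sorted(
        offset
        for offset, values in features_by_offset.items()
        if information_gain(values, labels) >= threshold
    )


def sparse_cosine(u, v):
    """Cosine similarity of two {term: weight} vectors that touches only
    non-zero weights, so no 0*0 products are ever computed."""
    u = {t: w for t, w in u.items() if w != 0}
    v = {t: w for t, w in v.items() if w != 0}
    shared = u.keys() & v.keys()
    if not shared:
        # Non-zero check before the similarity step: with no shared
        # non-zero term the dot product is 0, so the norms are skipped.
        return 0.0
    dot = sum(u[t] * v[t] for t in shared)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    labels = ["friend", "friend", "spouse", "spouse"]
    features = {
        -1: ["he", "he", "he", "he"],        # same word everywhere: IG = 0
        1: ["and", "and", "wife", "wife"],   # separates the classes: IG = 1
    }
    print(select_window(features, labels, threshold=0.5))  # [1]
    print(sparse_cosine({"wife": 3.0, "married": 4.0}, {"wife": 1.0}))  # 0.6
```

Storing only non-zero weights is one common way to realize the non-zero weighting idea; the thesis may instead prune columns of a dense feature weight matrix before comparison.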
Keywords/Search Tags: Personal Entity, Relation Extraction, Information Gain, Machine Learning