
Research On Extraction Technology Of Relation Between Enterprise Entities Based On Machine Learning

Posted on: 2010-10-03  Degree: Master  Type: Thesis
Country: China  Candidate: L Wang  Full Text: PDF
GTID: 2248330395457543  Subject: Computer application technology
Abstract/Summary:
Relation extraction between enterprise entities is a subtask of entity relation extraction and a typical information extraction problem. Driven by the MUC and ACE evaluations, research on this subject has made great progress, and researchers have proposed many effective methods. Among these, machine-learning approaches perform particularly well: once the relation types are defined, they recast relation extraction as a classification problem. The feature-vector method is one such approach. It builds a feature vector from the words, parts of speech, entity types, and similar cues found in the context of an entity pair within a sentence, represents each instance in a vector space model, and then uses a classifier to recognize the relation type. This thesis adopts this method as its first solution. Another supervised approach uses kernel features: it applies shallow syntactic parsing to the context in which the entities appear and constructs a kernel function to compute the similarity between structured objects such as syntax trees. This method also performs well.

In this thesis, we first define six typical types of relation between enterprises according to the characteristics of such relations, construct a keyword list for each relation type, and crawl the web to obtain a large-scale data set. After preprocessing, we label a small set of instances and randomly generate a test data set. The first method uses surface features to build an enterprise entity relation extraction system, with the labeled data set as the training corpus. The features include words belonging to four parts of speech within windows before and after the entities, and we choose SVM and kNN as the classifiers.

Most existing methods require a large labeled corpus and obtain their extraction results through supervised learning.
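
The feature-vector approach described above can be illustrated with a minimal sketch. It is not the system built in the thesis: the part-of-speech filter is omitted, the "between" window is an added assumption, the relation labels and sentences are invented toy data, and a small cosine-similarity kNN stands in for the SVM and kNN classifiers mentioned in the abstract.

```python
import math
from collections import Counter


def window_features(tokens, e1_idx, e2_idx, window=2):
    """Count context words before, between, and after an entity pair.

    Assumption: no part-of-speech filtering is applied here, although
    the thesis restricts features to four parts of speech.
    """
    lo, hi = min(e1_idx, e2_idx), max(e1_idx, e2_idx)
    feats = Counter()
    for tok in tokens[max(0, lo - window):lo]:
        feats["before=" + tok.lower()] += 1
    for tok in tokens[lo + 1:hi]:
        feats["between=" + tok.lower()] += 1
    for tok in tokens[hi + 1:hi + 1 + window]:
        feats["after=" + tok.lower()] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, query, k=1):
    """Return the majority label among the k most similar training vectors."""
    ranked = sorted(train, key=lambda ex: cosine(ex[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled instances: (tokens, index of entity 1, index of entity 2, label).
labeled = [
    (["A", "acquired", "B", "last", "year"], 0, 2, "acquisition"),
    (["A", "bought", "B"], 0, 2, "acquisition"),
    (["A", "partners", "with", "B"], 0, 3, "cooperation"),
    (["A", "signed", "agreement", "with", "B"], 0, 4, "cooperation"),
]
train = [(window_features(t, i, j), label) for t, i, j, label in labeled]

query_1 = window_features(["X", "acquired", "Y", "yesterday"], 0, 2)
query_2 = window_features(["X", "partners", "with", "Y"], 0, 3)
prediction_1 = knn_predict(train, query_1)
prediction_2 = knn_predict(train, query_2)
```

Replacing `knn_predict` with an SVM would leave the feature extraction unchanged; only the classifier that consumes the vectors differs.
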
However, in most real-world cases labeled corpora are scarce. For this reason, the thesis proposes a semi-supervised, pattern-based enterprise entity relation extraction system. We use an effective mechanism for learning and evaluating patterns, together with a method for matching and evaluating instances, to expand the set of trusted instances. After many iterations, we obtain a high-quality pattern set and then apply it to the test instance set. Experiments show that our method achieves high precision.
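
The learn, evaluate, match, and expand cycle described above can be sketched as a small bootstrapping loop. This is an illustration under stated assumptions, not the thesis's mechanism: a pattern is simplified to the exact text between two entities, its confidence is assumed to be the share of its matches that are already trusted, and the corpus, seed pair, and the `min_conf` threshold are all hypothetical.

```python
def bootstrap(corpus, seeds, rounds=5, min_conf=0.5):
    """Iteratively grow a trusted set of entity pairs from a few seeds.

    corpus: list of (entity1, context, entity2) triples.
    seeds:  set of (entity1, entity2) pairs known to hold the relation.
    Returns the accepted patterns with their confidence and the trusted pairs.
    """
    trusted = set(seeds)
    patterns = {}
    for _ in range(rounds):
        # 1. Learn candidate patterns: contexts that link a trusted pair.
        candidates = {ctx for e1, ctx, e2 in corpus if (e1, e2) in trusted}

        # 2. Evaluate patterns: assumed score = trusted matches / all matches.
        patterns = {}
        for ctx in candidates:
            matched = {(e1, e2) for e1, c, e2 in corpus if c == ctx}
            conf = len(matched & trusted) / len(matched)
            if conf >= min_conf:
                patterns[ctx] = conf

        # 3. Match instances with accepted patterns and expand the trusted set.
        new_pairs = {(e1, e2) for e1, c, e2 in corpus if c in patterns} - trusted
        if not new_pairs:
            break  # nothing new was learned, so stop iterating
        trusted |= new_pairs
    return patterns, trusted


# Hypothetical toy corpus of (entity1, context, entity2) triples.
corpus = [
    ("A", "acquired", "B"),
    ("C", "acquired", "D"),
    ("C", "took over", "D"),
    ("E", "took over", "F"),
    ("A", "met with", "G"),
    ("H", "met with", "I"),
]
patterns, trusted = bootstrap(corpus, seeds={("A", "B")})
```

In this toy run the seed pair yields the pattern "acquired", which brings in a second pair, which in turn yields "took over". The context "met with" never links a trusted pair, so it is never proposed as a pattern. A stricter `min_conf` would trade recall for precision.
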
Keywords/Search Tags: Entity Relation Extraction, Machine Learning, Feature Vector, Semi-supervised Learning, Bootstrapping Framework