
Research On Key Technologies For Entity Relation Extraction

Posted on: 2016-09-13
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Liu
GTID: 2308330482479176
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of Web 2.0, the amount of information on the network is growing quickly, and extracting the information that users are interested in from this massive volume has become an important problem. As a core task of information extraction, entity relation extraction identifies the semantic relation between a pair of entities. It plays an important role in sentence-level semantic understanding and in the construction of entity semantic knowledge bases. This dissertation focuses on entity relation extraction technology, covering supervised entity relation extraction, the construction of entity relation trigger word dictionaries, and open relation extraction for Chinese. The main contributions are as follows:

(1) Supervised entity relation extraction. For entity pairs of common nouns, the boundaries of fuzzy relation samples are difficult to determine, and the samples overlap one another. A supervised relation extraction method based on an SVM-KNN classifier (a combination of SVM and KNN classifiers) is proposed. The method designs a double-vote mechanism that uses the SVM classifier to identify fuzzy relation samples: after double voting by the SVM classifier, the test sample set is divided into a determinate region and a fuzzy region. The SVM classifier outputs the results for relation samples in the determinate region, while the KNN classifier performs a second classification of the samples in the fuzzy region. Experimental results show that this method effectively determines the boundaries of fuzzy samples and substantially improves entity relation extraction performance.

(2) Automatic construction of an entity relation trigger word dictionary. Building such a dictionary by manual annotation or with supervised methods is labor-intensive, and it is difficult to make the dictionary complete. An unsupervised method for constructing the relation trigger word dictionary automatically is proposed. The method first fits a hierarchical Dirichlet process (HDP) model to the relation instance set to obtain its topic-word distribution, then derives a candidate trigger word dictionary by filtering topics and words weighted by probability. Finally, dependency parsing is used to filter noise words out of the candidate dictionary, producing the final relation trigger word dictionary. The method requires no manual intervention and avoids the initial relation trigger word thesaurus that supervised methods need. Experimental results show that it can quickly build trigger word dictionaries for any relation type with high accuracy.

(3) Open relation extraction for Chinese. The distant supervision assumption introduces a large amount of noisily labeled data, that is, sentences that do not actually express the specified relation type. An open entity relation extraction method for Chinese based on a topic model is proposed. The method first defines a new relation pattern, the trigger word window pattern, and extracts the pattern of every sentence sample in the candidate relation sentence sample set by clustering. A topic model is then used to identify noisily labeled data by calculating the probability that each relation pattern group expresses the corresponding relation type. Finally, an entity relation extraction model is trained to perform relation extraction. Experimental results show that the proposed method effectively identifies noisily labeled data in the training corpus, and that filtering out this noisy data improves the performance of the entity relation extraction model.
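The abstract does not spell out how the double vote in contribution (1) works. The Python sketch below assumes one plausible reading: a multi-class SVM built from one-vs-one linear classifiers votes on each test sample, and the sample is treated as fuzzy when the winning class does not lead the runner-up by at least `min_gap` votes. Fuzzy samples are then re-classified by KNN. The SVMs are taken as already trained, and the pairwise weights, the `min_gap` rule, the toy data, and all function and variable names are illustrative assumptions, not the thesis's implementation.

```python
import math
from collections import Counter
from itertools import combinations


def ovo_votes(x, classes, pairwise):
    """Count one-vs-one SVM votes for feature vector x.

    pairwise[(a, b)] holds a hypothetical pre-trained linear decision
    function (weights, bias); a positive score is a vote for a, otherwise b.
    """
    votes = Counter({c: 0 for c in classes})  # every class starts at 0 votes
    for a, b in combinations(classes, 2):
        weights, bias = pairwise[(a, b)]
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        votes[a if score > 0 else b] += 1
    return votes


def knn_predict(x, train, k=3):
    """Majority label among the k training samples nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def svm_knn_classify(x, classes, pairwise, train, min_gap=1, k=3):
    """Return (label, region).

    Assumed fuzziness rule: if the top class does not lead the runner-up by
    at least min_gap votes, x is in the fuzzy region and KNN decides instead.
    """
    votes = ovo_votes(x, classes, pairwise)
    (top, v1), (_, v2) = votes.most_common(2)
    if v1 - v2 >= min_gap:
        return top, "determinate"
    return knn_predict(x, train, k), "fuzzy"


# Toy 2-D example with three relation classes (all values made up).
CLASSES = ["A", "B", "C"]
PAIRWISE = {
    ("A", "B"): ((1.0, 0.0), 0.0),   # x > 0 votes A
    ("A", "C"): ((0.0, 1.0), 0.0),   # y > 0 votes A
    ("B", "C"): ((0.0, -1.0), 0.0),  # y < 0 votes B
}
TRAIN = [((1.0, -1.2), "C"), ((0.9, -0.8), "C"), ((3.0, 3.0), "A"), ((-2.0, -2.0), "B")]

print(svm_knn_classify((1.0, 1.0), CLASSES, PAIRWISE, TRAIN))   # ('A', 'determinate')
print(svm_knn_classify((1.0, -1.0), CLASSES, PAIRWISE, TRAIN))  # ('C', 'fuzzy')
```

Routing only the tied or low-margin samples to KNN keeps the SVM decision wherever it is unambiguous and lets the local neighborhood settle the overlapping cases.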
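Contribution (2) describes two filtering stages that follow HDP modeling. A minimal sketch, assuming the HDP topic proportions and topic-word probabilities have already been produced by some HDP implementation (fitting is out of scope here), might score each word by topic weight times p(word | topic), keep words above a threshold, and then keep only candidates that directly govern one of the entities in a dependency parse. The thresholds, the "governs an entity" criterion, the `(head, dependent)` arc format, the toy data, and the function names are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def candidate_triggers(topic_weights, topic_words, min_topic=0.05, min_score=0.04):
    """Probability-weighted filtering of topics and words.

    topic_weights: {topic_id: global topic proportion}
    topic_words:   {topic_id: {word: p(word | topic)}}
    Both are assumed to come from an already-fitted HDP model.
    """
    scores = Counter()
    for topic, weight in topic_weights.items():
        if weight < min_topic:
            continue  # discard low-weight topics entirely
        for word, p in topic_words[topic].items():
            # Weight each word by its topic's proportion; keep its best score.
            scores[word] = max(scores[word], weight * p)
    return {word: s for word, s in scores.items() if s >= min_score}


def dependency_filter(candidates, parsed_instances, min_support=1):
    """Keep candidates that directly govern an entity in at least
    min_support parsed relation instances.

    parsed_instances: list of (entity1, entity2, arcs), where arcs is a list
    of hypothetical (head, dependent) pairs from a dependency parser.
    """
    support = Counter()
    for e1, e2, arcs in parsed_instances:
        governors = {head for head, dependent in arcs if dependent in (e1, e2)}
        for word in governors.intersection(candidates):
            support[word] += 1  # each instance counts at most once per word
    return {word for word in candidates if support[word] >= min_support}


# Toy data (made up): topic 2 is too rare, "the" is a high-probability noise word.
TOPIC_WEIGHTS = {0: 0.6, 1: 0.38, 2: 0.02}
TOPIC_WORDS = {
    0: {"founded": 0.3, "the": 0.2, "company": 0.05},
    1: {"married": 0.4, "wife": 0.1},
    2: {"noise": 0.9},
}
INSTANCES = [
    ("Jobs", "Apple", [("founded", "Jobs"), ("founded", "Apple"), ("company", "the")]),
    ("Tom", "Mary", [("married", "Tom"), ("married", "Mary")]),
]

cands = candidate_triggers(TOPIC_WEIGHTS, TOPIC_WORDS)
print(sorted(cands))                                # ['founded', 'married', 'the']
print(sorted(dependency_filter(cands, INSTANCES)))  # ['founded', 'married']
```

The probability filter alone lets frequent function words such as "the" through; the syntactic check is what removes them, which mirrors the role the abstract assigns to dependency parsing.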
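For contribution (3), the sketch below illustrates only the data flow, under heavy simplifications. The trigger word window pattern is assumed to be the first trigger word plus a fixed number of neighboring tokens with entity mentions masked; exact pattern matching stands in for the clustering step; and a relative-frequency estimate of P(label | pattern group) stands in for the topic model. It flags distantly supervised samples whose label is a minority within their pattern group, or that contain no trigger word at all. All names, thresholds, and data are hypothetical.

```python
from collections import Counter, defaultdict


def window_pattern(tokens, e1, e2, triggers, window=1):
    """Hypothetical trigger word window pattern: the first trigger word plus
    `window` tokens on each side, with the two entity mentions abstracted."""
    masked = ["<E1>" if t == e1 else "<E2>" if t == e2 else t for t in tokens]
    for i, tok in enumerate(masked):
        if tok in triggers:
            return tuple(masked[max(0, i - window): i + window + 1])
    return None  # no trigger word in the sentence


def flag_noise(samples, triggers, min_prob=0.5, window=1):
    """Return one boolean per sample: True if it looks like a noisy label.

    samples: (tokens, e1, e2, relation_label) tuples from distant supervision.
    Stand-ins, not the topic model itself: samples with identical patterns
    form a group, and P(label | group) is estimated by relative frequency.
    """
    patterns = [window_pattern(toks, e1, e2, triggers, window) for toks, e1, e2, _ in samples]
    group_labels = defaultdict(Counter)
    for pattern, (_, _, _, label) in zip(patterns, samples):
        group_labels[pattern][label] += 1
    flags = []
    for pattern, (_, _, _, label) in zip(patterns, samples):
        counts = group_labels[pattern]
        prob = counts[label] / sum(counts.values())
        flags.append(pattern is None or prob < min_prob)
    return flags


SAMPLES = [
    (["Jobs", "founded", "Apple"], "Jobs", "Apple", "founder"),
    (["Gates", "founded", "Microsoft"], "Gates", "Microsoft", "founder"),
    (["Page", "founded", "Google"], "Page", "Google", "ceo"),          # minority label in its group
    (["Page", "spoke", "at", "Google"], "Page", "Google", "founder"),  # no trigger word
]
print(flag_noise(SAMPLES, {"founded", "born", "married"}))  # [False, False, True, True]
```

The flagged samples would be dropped before training the final extraction model; a real implementation would replace both stand-ins with the clustering and topic-model steps the abstract describes.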
Keywords/Search Tags:Entity Relation Extraction, Support Vector Machine, Relation Trigger Word Dictionary, Hierarchical Dirichlet Processes, Dependency Parsing, Distant Supervision, Topic Model, Noise Mark Identification