Entity Relation Mining Method In Historical Knowledge Graph

Posted on: 2020-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: F Zhang
Full Text: PDF
GTID: 2428330611498604
Subject: Computer Science and Technology
Abstract/Summary:
As the Internet continues to grow, so does the amount of data it holds. Most of that data, however, is stored as text, and extracting information from text effectively is an important problem. Entity relation extraction, a key component of information extraction, turns unstructured natural language text into structured form and underpins natural language applications such as question answering systems and knowledge graphs. Traditional relation extraction methods, however, require manual data annotation, feature selection and relation type definition before training, all of which depend on experts in the relevant field. This consumes considerable manpower and time, so obtaining entity relations at lower cost becomes particularly important. To address these problems, this thesis uses distant supervision, deep learning, natural language processing and other techniques to design two algorithms for entity relation mining in the historical domain. Baidu Encyclopedia, Wikipedia, textbooks and a general knowledge graph are collected as the historical data for this research.

In the historical domain there is no public data set with high coverage of relation types, and manually predefined relation types may be biased and incomplete. To address this, the thesis proposes an entity relation extraction method based on rule matching that extracts relation indicators from unstructured text, which avoids the need to predefine relation types manually. In addition, special syntactic processing for historical text and a logistic regression model are added to improve the extraction accuracy of relation triples.

Because manually annotated data is costly, distant supervision is used to annotate the training data automatically, but distant supervision also introduces two problems: intra-sentence noise and labeling errors. To solve them, the thesis proposes a fusion relation extraction model based on the shortest dependency path (SDP), BiGRU and APCNNs. The SDP filters intra-sentence noise by shortening each sentence. APCNNs apply sentence-level attention and piecewise max pooling to weaken the influence of wrong labels on relation extraction. BiGRU is added at the vector representation stage to learn the context of each word, which gives the model more features for training and improves its accuracy. Experiments show that the fusion relation extraction model based on SDP, BiGRU and APCNNs achieves good results on the historical training corpus constructed by distant supervision.
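As an illustration of how distant supervision annotates data automatically and why it produces labeling errors, the following is a minimal sketch and not the thesis's implementation. The knowledge base entry, the relation name `founder_of`, the example sentences and the function `distant_label` are all invented for this example; matching is done by plain substring search, which is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (head entity, tail entity) -> relation.
KB = {
    ("Qin Shi Huang", "Qin dynasty"): "founder_of",
}

# Invented example sentences standing in for a historical corpus.
SENTENCES = [
    "Qin Shi Huang founded the Qin dynasty in 221 BC.",
    "The tomb of Qin Shi Huang dates from the Qin dynasty.",
    "The Han dynasty followed the Qin dynasty.",
]


def distant_label(sentences, kb):
    """Group sentences into bags keyed by (head, tail, relation).

    Distant supervision assumption: any sentence that mentions both
    entities of a knowledge base fact is labeled with that fact's relation.
    """
    bags = defaultdict(list)
    for sent in sentences:
        for (head, tail), rel in kb.items():
            # Simplification: entity mentions are found by substring match.
            if head in sent and tail in sent:
                bags[(head, tail, rel)].append(sent)
    return dict(bags)


bags = distant_label(SENTENCES, KB)
for (head, tail, rel), sents in bags.items():
    print(head, rel, tail)
    for s in sents:
        print("   ", s)
```

Both the first and the second sentence end up in the `founder_of` bag, although only the first actually expresses that relation. The second is an instance of the wrong labeling that sentence-level attention over a bag is meant to down-weight.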
Keywords/Search Tags: relation extraction, distant supervision, deep learning, rule matching