Research And Application Of Relation Extraction Based On Distant Supervision

Posted on: 2022-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2518306326450094
Subject: Master of Engineering
Abstract/Summary:
In recent years, the exponential growth of information on the Internet has made it difficult for people to find the information they need quickly and accurately, and information extraction technology emerged in response. Information extraction parses unstructured text and transforms it into structured data that computers can readily understand and process, so that users can be served faster and more accurately. Its main research topics are entity extraction, relation extraction, and event extraction. Relation extraction, the core task of information extraction, extracts semantic relations between two or more entities from unstructured text and expresses them as triples. It is widely applied in knowledge graph construction, intelligent question answering, semantic search, and other fields. Relation extraction based on distant supervision generates training data automatically by aligning a knowledge base with a large text corpus, which reduces the manual annotation workload. However, because the distant supervision assumption is too strong, the generated training data contains mislabeled instances. To alleviate this problem, the main work of this thesis is as follows:

(1) A distant supervised relation extraction model based on residual attention is proposed. A trunk branch extracts sentence features, while a mask branch generates attention features. A residual network structure combines the two, increasing the weight of key information while preserving the flow of semantic information, and thereby improving relation extraction performance. In addition, a sentence-level attention mechanism over multiple instances further reduces the influence of mislabeling. Experimental results show that the proposed model improves on current representative methods, with an average accuracy of 79.8%. Compared with the PCNN-ATT and BGRU+3ATT models, which also use sentence-level attention, the proposed model is 2% to 4% higher, and it is 5% higher than the ResCNN-9 model, which uses a residual structure.

(2) A distant supervised relation extraction model combining residual attention and self-learning is proposed. The model introduces self-learning into the residual attention model from (1): correction labels are constructed from the structure or characteristics of the data itself and combined with the residual attention mechanism to obtain better relation extraction performance. Experiments show that the average accuracy of this model reaches 82.4%, about 5% to 12% higher than the PCNN-One + Soft-Label, PCNN-ATT + Soft-Label, and GPCNNs models, which also use correction labels, and about 3% higher than the residual attention model from (1).

(3) The model combining residual attention and self-learning is applied to the construction of a curriculum knowledge graph. Taking the computer science course "Data Structure" as an example, the knowledge points of the course are presented as a clearer, more intuitive graph that helps users sort out the course context and understand the knowledge points efficiently.
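The alignment step behind distant supervision, and the mislabeling it causes, can be illustrated with a minimal sketch. The knowledge base, sentences, relation names, and the `distant_label` helper below are all hypothetical toy values chosen for illustration; they are not taken from the thesis, and plain substring matching stands in for real entity linking.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (head entity, tail entity) -> relation.
KB = {
    ("Barack Obama", "Hawaii"): "born_in",
    ("Apple", "Steve Jobs"): "founded_by",
}

# Hypothetical toy corpus.
SENTENCES = [
    "Barack Obama was born in Hawaii in 1961.",
    "Barack Obama visited Hawaii last winter.",  # mentions the pair, but does not express born_in
    "Apple was founded by Steve Jobs in a garage.",
    "Hawaii is a popular holiday destination.",
]


def distant_label(kb, sentences):
    """Group sentences into bags keyed by (head, tail, relation).

    Distant supervision assumption: any sentence that mentions both
    entities of a knowledge-base pair is labeled with that pair's relation.
    """
    bags = defaultdict(list)
    for sentence in sentences:
        for (head, tail), relation in kb.items():
            # Substring matching is a simplification of entity linking.
            if head in sentence and tail in sentence:
                bags[(head, tail, relation)].append(sentence)
    return dict(bags)


bags = distant_label(KB, SENTENCES)
for key, bag in bags.items():
    print(key, len(bag))
```

The second sentence lands in the `born_in` bag even though it only describes a visit. This is the kind of noisy label that the attention and correction-label mechanisms in the abstract are meant to counteract.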
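The two attention ideas in contribution (1) can be sketched in a few lines of plain Python. The abstract does not give the exact formulas, so this sketch assumes the common residual attention form `(1 + M) * T`, where `T` is the trunk output and `M` is a sigmoid soft mask, and a dot-product sentence-level attention against a relation query vector. All function names, vectors, and values are hypothetical and may differ from the thesis model.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def residual_attention(trunk, mask_logits):
    """Combine trunk features T with a soft mask M as (1 + M) * T.

    The "1 +" term is the residual path: even where the mask is close to
    zero, the trunk features still pass through unchanged.
    """
    return [(1.0 + sigmoid(m)) * t for t, m in zip(trunk, mask_logits)]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bag_representation(sentence_vectors, relation_query):
    """Sentence-level attention over one bag of sentence vectors.

    Each sentence is scored by its dot product with the relation query;
    the bag vector is the softmax-weighted sum of the sentence vectors.
    """
    scores = [sum(a * b for a, b in zip(vec, relation_query))
              for vec in sentence_vectors]
    weights = softmax(scores)
    dim = len(relation_query)
    bag = [sum(w * vec[i] for w, vec in zip(weights, sentence_vectors))
           for i in range(dim)]
    return bag, weights


# Hypothetical trunk features and mask logits for a bag of two sentences.
trunks = [[1.0, 0.5], [0.2, -0.1]]
mask_logits = [[2.0, -2.0], [0.0, 0.0]]

sentence_vectors = [residual_attention(t, m) for t, m in zip(trunks, mask_logits)]
bag, weights = bag_representation(sentence_vectors, relation_query=[1.0, 1.0])
print(weights)
```

In this toy bag the first sentence aligns better with the query and receives the larger weight, so a sentence that matches the relation poorly, such as a mislabeled one, contributes less to the bag vector.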
Keywords/Search Tags:Relation extraction, Distant supervision, Residual attention, Self-learning strategy, Knowledge graph