Research And Implementation Of Domain Knowledge Graph

Posted on: 2022-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Wang
Full Text: PDF
GTID: 2518306524493944
Subject: Master of Engineering
Abstract/Summary:
A knowledge graph describes concepts, entities, and their relationships in the objective world in a structured way, and provides an effective means of organizing and managing massive amounts of data. In the current era of big data, every industry continuously produces large volumes of data, and most industries need to build their own domain knowledge graphs. However, constructing a domain knowledge graph is often hampered by a lack of domain-related data and a heavy dependence on manual annotation. The key problem to be solved is therefore how to create new annotated data automatically and reduce manual workload when domain data is limited. This thesis covers the main stages of domain knowledge graph construction: data processing, knowledge extraction, knowledge fusion, and knowledge storage. It concentrates on data processing and knowledge extraction, and proposes new methods that address shortcomings of existing techniques. The main contributions of this thesis are as follows:

1. Data processing. To reduce the heavy manual workload of domain text filtering, an SVM classification method based on locality-sensitive hashing (LSH) is proposed to filter domain text automatically. The LSH algorithm is used to map the original training samples, and the samples likely to lie near the class boundary are then selected as SVM training samples. This significantly reduces the number of SVM training samples and speeds up model construction. Grid search is used to select the SVM parameters in order to improve text classification accuracy. The performance of the proposed method is analyzed on the public UCI Adult data set, and its effectiveness is verified on the Sogou Lab text classification data set.

2. Knowledge extraction. To address the lack of annotated corpus for model training, this thesis proposes a method for automatically generating annotated corpus based on a Seq2Seq model, which is used to expand annotated corpora of four-tuples of the form "entity1-relation-entity2-sentence". A large number of unlabeled texts are back-translated based on reserved words, and the resulting synonymous sentences serve as training corpus. In the model training stage, entity label replacement is used so that the model learns rules for generating synonymous sentences that are independent of the specific entities. In the text generation stage, a generation strategy based on self-checking beam search ensures that the model output contains the specified entities and relation. The effectiveness of the method is verified by using the generated annotated corpus as training data in a relation extraction task.

3. Framework and implementation. Building on the research above, this thesis designs a framework for constructing domain knowledge graphs. For the financial domain, it designs and implements a complete construction scheme, from data processing and knowledge extraction through knowledge fusion to knowledge storage, and demonstrates the basic functions of the resulting financial domain knowledge graph.
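
The entity label replacement step in the second contribution can be illustrated with a minimal sketch. The abstract does not specify the placeholder format or the handling of overlapping mentions, so the tags `<E1>` and `<E2>`, the function names, and the example sentence below are all assumptions made for illustration. The final check is only a simple post-hoc filter, not the self-checking beam search used in the thesis.

```python
# Illustrative sketch only: tag names, function names, and the example
# sentence are hypothetical and not taken from the thesis.

def mask_entities(sentence, entity1, entity2, tag1="<E1>", tag2="<E2>"):
    """Replace both entity mentions with placeholder tags.

    The longer mention is replaced first so that a mention contained in
    the other one (e.g. "Bank" inside "Bank of China") is not clobbered.
    """
    pairs = sorted(
        [(entity1, tag1), (entity2, tag2)],
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    masked = sentence
    for mention, tag in pairs:
        masked = masked.replace(mention, tag)
    return masked


def unmask_entities(sentence, entity1, entity2, tag1="<E1>", tag2="<E2>"):
    """Put the original entity mentions back in place of the tags."""
    return sentence.replace(tag1, entity1).replace(tag2, entity2)


def keeps_both_tags(sentence, tag1="<E1>", tag2="<E2>"):
    """Post-hoc filter: accept a generated sentence only if both tags survive."""
    return tag1 in sentence and tag2 in sentence


if __name__ == "__main__":
    entity1, relation, entity2 = "Acme Corp", "acquired", "Widget Ltd"
    source = "Acme Corp acquired Widget Ltd in 2020."

    masked = mask_entities(source, entity1, entity2)
    print(masked)

    # A paraphrase that a trained Seq2Seq model might return (hand-written here).
    generated = "<E2> was acquired by <E1> in 2020."
    if keeps_both_tags(generated):
        print(unmask_entities(generated, entity1, entity2))
```

Training on masked sentences means the model sees the same placeholder tokens regardless of which entities appear, which is one plausible way to obtain generation rules that do not depend on specific entity names.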
Keywords/Search Tags: Knowledge Graph, SVM, Corpus Annotation, Text Generation