Research On Semantic Based Knowledge Graph Cleaning And Optimization Technology

Posted on:2022-04-12

Degree:Master

Type:Thesis

Country:China

Candidate:S L Gao

Full Text:PDF

GTID:2518306572960159

Subject:Software engineering

Abstract/Summary:

With the advent of the Internet era, informatization has become a defining trend of today’s society. The wide application of computer technology in industry and daily life produces massive amounts of data and knowledge. This data often contains rich information that needs to be explored and analyzed, and it can provide stronger support and a theoretical basis for intelligent question answering systems, assisted decision-making, and recommendation systems. Extracting effective information from massive data and summarizing it into knowledge that can help all walks of life has therefore become a common goal in the era of big data. With the development of artificial intelligence and machine learning, a large number of methods for analyzing, mining, and processing massive data have emerged, and applying statistical learning and deep learning to discover the rules in massive data has become the current trend. However, these methods generally focus on discovering rules in data and pay little attention to mining the semantic information in it.

After Google put forward the concept of the knowledge graph in 2012, many general-domain knowledge graphs have appeared, such as Freebase, YAGO, and DBpedia. Through the structure of a semantic network, facts about the real world can be expressed in a form that computers can understand. In addition to general-domain knowledge graphs, many industries are building professional knowledge graphs for their own domains, also known as vertical-domain knowledge graphs. In the life cycle of a knowledge graph, besides construction and reasoning, two important stages are cleaning and updating. For a knowledge graph that provides a query service, two important indicators of its effectiveness are query time and accuracy. However, the process of creating a knowledge graph inevitably introduces redundant and wrong knowledge, which affects query accuracy. In addition, in practice many enterprises do not pay enough attention to cleaning, so knowledge accumulates in large quantities, which not only degrades query results but also puts great pressure on maintenance. Based on the above problems, this paper studies and verifies knowledge graph cleaning and updating in the following three aspects.

First, subgraph extraction from the knowledge graph. To make more effective use of the information and knowledge in massive data, to provide more accurate and efficient query services for users of a query system, and in turn to support decision-making, recommendation, and other services, this paper proposes a query-oriented subgraph extraction technology. The proposed node retention probability model combines the user’s query interest with the types of the entity nodes in the knowledge graph to evaluate each node, and the subgraph nodes are screened and retained accordingly. Experiments on a Freebase subset show that the model can greatly reduce the number of irrelevant nodes in the subgraph and reduce query time while preserving the accuracy of user queries.

Second, subgraph cleaning technology. To address the redundant and wrong knowledge in the subgraph, a knowledge graph cleaning technology based on graph embedding is proposed. First, the TransR translation model is used to represent the knowledge as vectors in a low-dimensional space. Next, the cleaning model handles redundant knowledge by semantic similarity comparison, transforms subgraph error detection into a multi-class classification problem, and fills in and corrects knowledge according to the graph embedding results, thereby cleaning the knowledge graph. The experimental results show that the cleaning technology can effectively find and remove redundant and wrong nodes.

Third, subgraph updating technology. This paper identifies and analyzes the problems that may occur when synchronizing the original knowledge graph and the subgraph during knowledge graph updating. For the cases in which the subgraph needs to be updated, an update prediction model is proposed. Because far fewer entity nodes in the subgraph carry an update label than do not, this paper uses a generative semi-supervised learning method to train the prediction model. The proposed subgraph updating model is an incremental update model: compared with updating the whole subgraph, it attends only to the entity nodes that have changed, which saves computing resources and update time.
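The abstract does not give formulas for the cleaning step, so the following is only a minimal sketch of the two ideas it names: scoring a triple with the standard TransR distance ||M_r h + r - M_r t||, and comparing entity embeddings by similarity to flag possible redundancy. The function names, the toy embeddings, and the use of cosine similarity with a 0.99 threshold are assumptions made for illustration; they are not taken from the thesis.

```python
import math

def project(matrix, vec):
    """Multiply a (k x d) relation matrix by a d-dimensional entity vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def transr_score(head, relation, tail, rel_matrix):
    """Standard TransR distance ||M_r h + r - M_r t||_2; lower is more plausible."""
    h_r = project(rel_matrix, head)
    t_r = project(rel_matrix, tail)
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(h_r, relation, t_r)))

def cosine(u, v):
    """Cosine similarity, used here as an assumed stand-in for semantic similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy, hand-picked embeddings (entity dim 3, relation dim 2); not trained values.
M_capital_of = [[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]]
r_capital_of = [1.0, 0.0]
paris = [0.0, 1.0, 0.5]
paris_alias = [0.0, 2.0, 1.0]   # hypothetical duplicate entry for the same entity
france = [1.0, 1.0, 0.2]
berlin = [0.0, -1.0, 0.5]

# A plausible triple scores lower than an implausible one.
good = transr_score(paris, r_capital_of, france, M_capital_of)
bad = transr_score(berlin, r_capital_of, france, M_capital_of)

# Entities whose embeddings are nearly parallel become redundancy candidates.
REDUNDANCY_THRESHOLD = 0.99     # assumed value, would need tuning on real data
is_redundant = cosine(paris, paris_alias) >= REDUNDANCY_THRESHOLD
```

In a real pipeline the embeddings and relation matrices would come from a trained TransR model, and a high distance for an existing triple would only mark it as a candidate error for the classifier described in the abstract, not as a confirmed one.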

Keywords/Search Tags:

Knowledge Graph Cleaning, Extraction of Subgraph, Knowledge Graph Updating Prediction, Knowledge Graph Error Correction

Related items

1	Research On Knowledge Graph Link Prediction Based On Subgraph Reasoning
2	Research On Knowledge Graph Construction Technologies Based On Text Feature Learning
3	Research And Implementation Of Updating Knowledge Graph Of Vertical Domain Based On Prior Knowledge
4	Research And Implementation Of Data Cleaning Technology Based On Knowledge Graph
5	Human-in-the-loop Knowledge Graph Cleaning
6	The Study And Application Of Knowledge Graph Relation Optimization Technology Based On User Feedback Information
7	Research On Key Technologies Of Knowledge Graph Construction For The Knowledge Field Of Ship
8	Research And Implementation Of Small Sample Knowledge Graph Completion Based On GA
9	Research On Knowledge Graph Embedding Algorithm Based On Modeling Relation Pattern
10	Research On Knowledge Graph Construction Method For Sparse Labeling