
Human-in-the-loop Knowledge Graph Cleaning

Posted on: 2022-03-10  Degree: Master  Type: Thesis
Country: China  Candidate: L Li  Full Text: PDF
GTID: 2518306563962739  Subject: Computer technology
Abstract/Summary:
In the era of big data, the volume of data is growing explosively, yet the complex knowledge scattered across the Internet remains difficult to exploit. To organize and manage knowledge more effectively and to improve machine understanding of online information, the concept of the knowledge graph was developed, and it has since been applied in many fields such as information retrieval, question answering, and recommendation. As automatic knowledge extraction has matured, an increasing number of large-scale knowledge graphs have been built. However, the quality of knowledge drawn from heterogeneous Internet sources is uneven, and automatic construction algorithms have limited ability to distinguish similar entities and relations, so noise is inevitably introduced into the knowledge graph and cleaning becomes indispensable. To break through the bottleneck of traditional automatic cleaning methods, which are constrained by the availability of labeled samples, semi-automatic cleaning methods have emerged in recent years alongside the maturing of crowdsourcing technology. This thesis combines the strengths of machines and crowdsourcing to study knowledge graph cleaning through human-machine collaboration, and proposes two effective cleaning approaches. The main contributions are as follows:

We propose a knowledge-clustering-based human-in-the-loop model for error detection (KCHED). Treating triple error detection as a classification problem, KCHED builds a human-in-the-loop cleaning framework in which reliable samples supplied by crowdsourcing improve the classifier. The model also mines the rich category and semantic information in the knowledge graph, quantifies the degree of correlation between triples, and clusters triples through a carefully designed graph model, which in turn guides the selection of crowdsourcing tasks.

To further improve the quality of error detection, we propose a partial-order and triple-confidence-based model for error detection (POTTED). Because KCHED is still limited by the samples available for training a high-quality classifier, POTTED shifts the focus of triple error detection to crowdsourcing. From the confidence scores of the triples we derive an effective partial order and build a graph model on it. We then design a crowdsourcing task selection algorithm so that, after manually verifying only a few triples, the correctness of many more triples can be inferred through the partial-order relationships.

KCHED and POTTED are compared with mainstream knowledge graph cleaning methods on public datasets. By selecting more reliable samples from the knowledge graph, quantifying the correlation between triples, and introducing crowdsourced verification, KCHED achieves better detection accuracy than existing methods. By combining machine and crowd effort and evaluating triple confidence from multiple perspectives, POTTED accelerates crowdsourced error detection via the partial order and further improves the quality of cleaning.
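
The abstract does not give implementation details, so the following is only a minimal sketch of the human-in-the-loop idea behind KCHED: triples are clustered by a pairwise correlation score, one representative per cluster is sent to crowd workers, and a classifier is retrained on the verified labels. The clustering method, classifier, and all function and variable names are illustrative assumptions, not the thesis's actual algorithm.

```python
# Hypothetical sketch of a KCHED-style human-in-the-loop loop.
# The correlation measure, clustering method, and classifier are
# assumptions made for illustration only.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.linear_model import LogisticRegression

def human_in_the_loop_detect(triple_feats, correlation, ask_crowd, n_clusters=20):
    """triple_feats: (n, d) feature matrix for triples;
    correlation: (n, n) pairwise correlation degrees in [0, 1];
    ask_crowd: callable(index) -> 0/1 label from crowd workers."""
    # Cluster correlated triples so each crowdsourcing task covers one group.
    distance = 1.0 - correlation
    clusters = AgglomerativeClustering(
        n_clusters=n_clusters, metric="precomputed", linkage="average"
    ).fit_predict(distance)

    # Crowdsource one representative triple per cluster to obtain reliable labels.
    labeled_idx, labels = [], []
    for c in range(n_clusters):
        members = np.where(clusters == c)[0]
        rep = members[0]  # simplest possible choice of representative
        labeled_idx.append(rep)
        labels.append(ask_crowd(rep))

    # Train the error-detection classifier on the crowd-verified samples
    # and predict correctness for all triples.
    clf = LogisticRegression().fit(triple_feats[labeled_idx], labels)
    return clf.predict(triple_feats)
```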
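The partial-order inference in POTTED can be pictured roughly as below. The sketch assumes (the abstract does not state this explicitly) that a triple verified as correct implies that triples ranked at least as high in the confidence-based partial order are also correct, and that a triple verified as wrong implies that triples ranked no higher are also wrong; all names are hypothetical.

```python
# Hypothetical sketch of POTTED-style label propagation over a partial order.
# The exact interpretation of the partial order is an assumption for illustration.
import networkx as nx

def propagate_labels(order_edges, verified):
    """order_edges: iterable of (u, v) meaning triple u <= triple v in the
    confidence-based partial order; verified: dict triple -> True/False
    from crowdsourced manual checks. Returns the inferred labels."""
    dag = nx.DiGraph(order_edges)
    labels = dict(verified)
    for t, is_correct in verified.items():
        if t not in dag:
            continue
        if is_correct:
            # Everything ranked at least as high as a verified-correct triple
            # is inferred to be correct.
            for v in nx.descendants(dag, t):
                labels.setdefault(v, True)
        else:
            # Everything ranked no higher than a verified-wrong triple
            # is inferred to be wrong.
            for u in nx.ancestors(dag, t):
                labels.setdefault(u, False)
    return labels
```

Under this reading, a handful of crowd verifications near the boundary of the order can settle the labels of many triples, which matches the abstract's claim that only a few triples need manual checking.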
Keywords/Search Tags: knowledge graph, data cleaning, crowdsourcing, knowledge graph embedding, clustering