
Human-in-the-loop Knowledge Graph Cleaning

Posted on: 2022-03-10  Degree: Master  Type: Thesis
Country: China  Candidate: L Li  Full Text: PDF
GTID: 2518306563962739  Subject: Computer technology
Abstract/Summary:
In the era of big data, the volume of data is growing explosively, yet the complex knowledge scattered across the Internet remains difficult to exploit. To organize and manage knowledge more effectively and to improve machine understanding of online information, the concept of the knowledge graph was developed, and it has since been applied in many fields such as information retrieval, question answering, and recommendation. As automatic knowledge extraction has matured, an increasing number of large-scale knowledge graphs have been built. However, the quality of knowledge drawn from heterogeneous Internet sources is uneven, and automatic construction algorithms have limited ability to distinguish similar entities and relations, so noise is inevitably introduced into the knowledge graph and cleaning becomes indispensable. To break through the bottleneck of traditional automatic cleaning methods, which are constrained by the availability of labeled samples, semi-automatic cleaning methods have emerged in recent years alongside the maturing of crowdsourcing technology. This thesis combines the strengths of machines and crowdsourcing to study knowledge graph cleaning through human-machine collaboration, and proposes two effective cleaning approaches. The main contributions are as follows:

We propose a knowledge-clustering-based human-in-the-loop model for error detection (KCHED). Treating triple error detection as a classification problem, KCHED builds a human-in-the-loop cleaning framework in which reliable samples supplied by crowdsourcing improve the classifier. The model also mines the rich category and semantic information in the knowledge graph, quantifies the degree of correlation between triples, and clusters triples through a carefully designed graph model, which in turn guides the selection of crowdsourcing tasks.

To further improve the quality of error detection, we propose a partial-order and triple-confidence-based model for error detection (POTTED). Because KCHED is still limited by the samples available for training a high-quality classifier, POTTED shifts the focus of triple error detection to crowdsourcing. From the confidence scores of the triples we derive an effective partial order and build a graph model on it. We then design a crowdsourcing task selection algorithm so that, after manually verifying only a few triples, the correctness of many more triples can be inferred through the partial-order relationships.

KCHED and POTTED are compared with mainstream knowledge graph cleaning methods on public datasets. By selecting more reliable samples from the knowledge graph, quantifying the correlation between triples, and introducing crowdsourced verification, KCHED achieves better detection accuracy than existing methods. By combining machine and crowd effort and evaluating triple confidence from multiple perspectives, POTTED accelerates crowdsourced error detection via the partial order and further improves the quality of cleaning.
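
The abstract does not give implementation details, so the following is only a minimal sketch of the human-in-the-loop idea behind KCHED: triples are clustered by a pairwise correlation score, one representative per cluster is sent to crowd workers, and a classifier is retrained on the verified labels. The clustering method, classifier, and all function and variable names are illustrative assumptions, not the thesis's actual algorithm.

```python
# Hypothetical sketch of a KCHED-style human-in-the-loop loop.
# The correlation measure, clustering method, and classifier are
# assumptions made for illustration only.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.linear_model import LogisticRegression

def human_in_the_loop_detect(triple_feats, correlation, ask_crowd, n_clusters=20):
    """triple_feats: (n, d) feature matrix for triples;
    correlation: (n, n) pairwise correlation degrees in [0, 1];
    ask_crowd: callable(index) -> 0/1 label from crowd workers."""
    # Cluster correlated triples so each crowdsourcing task covers one group.
    distance = 1.0 - correlation
    clusters = AgglomerativeClustering(
        n_clusters=n_clusters, metric="precomputed", linkage="average"
    ).fit_predict(distance)

    # Crowdsource one representative triple per cluster to obtain reliable labels.
    labeled_idx, labels = [], []
    for c in range(n_clusters):
        members = np.where(clusters == c)[0]
        rep = members[0]  # simplest possible choice of representative
        labeled_idx.append(rep)
        labels.append(ask_crowd(rep))

    # Train the error-detection classifier on the crowd-verified samples
    # and predict correctness for all triples.
    clf = LogisticRegression().fit(triple_feats[labeled_idx], labels)
    return clf.predict(triple_feats)
```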
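The partial-order inference in POTTED can be pictured roughly as below. The sketch assumes (the abstract does not state this explicitly) that a triple verified as correct implies that triples ranked at least as high in the confidence-based partial order are also correct, and that a triple verified as wrong implies that triples ranked no higher are also wrong; all names are hypothetical.

```python
# Hypothetical sketch of POTTED-style label propagation over a partial order.
# The exact interpretation of the partial order is an assumption for illustration.
import networkx as nx

def propagate_labels(order_edges, verified):
    """order_edges: iterable of (u, v) meaning triple u <= triple v in the
    confidence-based partial order; verified: dict triple -> True/False
    from crowdsourced manual checks. Returns the inferred labels."""
    dag = nx.DiGraph(order_edges)
    labels = dict(verified)
    for t, is_correct in verified.items():
        if t not in dag:
            continue
        if is_correct:
            # Everything ranked at least as high as a verified-correct triple
            # is inferred to be correct.
            for v in nx.descendants(dag, t):
                labels.setdefault(v, True)
        else:
            # Everything ranked no higher than a verified-wrong triple
            # is inferred to be wrong.
            for u in nx.ancestors(dag, t):
                labels.setdefault(u, False)
    return labels
```

Under this reading, a handful of crowd verifications near the boundary of the order can settle the labels of many triples, which matches the abstract's claim that only a few triples need manual checking.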
Keywords/Search Tags: knowledge graph, data cleaning, crowdsourcing, knowledge graph embedding, clustering