Research And Application Of Truth Discovery On Entity Attribute Correlation And Domain Awareness

Posted on:2022-12-07

Degree:Master

Type:Thesis

Country:China

Candidate:H Lv

Full Text:PDF

GTID:2518306779471824

Subject:Automation Technology

Abstract/Summary:

With the rise of the mobile Internet, society has entered the era of big data. Data usually describes some property of an object, such as the height of Mount Everest, and can be collected from many different sources. However, not all sources are equally credible, and the data they provide inevitably contains noise, so the veracity of big data urgently needs to be analyzed. Resolving data conflicts by manual labeling requires a great deal of time and manpower, which is unrealistic for massive data. Truth discovery, which automatically identifies correct information from multi-source data, has therefore emerged as an important fundamental research topic. At present, two aspects of truth discovery for data integration need improvement: (1) Truth discovery based on entity attribute correlation: various correlations exist between entity attributes, and these correlations affect the accuracy of truth discovery results. (2) Truth discovery based on domain awareness: the reliability of a source varies across domains, and modeling source reliability at a fine granularity can further improve the accuracy of truth discovery results. This thesis uses the relevant theories, techniques and methods of data mining to study these two issues systematically. The main research contents are as follows.

Firstly, for truth discovery under entity attribute correlation, this thesis proposes GETD, a truth discovery model based on graph embedding and relation perception. The relationships in the data are modeled by constructing four kinds of heterogeneous networks: source-source, source-entity attribute value, entity attribute-entity attribute, and entity attribute-entity attribute value. These networks are then embedded in a low-dimensional space so that reliable sources and reliable attribute values lie close to each other and the relationships between entity attributes are reflected in the attribute values, which makes it possible to infer the ground truth. Experimental results on two real-world datasets show that the GETD algorithm outperforms existing truth discovery algorithms.

Secondly, for the domain-aware truth discovery problem, this thesis proposes DTD, a domain-aware truth discovery model that represents the reliability of each source at a fine granularity, per domain. In addition, because the performance of existing truth discovery algorithms is limited by uniform initialization of source weights, this thesis also proposes a fine-grained weight initialization method based on the richness of a source's information in each domain. Domain-aware truth discovery is treated as an optimization problem in which the reliability of the sources and the credibility of the declared values are the two unknown variables, and the objective function is the weighted distance between the declared values and the truth values. The optimization model is solved with a two-step iterative update: one step updates the source weights, and the other updates the credibility of the declared values. Different loss functions are used for different data types. Experimental results on two real-world datasets show that the DTD algorithm outperforms state-of-the-art truth discovery methods.

Finally, a prototype truth discovery system is designed and developed. The system integrates the two algorithms proposed in this thesis together with other truth discovery algorithms, and supports dataset upload, selection of a truth discovery algorithm, and download of the truth discovery results. Users can upload datasets through the system, select different truth discovery algorithms for data integration, and finally download the datasets on which truth discovery has been completed.
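The two-step iterative update described for the domain-aware model can be illustrated with a minimal sketch. This is not the DTD algorithm from the thesis, whose exact formulas are not given in the abstract. It is a generic weighted-distance scheme for numeric values only, under several assumptions: source weights are kept per (source, domain) pair, the initial weight is a source's share of the claims in a domain (a stand-in for "richness of domain information"), the loss is squared error, and the weight update is the negative log of the relative loss. The function name `discover_truths`, its parameters, and the toy claims are all hypothetical.

```python
import math
from collections import defaultdict


def discover_truths(claims, n_iters=20, eps=1e-9, min_weight=1e-6):
    """Estimate numeric truths from (source, domain, entity, value) claims.

    Returns (truths, weights): truths maps entity -> estimate, and
    weights maps (source, domain) -> reliability weight.
    """
    # Assumed initialization: a source's starting weight in a domain is its
    # share of all claims made in that domain (a stand-in for "richness").
    counts = defaultdict(int)
    domain_totals = defaultdict(int)
    for source, domain, _entity, _value in claims:
        counts[(source, domain)] += 1
        domain_totals[domain] += 1
    weights = {(s, d): c / domain_totals[d] for (s, d), c in counts.items()}

    truths = {}
    for _ in range(n_iters):
        # Step 1: with weights fixed, each truth is the weighted mean of
        # the values claimed for that entity.
        num = defaultdict(float)
        den = defaultdict(float)
        for source, domain, entity, value in claims:
            w = weights[(source, domain)]
            num[entity] += w * value
            den[entity] += w
        truths = {entity: num[entity] / den[entity] for entity in num}

        # Step 2: with truths fixed, score each (source, domain) pair by its
        # mean squared deviation from the current truths.
        sq_err = defaultdict(float)
        for source, domain, entity, value in claims:
            sq_err[(source, domain)] += (value - truths[entity]) ** 2
        loss = {k: sq_err[k] / counts[k] + eps for k in counts}
        domain_loss = defaultdict(float)
        for (_s, d), l in loss.items():
            domain_loss[d] += l
        # Lower relative loss within a domain gives a larger weight; the
        # floor keeps weights positive when a domain has a single source.
        weights = {
            (s, d): max(-math.log(l / domain_loss[d]), min_weight)
            for (s, d), l in loss.items()
        }
    return truths, weights


# Toy, made-up claims: source A is accurate on geography but poor on
# prices, source B is the reverse, and source C is close in both domains.
claims = [
    ("A", "geo", "everest_m", 8849.0), ("A", "geo", "k2_m", 8611.0),
    ("B", "geo", "everest_m", 8800.0), ("B", "geo", "k2_m", 8700.0),
    ("C", "geo", "everest_m", 8848.0), ("C", "geo", "k2_m", 8612.0),
    ("A", "finance", "price_x", 120.0), ("A", "finance", "price_y", 30.0),
    ("B", "finance", "price_x", 100.0), ("B", "finance", "price_y", 50.0),
    ("C", "finance", "price_x", 101.0), ("C", "finance", "price_y", 49.0),
]
truths, weights = discover_truths(claims)
print(truths)
print(weights)
```

Keeping a separate weight per (source, domain) pair is what lets source A be trusted on geography while being discounted on prices. A single global weight per source would average the two behaviours together. Categorical attributes would need a different loss, such as a 0-1 loss with weighted voting, which this sketch leaves out.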

Keywords/Search Tags:

truth discovery, heterogeneous networks, entity attribute correlation, graph embedding, domain awareness

Related items

1	Research On Truth Discovery Algorithm Based On Open Source Information
2	Heterogeneous Entity Consistency Modeling And Truth Discovery Under Multi-source
3	Hidden Markov Model Based Multi-truth Discovery
4	A Study On Path-based Knowledge Graph Embedding
5	Domain-specific Expert Knowledge Graph Construction And Disambiguation
6	Clustering Users Based On High-dimensional Fine-grained Features In Social Networks
7	Research On The Method Of Entity Alignment With Attribute Enhancement Based On Graph Convolution
8	Research On Entity Alignment Method Based On Joint Robust Knowledge Graph And Attribute Fusion
9	Heterogeneous Fingerprints Fusion Indoor Positioning Based On Truth Discovery
10	Research On Heterogeneous Graph Embedding Method And Application Based On Graph Neural Network