
Research And Application Of Multi-Truth Finding Algorithms For Web

Posted on: 2020-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: L F Chen
GTID: 2428330596995129
Subject: Software engineering
Abstract/Summary:
With the rapid development of network technology and the widespread use of intelligent devices, data are being generated at an unprecedented speed. However, while big data is changing many aspects of modern society, different data sources often provide conflicting descriptions of the same entity. These conflicts are typically caused by input errors, outdated data, lost records and similar problems, and conflicting data used in practice may cause serious damage and economic losses. For large-scale data, it is unrealistic to determine the authenticity of the data manually. Truth discovery methods resolve such conflicts by identifying the most trustworthy values among those provided by multiple data sources, and they have therefore become a research hotspot. In recent years, researchers have studied truth discovery under different scenarios, influencing factors, kinds of entity truth values and methods for calculating data source trustworthiness, and have proposed a variety of algorithms. However, current truth discovery methods usually assume that an entity has only one true value for an attribute, and research on multi-truth discovery is relatively scarce, even though in reality it is common for entities to have multiple true values.

For multi-valued entities, this thesis first proposes a multi-truth discovery algorithm. The algorithm transforms the discovery of multiple true values into a function optimization problem whose goal is that the truth set of an entity should have the highest similarity with all the value sets that the data sources provide for that entity. Based on how the objective function selects true values, an iterative algorithm is designed to jointly infer the reliability of the data sources and the truth sets of the entities. In addition, for calculating the confidence of a claimed value, an asymmetric support calculation method is proposed, and the confidence of a value is revised using the support it receives from similar values.

Secondly, this thesis presents an improved model for existing multi-truth discovery algorithms. Existing multi-truth discovery algorithms often neglect to estimate the number of true values of an entity. To address this, the model divides the multi-truth discovery process into two parts, truth calculation and truth prediction, and incorporates the estimation of the number of true values into the existing process. For calculating the number of true values of an entity, a symmetric similarity calculation method is proposed, and the probability of each candidate number of true values is revised using the support of similar values. The model is general: it applies to any multi-truth discovery method that produces evaluation results for data sources and claimed values.

Finally, through two sets of experiments on three real-world datasets, this thesis evaluates the proposed multi-truth discovery algorithm and the improved model, and verifies the effectiveness of both as well as the accuracy obtained when different factors are combined. The proposed algorithm and model are also compared with each other, and recommendations for their use are given.
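
The iterative idea described in the abstract (alternately estimating truth sets and data source reliability) can be illustrated with a minimal Python sketch. This is not the algorithm of the thesis: the abstract does not give the objective function, the asymmetric support calculation or the update rules, so the sketch assumes Jaccard similarity between value sets, a reliability-weighted vote with a fixed threshold for choosing truth sets, and reliability defined as a source's mean similarity to the current truth sets. The names `jaccard`, `discover_truths`, `threshold` and the sample data are all hypothetical.

```python
from typing import Dict, Set, Tuple

# entity -> source -> set of values that the source claims for the entity
Claims = Dict[str, Dict[str, Set[str]]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set similarity |a & b| / |a | b| (assumed here; 1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def discover_truths(
    claims: Claims, iterations: int = 10, threshold: float = 0.5
) -> Tuple[Dict[str, Set[str]], Dict[str, float]]:
    """Alternate between truth-set selection and source-reliability updates."""
    sources = {s for by_source in claims.values() for s in by_source}
    reliability = {s: 1.0 for s in sources}  # assumption: all sources start equal
    truths: Dict[str, Set[str]] = {}

    for _ in range(iterations):
        # Step 1: keep every value whose reliability-weighted share of the
        # votes for its entity exceeds the threshold (several may qualify).
        for entity, by_source in claims.items():
            total = sum(reliability[s] for s in by_source)
            scores: Dict[str, float] = {}
            for s, values in by_source.items():
                for v in values:
                    scores[v] = scores.get(v, 0.0) + reliability[s]
            truths[entity] = {
                v for v, sc in scores.items() if total > 0 and sc / total > threshold
            }

        # Step 2: a source's reliability is the mean similarity between the
        # value sets it claimed and the current truth sets.
        for s in sources:
            sims = [
                jaccard(by_source[s], truths[e])
                for e, by_source in claims.items()
                if s in by_source
            ]
            reliability[s] = sum(sims) / len(sims)

    return truths, reliability


# Hypothetical data: three sources describing the authors of two books.
claims = {
    "book1": {"A": {"x", "y"}, "B": {"x", "y"}, "C": {"x", "z"}},
    "book2": {"A": {"p"}, "B": {"p", "q"}, "C": {"q"}},
}
truths, reliability = discover_truths(claims)
```

On this toy input, both entities end up with two true values, and the source whose claims match the selected truth sets most closely receives the highest reliability. A fixed voting threshold is the simplest possible choice; it does not estimate the number of true values per entity, which is the gap the improved model in the abstract is meant to address.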
Keywords/Search Tags: big data, truth finding, multi-truth, truth number, data integration