Algorithms Of Copy Detection And Truth Discovery For Multi-relational Data

Posted on: 2013-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: R Jin
Full Text: PDF
GTID: 2298330467474649
Subject: Computer software and theory
Abstract/Summary:
With the development of Internet technology, a wide variety of information dissemination technologies have come into use. As a result, the quality of information is difficult to guarantee. On the one hand, false information may be introduced during publishing. On the other hand, some publishers copy other publishers' information with little modification. Identifying the copy relationships among publishers is therefore particularly important. The goal of this thesis is to find the copy relationships among publishers of relational data and to use them to improve the results of truth discovery.

Many copy detection and truth discovery algorithms exist for single-relational data. Copy relationships among multi-relational data sources are also important, but we have not found any research on this topic. The main work of this thesis is to analyze the characteristics of multi-relational data and use them to extend single-relational algorithms to copy detection on multi-relational data, and then to use the detected copy relationships to improve the accuracy of truth discovery on multi-relational data.

We analyze the characteristics of multi-relational data and propose one basic assumption and three basic rules for copy detection on such data. The basic assumption is that the data on some attributes may not be independent of each other. The three rules are the Decomposition rule, which addresses data redundancy; the Tamper rule, which addresses dependence among the data of one attribute; and the Fake rule, which considers whether an object is true or not. We further give the detailed application of each rule and the complete algorithm, based on Bayes' rule, for multi-relational data.

Truth discovery builds on the results of copy detection over multi-relational data sources. To ensure the consistency of truth discovery, objects are processed one at a time. The confidence, independence, and authority of each source are taken into account to improve the results, and we describe in detail how the independence of a source is calculated.

Finally, all of our algorithms are tested on an ideal copy data set. The test results show that our algorithms can find the copy relationships among sources and the true values of the objects these sources provide, with high recall and accuracy.
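
The abstract does not give the thesis's formulas. As an illustration of the general idea of Bayes-rule copy detection on a single relation, the starting point that the thesis extends, the following is a minimal sketch under stated assumptions. It is not the thesis's algorithm and does not implement the Decomposition, Tamper, or Fake rules. The function name `copy_posterior` and the parameters `error_rate`, `n_false`, `copy_rate`, and `prior` are hypothetical, and a provisional truth value per object is assumed to be available.

```python
import math


def copy_posterior(claims_a, claims_b, truth,
                   error_rate=0.2, n_false=10, copy_rate=0.8, prior=0.5):
    """Posterior probability that sources A and B are dependent (one copies).

    Assumed model (illustrative only): each source errs independently with
    probability error_rate, a wrong value is drawn uniformly from n_false
    alternatives, and a copier copies each value with probability copy_rate.
    Requires 0 < error_rate < 1, 0 < copy_rate < 1 and 0 < prior < 1.
    """
    # Per-object likelihoods if the two sources are independent.
    p_true_ind = (1 - error_rate) ** 2
    p_false_ind = error_rate ** 2 / n_false
    p_diff_ind = 1 - p_true_ind - p_false_ind

    # Per-object likelihoods if one source copies from the other.
    p_true_dep = (1 - error_rate) * copy_rate + p_true_ind * (1 - copy_rate)
    p_false_dep = error_rate * copy_rate + p_false_ind * (1 - copy_rate)
    p_diff_dep = p_diff_ind * (1 - copy_rate)

    # Accumulate log-likelihoods over the objects both sources describe.
    log_ind = math.log(1 - prior)
    log_dep = math.log(prior)
    for obj in claims_a.keys() & claims_b.keys():
        a, b = claims_a[obj], claims_b[obj]
        if a != b:
            log_ind += math.log(p_diff_ind)
            log_dep += math.log(p_diff_dep)
        elif a == truth.get(obj):
            log_ind += math.log(p_true_ind)
            log_dep += math.log(p_true_dep)
        else:
            # A shared false value is the strongest evidence of copying.
            log_ind += math.log(p_false_ind)
            log_dep += math.log(p_false_dep)

    # Bayes' rule, normalised in log space for numerical stability.
    m = max(log_ind, log_dep)
    dep = math.exp(log_dep - m)
    ind = math.exp(log_ind - m)
    return dep / (dep + ind)


truth = {"o1": "x", "o2": "y", "o3": "z"}
source_a = {"o1": "x", "o2": "q", "o3": "r"}
source_b = {"o1": "x", "o2": "q", "o3": "r"}  # shares A's two false values
source_c = {"o1": "x", "o2": "y", "o3": "z"}  # agrees with A only on a true value

print(copy_posterior(source_a, source_b, truth))
print(copy_posterior(source_a, source_c, truth))
```

In this sketch, two sources that share the same false values receive a high posterior of dependence, while sources that agree only on true values do not. A truth discovery step could then discount the votes of sources judged dependent; how the thesis combines this with source confidence and authority is not specified in the abstract.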
Keywords/Search Tags: copy detection, truth discovery, Bayes model, functional dependency, schema decomposition