Font Size: a A A

On The Uncertainty Of The Data Confidence Algorithm

Posted on:2011-12-13Degree:MasterType:Thesis
Country:ChinaCandidate:G Y LiuFull Text:PDF
GTID:2208330335997972Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
The data management technologies for deterministic data have been developed rapidly in last decades. In traditional database, an item either is in the database or is not, a tuple either is in the query answer or is not. Data uncertainty was becomming more and more important as the rapid development in data garthering and processing in various fields. The ubiquity of uncertain data in modern-day applications (such as information extraction, data integration, sensor and RFID networks, and scientific experiments) has resulted in a growing need for techniques to deal with such data.In recent years, tens of data models have been developed, among which ULDBs(Uncertainty and Lineage Databases) developed by Standford University has gained wide acceptance. ULDBs was based on the possible world model that contains a huge number of possible world instances with the sum of the probabilities equal to 1. It was an extension of relational databases with simple yet expressive constructs for representing and manipulating both lineage and uncertainty. And the Trio system has been developed at Standford based on ULDBs. However, query evalutation in ULDBs, especially the calculation of confidence, is still challenging due to the huge number of possible world instances and the existency of probabilities. For example, in a probabilistic database with contains N independent tuples, the number of the possible world instances is at least 2 if "a tuple belongs to the database" is a probabilistic event for all tuples. Then query evaluation using all possible world instances is a#P problem. This thesis addresses the chanllengs of query evaluation in ULDBs, and proposes an argorithm to calculate the confidence of the result tuple in a more efficient way.Fistly, this paper summarizes the research background and key challenges of Uncertainty Database. Then, it introduces the ULDBs model by explaining how to express uncertain data and how to conbine uncertainty and lineage together. After that, it studys the query evaluation in ULDBs, namely about how to compute the result tuple and confidence. And an analysis is conducted about the advantages and disadvantages on Widom's argrorithm. At last, it introduces a new argorithm based on the Depth-First Left-Most Traversal, which has lower time complexity comparing with Widom's. 1) A new argorithm based on Depth-First Left-Most Traversal for identifying independent modules in lineage graph. This argorithm helps to reduce the time complexity from O(M*E) to O(M*H), among which M represents the number of nodes, E represents the number of edges, and H represents the average number of parents for each node in the lineage graph.2) An argorithm based on Minimal Cut Set for Caculating the confidence of a complex independent module. This argorithm helps to reduce the time complexity from O(2k) to O(S), among which k represents the number of base tuples in the module, and S represents the number of minimal cut sets(S≤2k).3) Extensive experiments to compare the efficiency of these argorithems and demonstrate the practicability of our argorithm.
Keywords/Search Tags:Uncertainty, Lineage, Confidence, Independent Module, Fault Tree, Minimal Cut Set
PDF Full Text Request
Related items