Font Size: a A A

Sensitivity Analysis Of Query Results From Uncertain Databases

Posted on:2015-01-19Degree:DoctorType:Dissertation
Country:ChinaCandidate:J W ChenFull Text:PDF
GTID:1228330452969331Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Due to the large amount of emerging requirement for efective storage and manage-ment of uncertain data from the application domains such as data extraction and integra-tion, data mining and machine learning, statistical analysis and sensor networking, thereare many research work focusing on uncertain data processing in recent years. The datastored in the uncertain database may be imprecise or even erroneous, and their perturba-tions may lead to great changes in the query result. This makes the sensitivity analysisof the query result with respect to the input uncertain data a key problem in uncertaindatabases.This dissertation performs a deep study on the sensitivity analysis problem of the ba-sic queries and the Top-K Ranking queries in uncertain databases. The main contributionsof this dissertation are as follows: For the basic queries, we propose a partial derivativebased method to perform the sensitivity analysis, and optimize the corresponding algo-rithm for the queries with safe plans from a quadratic time cost to a linear time cost. Forthe Top-K Ranking queries, we divide the sensitivity analysis problem into diferent cat-egories, and develop a modular approach to quantitatively compute sensitivity of answerordering, where five basic processing modules are identified. Based on the modules, wegive three algorithms (the optimal algorithm, the cumulative greedy algorithm and thenon cumulative greedy algorithm) for answer ordering sensitivity analysis. The cumula-tive greedy algorithm and the non cumulative greedy algorithm make a trade of betweenefciency and accuracy, with the former more accurate and the latter more efcient. Inaddition, we further develop a Tuple Insertion Based Algorithm for computing the Top-K Ranking query on dataset under the attribute-wise uncertain data model and the PRFbased ranking framework, which has a quadratic time complexity and is an improvementover the state of the art cubic algorithm. Based on the Tuple Insertion Based Algorithm,pruning strategies are developed to further reduce the time cost. Our experimental resultson both real data and synthetic data verified the efectiveness and efciency of all thealgorithms proposed in this dissertation.
Keywords/Search Tags:uncertain database, query, sensitivity analysis
PDF Full Text Request
Related items