Cleaning And Top-k Querying Uncertain Data With Aggregate Constraints

Posted on: 2014-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: B Liu
Full Text: PDF
GTID: 2308330479979349
Subject: Management Science and Engineering
Abstract/Summary:
With advances in technology and a growing understanding of data acquisition and management, uncertain data has attracted wide attention. Uncertainty is widespread in the data of many real-world applications, in fields such as economics, the military, finance, and telecommunications, so processing uncertain data is particularly important. The problem most often encountered in research on uncertain data, and the hardest to resolve, is the explosion of the possible-worlds space. Because of the particular nature of uncertain data, an uncertain database is more difficult to process than a traditional database of the same size. This paper studies uncertain data cleaning and Top-k queries with aggregate constraints. Both cleaning and querying require determining whether a possible world satisfies the aggregate constraints, and when the number of uncertain tuples is large, this runs into the possible-worlds explosion problem. The main research work is reflected in the following two points:

1. We analyze the major problems in cleaning uncertain data with aggregate constraints and transform the cleaning problem into a nonlinear optimization problem by establishing a sound mathematical model. We propose a new MVFSA algorithm by redesigning the random perturbation to suit the fact that the values of uncertain tuples are discrete in our problem. Experimental results show that the proposed algorithm outperforms the traditional simulated annealing algorithm in solution quality, while its efficiency is also greatly improved.

2. For Top-k queries on uncertain data under aggregate constraints, we propose using sampling to address the difficulty of checking whether every possible world satisfies the constraints.
First, we find a few initial points that satisfy all the constraints. Starting from these points, we sample with a multi-chain Markov chain Monte Carlo (MCMC) method, so that the collected samples reflect the distribution over the whole set of possible worlds well. We then obtain the final query result by running the Top-k query on the sample set. Experimental results show that Top-k queries using multi-chain MCMC sampling give good results, and that the algorithm is much more efficient than querying with simple random sampling.
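The cleaning formulation in point 1 can be illustrated with a small sketch. This is plain simulated annealing with a discrete move, not the MVFSA algorithm itself, whose perturbation and cooling schedule the abstract does not give. The table `TUPLES`, the constraint `TARGET_SUM`, the penalty-based `cost` function, and all parameter values are assumptions made for illustration only.

```python
import math
import random

# Hypothetical uncertain table: each tuple lists discrete (value, probability) alternatives.
TUPLES = [
    [(10, 0.6), (20, 0.4)],
    [(5, 0.5), (15, 0.3), (25, 0.2)],
    [(30, 0.7), (40, 0.3)],
    [(10, 0.2), (20, 0.8)],
]
TARGET_SUM = 75  # assumed aggregate constraint: SUM(value) == 75


def cost(choice, penalty=0.5):
    """Negative log-probability of the chosen world plus a penalty
    proportional to how far its SUM is from the target (assumed objective)."""
    nll = -sum(math.log(TUPLES[i][c][1]) for i, c in enumerate(choice))
    total = sum(TUPLES[i][c][0] for i, c in enumerate(choice))
    return nll + penalty * abs(total - TARGET_SUM)


def anneal(steps=5000, t0=10.0, cooling=0.999, seed=0):
    """Return (best choice of alternative index per tuple, its cost)."""
    rng = random.Random(seed)
    current = [rng.randrange(len(alts)) for alts in TUPLES]
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    temp = t0
    for _ in range(steps):
        # Discrete perturbation: re-pick the alternative of one random tuple.
        i = rng.randrange(len(TUPLES))
        candidate = list(current)
        candidate[i] = rng.randrange(len(TUPLES[i]))
        cand_cost = cost(candidate)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temp *= cooling  # geometric cooling (assumed schedule)
    return best, best_cost
```

Calling `anneal()` returns one alternative index per tuple together with its cost. The move changes a single tuple to another of its listed alternatives, which keeps every candidate inside the discrete value domain instead of perturbing a continuous variable and rounding.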
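The sampling approach in point 2 can be sketched as follows, under several assumptions the abstract does not confirm: tuples exist independently with a given probability, the aggregate constraint is a budget on a summed attribute, and a tuple's Top-k answer probability is the fraction of sampled worlds in which it ranks among the k highest-scoring present tuples. The data in `TUPLES`, the value of `BUDGET`, and every function name are hypothetical.

```python
import random
from collections import Counter

# Hypothetical tuples: (id, score, cost, existence probability).
TUPLES = [
    ("a", 95, 40, 0.5),
    ("b", 90, 30, 0.8),
    ("c", 80, 50, 0.6),
    ("d", 70, 20, 0.9),
    ("e", 60, 10, 0.7),
]
BUDGET = 100  # assumed aggregate constraint: SUM(cost) of present tuples <= 100


def feasible(world):
    """True if the world (one presence flag per tuple) meets the constraint."""
    return sum(t[2] for t, present in zip(TUPLES, world) if present) <= BUDGET


def find_start(rng, max_tries=1000):
    """Draw worlds from the tuple probabilities until one meets the constraint."""
    for _ in range(max_tries):
        world = [rng.random() < t[3] for t in TUPLES]
        if feasible(world):
            return world
    raise RuntimeError("no feasible starting world found")


def run_chain(rng, steps, burn_in):
    """Metropolis sampler over feasible worlds; each move flips one tuple."""
    world = find_start(rng)
    samples = []
    for step in range(steps):
        i = rng.randrange(len(TUPLES))
        p = TUPLES[i][3]
        # Probability ratio of the flipped world to the current one.
        ratio = (1 - p) / p if world[i] else p / (1 - p)
        world[i] = not world[i]
        if not feasible(world) or rng.random() >= ratio:
            world[i] = not world[i]  # reject: undo the flip
        if step >= burn_in:
            samples.append(tuple(world))
    return samples


def top_k(k=2, chains=4, steps=5000, burn_in=500, seed=0):
    """Estimate each tuple's probability of ranking in the top k by score,
    pooled over several independent chains, and return the k most likely."""
    rng = random.Random(seed)
    counts = Counter()
    total = 0
    for _ in range(chains):
        for world in run_chain(rng, steps, burn_in):
            present = [t for t, x in zip(TUPLES, world) if x]
            present.sort(key=lambda t: t[1], reverse=True)
            counts.update(t[0] for t in present[:k])
            total += 1
    return [(tid, n / total) for tid, n in counts.most_common(k)]
```

`top_k()` returns a list of `(id, estimated probability)` pairs. Infeasible proposals are rejected outright, so the chains never leave the set of worlds that satisfy the constraint, and pooling several chains with different starting worlds reduces dependence on any single starting point.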
Keywords/Search Tags: Uncertain data, Data cleaning, Top-k query, Aggregate constraints