Cleaning And Top-k Querying Uncertain Data With Aggregate Constraints

Posted on:2014-04-08

Degree:Master

Type:Thesis

Country:China

Candidate:B Liu

Full Text:PDF

GTID:2308330479979349

Subject:Management Science and Engineering

Abstract/Summary:

With advances in technology and a growing understanding of data acquisition and management, uncertain data has attracted wide attention. Uncertainty is pervasive in the data of many real-world applications, in fields such as economics, the military, finance, and telecommunications, which makes uncertain data processing particularly important. The problem most often encountered in research on uncertain data, and the hardest to resolve, is the explosion of the possible-worlds space. Because of this, an uncertain database is more difficult to process than a traditional database of the same size.

This thesis studies the cleaning of uncertain data and Top-k queries on uncertain data under aggregate constraints. Both tasks require determining whether a possible world satisfies the aggregate constraints, so when the number of uncertain tuples is large, both run into the possible-worlds explosion problem. The main research work is reflected in the following two points:

1. We analyze the major problems of cleaning uncertain data with aggregate constraints and, by establishing a mathematical model, transform the cleaning problem into a nonlinear optimization problem. We propose a new MVFSA algorithm whose random perturbation is redesigned to fit the fact that the values of uncertain tuples are discrete in this problem. Experimental results show that the new algorithm outperforms the traditional simulated annealing algorithm in solution quality and is also considerably more efficient.

2. For Top-k queries on uncertain data with aggregate constraints, we propose a sampling approach to address the difficulty of checking every possible world against the constraints. We first find a few initial points that satisfy all the constraints and then, starting from these points, sample with a multiple-chain Markov chain Monte Carlo (MCMC) method. The collected samples reflect the overall distribution of the possible worlds well, and the final query result is obtained by running the Top-k query on the sample set. Experimental results show that multi-chain MCMC sampling yields good Top-k results and is much more efficient than answering the query with simple random sampling.
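
The sampling approach in point 2 can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the toy data in TUPLES, the sum-range constraint in satisfies, the single-tuple Metropolis proposal, and the ranking rule (how often a tuple lands among the k largest values) are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import random
from collections import Counter

# Hypothetical toy data: each uncertain tuple is a list of discrete
# (value, probability) alternatives. A possible world picks one per tuple.
TUPLES = [
    [(10, 0.6), (30, 0.4)],
    [(20, 0.5), (25, 0.5)],
    [(5, 0.7), (40, 0.3)],
    [(15, 0.2), (35, 0.8)],
]

def satisfies(world, lo=60, hi=110):
    """Assumed aggregate constraint: the sum of chosen values lies in [lo, hi]."""
    total = sum(TUPLES[i][a][0] for i, a in enumerate(world))
    return lo <= total <= hi

def weight(world):
    """Unnormalised probability of a world: product of the chosen alternatives."""
    w = 1.0
    for i, a in enumerate(world):
        w *= TUPLES[i][a][1]
    return w

def find_start(rng, tries=10000):
    """Find an initial world that satisfies the constraint by random search."""
    for _ in range(tries):
        world = [rng.randrange(len(t)) for t in TUPLES]
        if satisfies(world):
            return world
    raise RuntimeError("no feasible starting world found")

def run_chain(rng, steps, burn_in):
    """One Metropolis chain over worlds that satisfy the constraint."""
    world = find_start(rng)
    samples = []
    for step in range(steps):
        # Symmetric proposal: re-draw the alternative of one random tuple.
        i = rng.randrange(len(TUPLES))
        cand = list(world)
        cand[i] = rng.randrange(len(TUPLES[i]))
        # Infeasible candidates are rejected; feasible ones use the Metropolis rule.
        if satisfies(cand) and rng.random() < min(1.0, weight(cand) / weight(world)):
            world = cand
        if step >= burn_in:
            samples.append(tuple(world))
    return samples

def topk_frequencies(samples, k):
    """Fraction of sampled worlds in which each tuple is among the k largest."""
    counts = Counter()
    for world in samples:
        ranked = sorted(range(len(TUPLES)),
                        key=lambda i: TUPLES[i][world[i]][0], reverse=True)
        counts.update(ranked[:k])
    return {i: counts[i] / len(samples) for i in range(len(TUPLES))}

def sample_topk(k=2, chains=4, steps=5000, burn_in=500, seed=0):
    """Pool samples from several chains, then rank tuples on the sample set."""
    rng = random.Random(seed)
    samples = []
    for _ in range(chains):
        samples.extend(run_chain(rng, steps, burn_in))
    freq = topk_frequencies(samples, k)
    top = sorted(freq, key=freq.get, reverse=True)[:k]
    return top, freq, samples
```

Calling sample_topk() returns the indices of the k tuples that most often rank in the top k, their estimated frequencies, and the pooled samples. Starting several chains from different feasible worlds is a hedge against a single chain staying in one part of the feasible region; single-tuple moves are not guaranteed to connect every feasible world in general.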

Keywords/Search Tags:

Uncertain data, Data cleaning, Top-k query, Aggregate constraints

Related items

1	Research On Probabilistic Aggregate Nearest Neighbor Query Method Over Uncertain Data
2	Aggregation Query Research Over Continuous Data Streams
3	Mining Frequent Itemsets from Uncertain Data: Extensions to Constrained Mining and Stream Mining
4	Research Of OLAP Technology Over K-Anonymous Data
5	Study On Skyline Query Processing Techniques On Uncertain Data
6	Data Provenance Management And Similarity Query Over Uncertain Data
7	The Research Of Key Processing Techniques Of Uncertain Skyline Query
8	Research On Querying Missing Data
9	Research On Key Techniques For Top-k Query Processing Over Uncertain Data
10	Aggregate Query Processing And Optimization Techniques On Uncertain Data