Top-k Query Technology Of Massive Uncertain Data In Cloud Environments

Posted on:2014-03-31

Degree:Master

Type:Thesis

Country:China

Candidate:X Lu

Full Text:PDF

GTID:2268330422965632

Subject:Computer application technology

Abstract/Summary:

With the development of information technology, the amount of data available from the network has increased explosively. The challenge people now face is not a lack of information but how to find the valuable information they need. The Top-k query has proved to be a vital tool for this problem and is an important technique in applications that interact with large volumes of data. Given a ranking defined by the user's query conditions, the Top-k query result is the set of tuples ranked in the top k. At the same time, real data often contains noise, missing values, inconsistencies and similar defects, so uncertainty is prevalent in massive data. The Top-k query on uncertain data is more complex than the traditional Top-k query on certain data, in both query semantics and query algorithms, and it has gradually attracted researchers' attention.

Since Google introduced the concept of cloud computing, it has been strongly supported and developed by the academic and business communities. The design concept of cloud computing is to allow dynamic allocation of computing power, network resources and storage resources, delivered as on-demand services. Because it provides powerful computing and storage services, cloud computing can process massive information at a relatively low cost, and it has therefore won the favor of many IT companies. As cloud computing has strong capabilities for processing massive data, Top-k query techniques can significantly improve their efficiency by using cloud computing technologies. The main work is as follows:

1. For datasets with "tuple-level" uncertainty, we analyzed the Top-k query semantics based on parameterized ranking functions and designed an algorithm to compute an upper bound on the parameterized ranking function value of the tuples that have not yet been retrieved. In that way, we avoid computing the ranking function value of every tuple in the dataset and solve the pruning problem in the Top-k query. As the experiments show, our algorithm is more efficient in running time for Top-k queries on massive uncertain data.
2. For uncertain datasets, we proposed a query semantics for Top-k frequent items and presented a query algorithm based on the generating function. In addition, three pruning rules were proposed to filter out the items that cannot be Top-k frequent items.
3. We built a cloud computing environment. In this environment, we designed two algorithms based on the MapReduce programming model to achieve distributed parallel computation of Top-k queries. As the experiments show, our algorithms are more efficient in running time for Top-k queries on massive uncertain data.
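The generating-function idea in item 2 can be illustrated with a minimal sketch. The abstract does not give the thesis's exact semantics or algorithm, so the following is an assumption rather than the author's method: each item is taken to appear in transaction i independently with probability p_i, the distribution of its support is read from the coefficients of the product of (1 - p_i + p_i x), and items are ranked by the probability that their support reaches a threshold. The names `support_distribution`, `frequent_probability`, `top_k_frequent` and `min_support` are hypothetical, and the three pruning rules are not modeled.

```python
from typing import Dict, List, Tuple


def support_distribution(probs: List[float]) -> List[float]:
    """Return coefficients of prod_i (1 - p_i + p_i * x).

    Coefficient j is the probability that the item occurs in exactly j
    transactions, assuming independent occurrences (an assumption here).
    """
    coeffs = [1.0]  # the empty product is the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            nxt[j] += c * (1.0 - p)  # item absent from this transaction
            nxt[j + 1] += c * p      # item present in this transaction
        coeffs = nxt
    return coeffs


def frequent_probability(probs: List[float], min_support: int) -> float:
    """Probability that the support is at least min_support."""
    return sum(support_distribution(probs)[min_support:])


def top_k_frequent(
    item_probs: Dict[str, List[float]], min_support: int, k: int
) -> List[Tuple[str, float]]:
    """Rank items by frequent probability (hypothetical semantics)."""
    scored = [
        (frequent_probability(p, min_support), item)
        for item, p in item_probs.items()
    ]
    # Highest probability first; ties are broken by item name.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(item, prob) for prob, item in scored[:k]]
```

For example, `top_k_frequent({"a": [0.9, 0.8], "b": [0.5, 0.5]}, min_support=1, k=1)` ranks "a" first, because its probability of appearing at least once is higher than that of "b". This exhaustive version evaluates every item; pruning rules of the kind the abstract mentions would aim to skip items that cannot reach the top k.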

Keywords/Search Tags:

Topk, Uncertain Data, Cloud Environment, MapReduce

Related items

1	Research On Topk High Expected Weight-based Itemsets Mining With Uncertain Datasets
2	Research On MapReduce Secure Data Exchange Based On Trusted Execution Environment Technology
3	The Research Of Task Scheduling Algorithm For Mapreduce Framework In Cloud Environment
4	Research And Implementation On Fuzzy C-means Algorithm For Big Data In Cloud
5	Research On Top-k Queries Optimizing Algorithm On Uncertain Dataset
6	ER-Topk Query Processing On Uncertain Streams
7	Data Mining Algorithm Parallelization In Cloud Environment
8	Design And Implementation Of Medical Vaccine Big Data System Based On Cloud Computing Environment
9	Fault Tolerance For MapReduce In The Cloud Environment
10	Data Storage Security Technology And Application Based On Cloud Platform