
Top-k Query Technology Of Massive Uncertain Data In Cloud Environments

Posted on: 2014-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: X Lu
Full Text: PDF
GTID: 2268330422965632
Subject: Computer application technology
Abstract/Summary:
With the development of information technology, the amount of data available from the network has increased explosively. The challenge people face is not a lack of information, but how to find the valuable information they need. Top-k queries have shown great vitality in addressing this problem and are an important technique in applications that interact with large datasets. Given a ranking defined by the user’s query conditions, a Top-k query returns the set of tuples ranked in the top k. At the same time, data often contains noise, missing values, and inconsistencies, so uncertainty is prevalent in massive data. Top-k queries on uncertain data are more complex than traditional Top-k queries on certain data, in both query semantics and query algorithms, and they have gradually attracted scholars’ attention.

Since Google introduced the concept of cloud computing, it has been strongly supported and developed by the academic and business communities. The design concept of cloud computing is to allow dynamic allocation of computing power, network resources, and storage resources, delivered as on-demand services. Because it provides powerful computing and storage services, cloud computing can process massive amounts of information at relatively low cost, and it has thus won the favor of many IT companies.

Because cloud computing has strong capabilities for processing massive data, applying cloud computing technologies can significantly improve the efficiency of Top-k queries. The main work is as follows:

1. For datasets with "tuple-level" uncertainty, we analyzed Top-k query semantics based on parameterized ranking functions and designed an algorithm to compute an upper bound on the parameterized ranking function value of tuples that have not yet been retrieved. In this way, we avoid computing the ranking function value of every tuple in the dataset, which solves the pruning problem in Top-k queries. Experiments show that our algorithm is more efficient in terms of running time when processing Top-k queries on massive uncertain data.

2. For uncertain datasets, we proposed a query semantics for Top-k frequent items and presented a query algorithm based on generating functions. We also proposed three pruning rules to filter out items that cannot be among the Top-k frequent items.

3. We built a cloud computing environment, in which we designed two algorithms based on the MapReduce programming model to perform distributed parallel computation of Top-k queries. Experiments show that our algorithms are more efficient in terms of running time when processing Top-k queries on massive uncertain data.
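
The generating-function idea in item 2 can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not define its query semantics or its three pruning rules. The sketch assumes that each occurrence of an item has an independent existence probability, and that items are ranked by the probability that their support reaches a threshold. All function names and the `min_support` parameter are hypothetical.

```python
import heapq


def support_distribution(probs):
    """Expand the generating function prod_i (1 - p_i + p_i * x).

    The coefficient of x^j is the probability that exactly j of the
    uncertain occurrences exist (independence is assumed here).
    """
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, coeff in enumerate(dist):
            nxt[j] += coeff * (1.0 - p)   # this occurrence is absent
            nxt[j + 1] += coeff * p       # this occurrence is present
        dist = nxt
    return dist


def frequent_probability(probs, min_support):
    """Probability that the item's support is at least min_support."""
    return sum(support_distribution(probs)[min_support:])


def top_k_frequent_items(item_probs, k, min_support):
    """Return the k (probability, item) pairs with the highest probability."""
    scored = (
        (frequent_probability(probs, min_support), item)
        for item, probs in item_probs.items()
    )
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    # Hypothetical data: item -> existence probabilities of its occurrences.
    data = {"a": [0.9, 0.8, 0.7], "b": [0.5, 0.5], "c": [0.1, 0.2, 0.3]}
    for prob, item in top_k_frequent_items(data, k=2, min_support=2):
        print(item, round(prob, 3))
```

This sketch expands the full polynomial for every item. A pruning rule would instead discard an item as soon as a cheap upper bound on its probability falls below the current k-th best value.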
Keywords/Search Tags: Top-k, Uncertain Data, Cloud Environment, MapReduce